huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

[HfFileSystem] Faster `fs.walk()` #2346

Closed: lhoestq closed this pull request 1 week ago

lhoestq commented 2 weeks ago

...by using `expand_info=False` by default (the same logic as `fs.glob()`), so that `walk()` no longer requests expanded file info for every entry it lists.
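The change amounts to a defaulting pattern on the keyword arguments that `walk()` forwards to the parent implementation. Below is a minimal sketch under stated assumptions: `BaseFS` is a hypothetical stand-in for the parent filesystem (it just echoes the kwargs it receives), and `FastWalkFS` is a hypothetical subclass. This is not the actual `HfFileSystem` code, which also resolves the path before delegating.

```python
from typing import Any, Dict, Iterator, Tuple


class BaseFS:
    """Hypothetical parent filesystem: reports the kwargs walk() received."""

    def walk(self, path: str, **kwargs: Any) -> Iterator[Tuple[str, Dict[str, Any]]]:
        yield path, kwargs


class FastWalkFS(BaseFS):
    """Hypothetical subclass that turns expand_info off unless the caller sets it."""

    def walk(self, path: str, **kwargs: Any) -> Iterator[Tuple[str, Dict[str, Any]]]:
        # Put the default first so an explicit caller-supplied value overrides it.
        kwargs = {"expand_info": False, **kwargs}
        yield from super().walk(path, **kwargs)


fs = FastWalkFS()
print(list(fs.walk("some/dir")))                    # [('some/dir', {'expand_info': False})]
print(list(fs.walk("some/dir", expand_info=True)))  # [('some/dir', {'expand_info': True})]
```

Placing the default before `**kwargs` in the dict literal keeps the fast path as the default while still letting callers opt back in to the slower, more detailed listing.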

before:

```
In [1]: from huggingface_hub import HfFileSystem

In [2]: %time _ = list(HfFileSystem().walk("hf://datasets/allenai/c4/en"))
CPU times: user 275 ms, sys: 27.6 ms, total: 302 ms
Wall time: 11.6 s
```

after:

```
In [1]: from huggingface_hub import HfFileSystem

In [2]: %time _ = list(HfFileSystem().walk("hf://datasets/allenai/c4/en"))
CPU times: user 176 ms, sys: 22.4 ms, total: 198 ms
Wall time: 3.25 s
```
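To reproduce the comparison outside IPython, where `%time` is unavailable, the same measurement can be sketched with `time.perf_counter`. `time_walk` and `speedup` are hypothetical helpers, and `os.walk` over a temporary directory stands in for the Hub so the snippet runs offline. Applied to the figures reported above, wall time improves about 3.6x while total CPU time improves only about 1.5x, which suggests most of the saving is time spent waiting rather than computing.

```python
import os
import tempfile
import time
from typing import Callable, Iterable


def time_walk(walk: Callable[[str], Iterable], path: str) -> float:
    """Exhaust walk(path) and return the elapsed wall time in seconds."""
    start = time.perf_counter()
    _ = list(walk(path))  # consume the generator, as in the %time runs
    return time.perf_counter() - start


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the 'after' run is than the 'before' run."""
    return before_s / after_s


# Figures reported above (CPU totals converted from ms to s).
print(round(speedup(11.6, 3.25), 2))    # wall time: 3.57
print(round(speedup(0.302, 0.198), 2))  # CPU total: 1.53

# Offline stand-in: os.walk over a small temporary tree. To measure the Hub,
# pass HfFileSystem().walk and an hf:// path instead (requires network access).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    elapsed = time_walk(os.walk, tmp)
    print(f"os.walk took {elapsed:.4f} s")
```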