huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

[HfFileSystem] Less /paths-info calls #2271

Closed: lhoestq closed this 1 month ago

lhoestq commented 2 months ago

Make fewer /paths-info calls; otherwise we get rate-limited in the dataset viewer (e.g. for FineWeb).

cc @severo

HuggingFaceDocBuilderDev commented 2 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lhoestq commented 1 month ago

This is also causing bugs in datasets when loading datasets with many files, e.g. load_dataset('mteb/biblenlp-corpus-mmteb'):

huggingface_hub.utils._errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/datasets/mteb/biblenlp-corpus-mmteb/paths-info/3912ed967b0834547f35b2da9470c4976b357c9a

Could you take a look, @Wauplin? It would be cool to release this fix when you return :p
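For readers following along, here is a minimal sketch of why fewer, larger requests help against a 429. It does not reproduce the change in this PR; `fetch_batch`, `CountingFakeEndpoint`, and the batch size of 100 are hypothetical stand-ins for a /paths-info request, chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence


def fetch_paths_info_batched(
    paths: Sequence[str],
    fetch_batch: Callable[[List[str]], Dict[str, dict]],
    batch_size: int = 100,  # assumed value, not taken from the Hub API
) -> Dict[str, dict]:
    """Resolve info for many paths with one request per chunk, not one per path."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Drop duplicate paths (keeping order) so no path is requested twice.
    unique = list(dict.fromkeys(paths))
    info: Dict[str, dict] = {}
    for start in range(0, len(unique), batch_size):
        info.update(fetch_batch(unique[start:start + batch_size]))
    return info


class CountingFakeEndpoint:
    """Hypothetical in-memory stand-in for the remote endpoint; counts requests."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, batch: List[str]) -> Dict[str, dict]:
        self.calls += 1
        return {path: {"path": path} for path in batch}
```

With 250 distinct paths and the assumed batch size of 100, the fake endpoint is hit 3 times rather than 250, which is the kind of reduction that keeps a client under a rate limit.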

lhoestq commented 1 month ago

This has been in prod in datasets-viewer, and it fixes the HfHubHTTPError (Too Many Requests) both for the FineWeb viewer and for loading the mmteb datasets in datasets.

Wauplin commented 1 month ago

Great, thanks for confirming :+1:

Wauplin commented 1 month ago

@lhoestq hot-fix released in 0.23.1: https://github.com/huggingface/huggingface_hub/releases/tag/v0.23.1
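A quick way to check whether an environment already has the hot-fix is to compare the installed version against 0.23.1. This is a rough sketch: `parse_release`, `has_hotfix`, and `installed_has_hotfix` are hypothetical helper names, and the parsing only looks at the leading numeric part of the version string, so a pre-release such as `0.23.1rc1` would be treated like `0.23.1`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIXED_IN = (0, 23, 1)  # release mentioned in this thread


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.23.1' -> (0, 23, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_hotfix(version: str) -> bool:
    """True if `version` is at least the release that shipped the fix."""
    return parse_release(version) >= FIXED_IN


def installed_has_hotfix() -> Optional[bool]:
    """Check the installed huggingface_hub; None if it is not installed."""
    try:
        return has_hotfix(metadata.version("huggingface_hub"))
    except metadata.PackageNotFoundError:
        return None
```

If `installed_has_hotfix()` returns `False`, upgrading with `pip install -U huggingface_hub` should pull in a release that includes the fix.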

lhoestq commented 1 month ago

thank you !!