huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

504 Server Error: Gateway Time-out #2460

Closed dgks0n closed 1 month ago

dgks0n commented 1 month ago

Describe the bug

huggingface_hub.utils._errors.HfHubHTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/models?filter=text-classification&filter=pytorch&filter=transformers&sort=downloads&direction=-1&config=True&cursor=eyIkb3IiOlt7ImRvd25sb2FkcyI6MiwiX2lkIjp7IiRndCI6IjYzOTBkOWYzM2RlNzZkNjAyOTlhY2JlMCJ9fSx7ImRvd25sb2FkcyI6eyIkbHQiOjJ9fSx7ImRvd25sb2FkcyI6bnVsbH1dfQ%3D%3D

How to solve this?

Reproduction

I installed the latest version of huggingface_hub, but I hit a 504 Gateway Time-out error.

from huggingface_hub import HfApi

_huggingface_api = HfApi()

def get_models() -> dict:
    pipeline_tag = 'text-classification'
    applied_tags = {
        'pipelineTags': [pipeline_tag],
        'libraries': ['pytorch', 'transformers'],
    }
    # TODO: Fix bug 504 Gateway Timeout
    models = _huggingface_api.list_models(
        # Flatten the tag lists into a single filter list
        filter=[tag for tag_list in applied_tags.values() for tag in tag_list],
        sort='downloads',
        direction=-1,
        fetch_config=True
    )
    models = [_convert_hf_model_info(model) for model in models if model.pipeline_tag == pipeline_tag]
    ...

Logs

No response

System info

- huggingface_hub version: 0.25.0.dev0
- Platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.35
- Python version: 3.9.19
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/dummy/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 1.13.1
- Jinja2: 3.1.2
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.3.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: 2.6.2.2
- numpy: 1.22.0
- pydantic: 2.8.2
- aiohttp: 3.10.3
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/dummy/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/dummy/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/dummy/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10

Wauplin commented 1 month ago

Hi @dgks0n, thanks for reporting and sorry for the inconvenience. I have been able to reproduce the issue with this code snippet:

from huggingface_hub import list_models

count = 0
for model in list_models(
    filter=["text-classification", "pytorch", "transformers"],
    sort="downloads",
):
    count += 1
    print(count, model.id, model.downloads)

It fails after 36000 models have been printed. Since there are ~36234 models matching the filter on the Hub, I suspect it has to do with the last page. If I set a limit of 10 or 1234, it works fine. I'll report it to the Hub team and let you know.

EDIT: it fails only if sort="downloads" is passed to the query. EDIT 2: it fails whether or not fetch_config=True is passed.
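Until a server-side fix lands, one possible workaround is to stop iterating once the paginated listing raises, accepting that the final page is lost. This is a sketch, not an official API: `take_until_error` below is a hypothetical helper, not part of huggingface_hub.

```python
from typing import Iterable, Iterator, Type, TypeVar

T = TypeVar("T")

def take_until_error(items: Iterable[T], error: Type[Exception]) -> Iterator[T]:
    """Yield items until the underlying iterator raises `error`,
    then stop silently instead of propagating. Useful when only
    the last page of a paginated listing fails."""
    it = iter(items)
    while True:
        try:
            yield next(it)
        except StopIteration:
            return
        except error:
            return
```

With huggingface_hub this could be used as `for model in take_until_error(list_models(filter=..., sort="downloads"), HfHubHTTPError): ...`, at the cost of silently dropping whatever the failing page would have returned.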

dgks0n commented 1 month ago

@Wauplin Thanks for your response~

One more thing: it seems most models no longer have the siblings attribute populated, unlike in older versions. How can I work around this?

Wauplin commented 1 month ago

@dgks0n listing siblings for all 36000 models is quite a heavy operation for the server. What is your use case?

I don't know if something has changed recently on this. @Pierrci would you be able to share some details?

Pierrci commented 1 month ago

To return the siblings, you need to add &expand[]=siblings to the query (or pass full=1, but that's deprecated); they're not returned by default (I think that's been the case for some time).

As a matter of fact, it's recommended to use expand[]=config&expand[]=siblings&expand[]=... to specify the properties you're interested in; it will be more efficient both from a DB and network perspective - not sure how the internals of hf_hub work in this regard :)
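For reference, the raw query shape described above can be sketched with the standard library alone. The parameter names follow the comment; treating config and siblings as the expandable properties of interest is an assumption for illustration.

```python
from urllib.parse import urlencode

# Build the /api/models query string with repeated `filter` keys and
# repeated `expand[]` keys, as described above. urlencode accepts a
# sequence of (key, value) pairs, which allows repeated keys.
params = [
    ("filter", "text-classification"),
    ("filter", "pytorch"),
    ("filter", "transformers"),
    ("sort", "downloads"),
    ("direction", "-1"),
    ("expand[]", "config"),
    ("expand[]", "siblings"),
]
url = "https://huggingface.co/api/models?" + urlencode(params)
print(url)
```

The brackets in `expand[]` are percent-encoded in the final URL, which the server decodes back.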

Wauplin commented 1 month ago

To do what @Pierrci suggested above with the Python client, you can pass the expand argument like this:

from huggingface_hub import list_models

for model in list_models(
    filter=...,
    sort=...,
    ...
    expand=["config", "siblings"],
):
    ...

Only the selected fields will be populated in the returned ModelInfo objects.

dangokuson commented 1 month ago

@Wauplin Is it possible to use the list_models method with something like sort=['downloads', 'last_updated']?

Wauplin commented 1 month ago

Is it possible to use the list_models method with something like sort=['downloads', 'last_updated']?

@dangokuson no, that's not possible as far as I know.

dangokuson commented 1 month ago

Is it possible to use the list_models method with something like sort=['downloads', 'last_updated']?

@dangokuson no, that's not possible as far as I know.

@Wauplin You mean it only supports sorting by a single field, such as sort='downloads'?

Wauplin commented 1 month ago

Yes, exactly.

from huggingface_hub import list_models

print("Top 5 models by downloads:")
for model in list_models(sort="downloads", limit=5):
    print(model.id, model.lastModified, model.downloads)

print("\nTop 5 models by last modified:")
for model in list_models(sort="last_modified", limit=5):
    print(model.id, model.lastModified, model.downloads)

=>

Top 5 models by downloads:
MIT/ast-finetuned-audioset-10-10-0.4593 None 205803621
microsoft/resnet-50 None 71027344
google-bert/bert-base-uncased None 54775603
amazon/chronos-t5-tiny None 53005157
facebook/fasttext-language-identification None 52132618

Top 5 models by last modified:
igorktech/hat-tiny-cased-conversational-p2_1-grouped-128-v2 2024-08-20 10:17:39+00:00 0
GaetanMichelet/Llama-31-8B_task-1_60-samples_config-2_full 2024-08-20 10:17:32+00:00 0
KoichiYasuoka/deberta-xlarge-chinese-erlangshen-ud-goeswith 2024-08-20 10:17:19+00:00 6
devngho/ko-edu-classifier_3 2024-08-20 10:17:18+00:00 0
srikarvar/multilingual-e5-small-pairclass-4 2024-08-20 10:17:17+00:00 0
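If a two-key ordering is needed, one workaround is to fetch a bounded list with the single supported sort key and re-sort client-side. A minimal sketch, using a stand-in dataclass instead of the real ModelInfo (the field names here are illustrative assumptions):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Row:
    """Stand-in for huggingface_hub.ModelInfo with just the fields we sort on."""
    id: str
    downloads: int
    last_modified: datetime

def sort_models(rows: list) -> list:
    # Primary key: downloads (descending); tie-breaker: last_modified (descending).
    return sorted(rows, key=lambda r: (r.downloads, r.last_modified), reverse=True)

rows = [
    Row("a", 10, datetime(2024, 8, 1, tzinfo=timezone.utc)),
    Row("b", 10, datetime(2024, 8, 20, tzinfo=timezone.utc)),
    Row("c", 99, datetime(2024, 1, 1, tzinfo=timezone.utc)),
]
print([r.id for r in sort_models(rows)])  # → ['c', 'b', 'a']
```

In practice the rows would come from list_models(sort="downloads", limit=...), normalizing any None downloads or lastModified values before sorting.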
dgks0n commented 1 month ago

@Wauplin I used it as below, but it returned an empty list.

models = _huggingface_api.list_models(
    filter=['text-classification', 'pytorch', 'transformers'],
    sort='downloads',
    expand=["config", "siblings"],
    limit=500
)

Is there something wrong?

Wauplin commented 1 month ago

@dgks0n are you sure? I just ran

from huggingface_hub import list_models

for model in list_models(
    filter=['text-classification', 'pytorch', 'transformers'],
    sort='downloads',
    expand=["config", "siblings"],
    limit=5,
):
    print(model.id, model.downloads)

and got

tasksource/deberta-small-long-nli 12071285
cardiffnlp/twitter-roberta-base-sentiment-latest 10080410
avichr/heBERT_sentiment_analysis 7866020
distilbert/distilbert-base-uncased-finetuned-sst-2-english 7018737
finiteautomata/bertweet-base-sentiment-analysis 4146871

using huggingface_hub==0.24.6

dgks0n commented 1 month ago

@Wauplin Thanks. It works~