huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

429 error in InferenceClient #2175

Closed: sooryansatheesh closed this issue 3 months ago

sooryansatheesh commented 5 months ago

System Info

@SunMarc

429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models

Who can help?

I got the above error when I was trying to get tabular classification predictions from my own model; the full snippet is in the Reproduction section below.


Can someone help me?

Information

Tasks

Reproduction

I was trying to get tabular classification predictions from my own model and got the error above with the code below:

import pandas as pd
from huggingface_hub import InferenceClient

# cols_used (the feature column names) and model_id (my model repo id) are defined earlier in my script
input_data = [2, 3, 4, 2, 4]
df = pd.DataFrame([input_data], columns=cols_used)

client = InferenceClient()

table = df.to_dict(orient="records")
print(table)
client.tabular_classification(table=table, model=model_id)


Expected behavior

Prediction in the form of a single number from the model

ArthurZucker commented 5 months ago

cc @Wauplin

Wauplin commented 5 months ago

@sooryansatheesh In your reproducible script above

from huggingface_hub import InferenceClient
input_data = [2, 3, 4, 2, 4]
df = pd.DataFrame([input_data], columns=cols_used)

client = InferenceClient()

table = df.to_dict(orient="records")

print(table)
client.tabular_classification(table=table, model=model_id)

would you mind sharing what values you used for cols_used and model_id? Without them, it's hard to reproduce.

In general, HTTP 429 means you got rate-limited. Using an HF token should lift the rate limit, which might solve your situation. Another possibility is that your model doesn't load on our Inference API servers, but to investigate that we would need the model id.
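Something along these lines should work once you pass a token (just a sketch: the token string is a placeholder for your own User Access Token from https://huggingface.co/settings/tokens, and table / model_id are the same variables as in your snippet):

from huggingface_hub import InferenceClient

# Authenticated requests get a higher rate limit than anonymous ones.
# "hf_xxx" is a placeholder for your own User Access Token.
client = InferenceClient(token="hf_xxx")
client.tabular_classification(table=table, model=model_id)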

anakin87 commented 3 months ago

@Wauplin thanks for your work. Unfortunately, this week I encountered a similar issue multiple times.

While authenticated, I tried to use google/gemma-1.1-2b-it (I have access) a few days ago. The model was probably loading, so I interrupted the request after several minutes; I then got a 429 and was blocked for an hour.

The same happened today with Qwen/Qwen2-7B-Instruct-AWQ.

Is this a known issue? Is there a way to raise an error in these cases and avoid hitting the rate limit?
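As a possible workaround (not tested), setting a client-side timeout should at least make the call fail fast instead of hanging while the model loads; the model id and timeout value below are just examples:

from huggingface_hub import InferenceClient, InferenceTimeoutError

# Fail after 30 seconds instead of waiting while the model loads.
client = InferenceClient(model="Qwen/Qwen2-7B-Instruct-AWQ", timeout=30)

try:
    output = client.text_generation("Hello!", max_new_tokens=20)
except InferenceTimeoutError:
    # Back off here instead of retrying in a tight loop, which seems to be what triggers the 429.
    print("Model is still loading, try again later.")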

Wauplin commented 3 months ago

Hi @anakin87, thanks for reporting. To improve the user experience, I opened https://github.com/huggingface/huggingface_hub/pull/2318, which will add X-wait-for-model as a header. This way, InferenceClient won't send a request every second until the model is loaded.
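Until that change is released, a rough manual equivalent is to pass the header yourself. This is only a sketch, assuming the API accepts the string value "true" for that header:

from huggingface_hub import InferenceClient

# Ask the Inference API to hold the request open until the model is loaded,
# rather than erroring out and letting the client retry every second.
client = InferenceClient(headers={"X-wait-for-model": "true"})
output = client.text_generation("Hello!", model="google/gemma-1.1-2b-it", max_new_tokens=20)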