Hi!
Do you have an update on this issue? Thank you for all the support.
@LysandreJik @patil-suraj
cc @Narsil
Hi @nbravulapalli ,
There does indeed seem to have been an issue with that model; it should be back up now.
We are actively tracking these issues to keep them to a minimum, but memory errors do sometimes occur depending on which other models are being served at the same time.
Sorry about the issue you were seeing.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, I'm facing this same error on facebook/bart-large-mnli when trying to use GPU-accelerated inference.
I am using this model for text classification and passing 10 candidate labels. When using GPU-Accelerated Inference, I get a 400 Bad Request error:

```
{
  "error": "CUDA error: out of memory\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1."
}
```

Could anyone point out why this is the case? Thanks.
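For reference, the request I'm sending looks roughly like this (a minimal sketch: the labels and token are placeholders, and the payload shape and use_gpu option follow the hosted Inference API's detailed-parameters docs):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}  # placeholder token

payload = {
    "inputs": "The new update completely broke my workflow.",
    "parameters": {
        # Ten candidate labels; each one is scored as a separate
        # premise/hypothesis pair, which adds to GPU memory use.
        "candidate_labels": [
            "bug", "feature request", "praise", "complaint", "question",
            "billing", "documentation", "performance", "security", "other",
        ],
    },
    "options": {"use_gpu": True},  # GPU-accelerated inference (paid plans)
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.status_code, response.json())
```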
@Narsil
I'm also facing this issue with facebook/bart-large-mnli on the Lab plan. Is there any advice on workarounds here? (One client-side mitigation I'm considering is sketched below.)
Also, as user feedback: I would expect this failure to surface as a 500 or 503 rather than a 400.
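The mitigation, sketched under the assumption that per-request memory grows with the number of candidate labels, is to score the labels a few at a time and merge the results:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}  # placeholder token

def classify_in_batches(text, labels, batch_size=3):
    """Score candidate labels a few at a time to keep per-request memory low."""
    scores = {}
    for i in range(0, len(labels), batch_size):
        batch = labels[i : i + batch_size]
        resp = requests.post(
            API_URL,
            headers=HEADERS,
            json={"inputs": text, "parameters": {"candidate_labels": batch}},
        )
        resp.raise_for_status()
        result = resp.json()
        scores.update(zip(result["labels"], result["scores"]))
    # Caveat: scores are softmax-normalized within each batch, so the merged
    # values are only meaningful as a relative ranking, not as probabilities.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```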
I still have the same issue on facebook/bart-large-cnn with GPU inference. Any solutions?
For GPU inference, you should check out our premium plans:
- Spaces: https://huggingface.co/docs/hub/spaces-overview
- Inference Endpoints: https://huggingface.co/inference-endpoints

The API is public and free, so GPU access is limited.
Who can help
@LysandreJik @patil-suraj
To reproduce
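(The original snippet was not captured in this thread. A minimal equivalent call, assuming the standard requests-based text-generation usage with the variables named below, might look roughly like this; the endpoint path and parameter names follow the hosted Inference API docs, and the token is a placeholder.)

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}  # placeholder token

INPUT_TEXT = "..."  # see the sample arguments below
NUM_SEQUENCES = 7
MAX_LENGTH = 105

payload = {
    "inputs": INPUT_TEXT,
    "parameters": {
        "num_return_sequences": NUM_SEQUENCES,
        # Assumed mapping of MAX_LENGTH; newer docs expose max_new_tokens.
        "max_length": MAX_LENGTH,
    },
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.status_code, response.json())
```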
Expected behavior
For more than a month, I have used the above code snippet to retrieve text completions from gpt2 via Hugging Face's Inference API. However, when I ran the same snippet again today, the API gave the following response:
Error Message 1:
This persisted for roughly half an hour, and after that time, the API would only allow requests with very few tokens of text in the INPUT_TEXT variable. All normal-sized requests gave the following error:
Error Message 2:
Keep in mind that when I get the above error, it is with the same arguments (the values of INPUT_TEXT, NUM_SEQUENCES, and MAX_LENGTH) that I have been using with this API for more than a month. I have checked my account, and the Inference API dashboard shows that I am still within the free quota provided by Hugging Face (my subscription is the "Lab: pay as you go" plan). Can you please help me resolve this?
Sample arguments that cause an error:

```python
INPUT_TEXT = 'Hippocrates, another ancient Greek, established a medical school, wrote many medical treatises, and is— because of Hippocrates, another ancient Greek,'
NUM_SEQUENCES = 7
MAX_LENGTH = 105
```
Edit: It appears that the API response is alternating between Error Messages 1 and 2 (originally it was 1, then 2, and now 1 again).
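Since the failures are intermittent, a simple client-side retry with exponential backoff (a sketch only; it does not address the underlying capacity problem) can at least smooth over the transient responses:

```python
import time
import requests

def post_with_retries(url, headers, payload, attempts=5, base_delay=2.0):
    """Retry transient Inference API failures with exponential backoff."""
    for attempt in range(attempts):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == 200:
            return response.json()
        # Back off and retry on capacity-type failures; the OOM above came
        # back as a 400, so it is included here alongside 429/5xx.
        if response.status_code in (400, 429, 500, 503):
            time.sleep(base_delay * (2 ** attempt))
            continue
        response.raise_for_status()
    raise RuntimeError(f"Giving up after {attempts} attempts: {response.text}")
```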