Closed: Vanessa-Taing closed this 1 month ago
Update on the log:
ValueError: Error occured when fetching info: {'error': 'The model akjindal53244/Llama-3.1-Storm-8B is too large to be loaded automatically (16GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).'}
I got this message after the second run. I assume there is a limit on model size for inference?
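For what it's worth, the 16GB in the log matches a back-of-the-envelope estimate for an ~8B-parameter model stored in fp16/bf16 (2 bytes per parameter). This is an illustrative sketch of that arithmetic, not the actual sizing logic used by the Hub:

```python
def model_size_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough checkpoint size in GB, assuming fp16/bf16 weights and no overhead."""
    return n_params * bytes_per_param / 1e9

AUTO_LOAD_LIMIT_GB = 10  # limit quoted in the error message

size = model_size_gb(8e9)  # Llama-3.1-Storm-8B has roughly 8B parameters
print(f"{size:.0f} GB", "exceeds" if size > AUTO_LOAD_LIMIT_GB else "fits within",
      f"the {AUTO_LOAD_LIMIT_GB} GB limit")
# → 16 GB exceeds the 10 GB limit
```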
Hi! There is a limit, but it's set by TGI, not lighteval. You could try using Inference Endpoints to serve the model.
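If you do serve the model from an Inference Endpoint, you can hit TGI's `/generate` route directly. A minimal stdlib sketch, where the endpoint URL and token are placeholders you would replace with your own:

```python
import json
import urllib.request

def build_generate_request(endpoint_url: str, prompt: str,
                           token: str, max_new_tokens: int = 64):
    """Build a POST request for TGI's /generate route."""
    payload = {"inputs": prompt,
               "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        endpoint_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Usage (requires a live endpoint and a valid token):
# req = build_generate_request("https://<your-endpoint>", "Hello", "hf_<token>")
# print(json.load(urllib.request.urlopen(req))["generated_text"])
```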
I see, thanks!
Describe the bug
Rate-limit error when running the program on a TGI endpoint. I wonder whether it is an API problem (error log from my self-debugging attempt). And by the way, can the inference endpoint be set to Spaces?
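For reference, when a rate limit surfaces as a transient HTTP 429, a generic client-side workaround is to retry with exponential backoff. This is a sketch, not part of lighteval, and it assumes the failure is raised as a `urllib` `HTTPError`:

```python
import time
import urllib.error

def with_backoff(call, retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on HTTP 429 (rate limit), doubling the delay each attempt."""
    for attempt in range(retries):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == retries - 1:
                raise  # not a rate limit, or out of retries
            time.sleep(base_delay * 2 ** attempt)
```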
To Reproduce
Full log:
Running `huggingface-cli whoami` returned my username, indicating a successful login.
Expected behavior
Evaluation running on inference API.
Version info
OS: Ubuntu (WSL)
Python: 3.10
CUDA: 12.2
Commit: c83daef