runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

Huggingface is down and my worker is looping #46

Closed dannysemi closed 6 months ago

dannysemi commented 6 months ago

I already have the model stored in my network volume, but I guess the worker checks Huggingface anyway? It's looping on a gateway error.

"message":"huggingface_hub.utils._errors.HfHubHTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/models/TheBloke/Nous-Capybara-34B-GPTQ"

Edit: Using worker-vllm:0.2.3

alpayariyak commented 6 months ago

Set the environment variables TRANSFORMERS_OFFLINE and HF_HUB_OFFLINE to 1 in the endpoint template. With these set, the worker will load the model from the local cache/network volume instead of querying the Hugging Face Hub on startup.
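For anyone testing locally rather than through the RunPod template UI, a minimal sketch of the same fix (the `docker run` line is illustrative; image name and flags are placeholders, not the exact command from this repo):

```shell
# Offline mode: transformers and huggingface_hub will only use
# locally cached files and will not contact huggingface.co.
export TRANSFORMERS_OFFLINE=1
export HF_HUB_OFFLINE=1
echo "offline flags: $TRANSFORMERS_OFFLINE $HF_HUB_OFFLINE"

# Illustrative equivalent when running the worker container directly
# (hypothetical image tag/paths — set the same vars in the RunPod
# endpoint template's environment-variables section instead):
# docker run -e TRANSFORMERS_OFFLINE=1 -e HF_HUB_OFFLINE=1 \
#   -v /workspace:/workspace runpod/worker-vllm:0.2.3
```

Note that with offline mode enabled, the model must already be fully present in the cache or network volume; any missing file will raise an error instead of triggering a download.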