milank94 opened 2 days ago
I can add something like the HF login flow (https://huggingface.co/docs/huggingface_hub/en/quick-start#login-command):

```python
from huggingface_hub import login

login()
```
Previously we didn't require a HF account to run the Flask inference API server, but it's increasingly common to assume users have one when using vLLM.
When following the initial setup steps from https://github.com/tenstorrent/tt-inference-server/tree/main/vllm-tt-metal-llama3-70b#vllm-tt-metalium-llama-31-70b-inference-api, setup fails due to missing HF token permissions when downloading the config for `Meta-Llama-3.1-70B`.