michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip
https://michaelfeil.github.io/infinity/
MIT License

Huggingface CLI token for gated models? #268

Closed Foddy closed 3 months ago

Foddy commented 3 months ago

Is there a way to pass an access token from Huggingface in the Docker container? I would like to try the model nvidia/NV-Embed-v1, which unfortunately is restricted.

Cannot access gated repo for url https://huggingface.co/nvidia/NV-Embed-v1/resolve/main/config.json.
Access to model nvidia/NV-Embed-v1 is restricted. You must be authenticated to access it.

greptile-apps[bot] commented 3 months ago

To pass a Hugging Face access token to the Docker container, set the HF_TOKEN environment variable with the -e flag of docker run:

port=7997
model1=nvidia/NV-Embed-v1
volume="$PWD/data"

# Set your Hugging Face token (placeholder value)
HF_TOKEN=your_huggingface_token_here

# Variables are quoted so that paths containing spaces still work
docker run -it --gpus all \
 -v "$volume:/app/.cache" \
 -p "$port:$port" \
 -e HF_TOKEN="$HF_TOKEN" \
 michaelf34/infinity:latest \
 v2 \
 --model-id "$model1" \
 --port "$port"

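Pass HF_TOKEN when the container starts (via docker run, as above) rather than hard-coding it in a Dockerfile, where it would be baked into the image. The account that owns the token must also have been granted access to nvidia/NV-Embed-v1 on its Hugging Face model page, otherwise the same "gated repo" error will appear.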
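
If you would rather not write the token value into the command itself, one possible variation is sketched below. It relies on the fact that docker run copies a variable from the host environment when -e is given a name without a value. The helper names require_hf_token and run_infinity are hypothetical and not part of Infinity; the image name, port, and model are taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: forward HF_TOKEN from the host shell into the container.
# Assumes HF_TOKEN has been exported in the calling shell beforehand.

port=7997
model1=nvidia/NV-Embed-v1
volume="$PWD/data"

# Hypothetical helper: fail early if no token is available on the host.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; export it before starting the container" >&2
    return 1
  fi
}

# Hypothetical helper: start the container with the token passed through.
# "-e HF_TOKEN" (no "=value") tells Docker to copy the variable from the
# host environment, so the secret never appears in the command line.
run_infinity() {
  require_hf_token || return 1
  docker run -it --gpus all \
    -v "$volume:/app/.cache" \
    -p "$port:$port" \
    -e HF_TOKEN \
    michaelf34/infinity:latest \
    v2 \
    --model-id "$model1" \
    --port "$port"
}
```

After sourcing the snippet, run `export HF_TOKEN=...` followed by `run_infinity`. The variable must be exported, not just assigned, because Docker reads it from its own process environment.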

References

/docs/docs/deploy.md /libs/infinity_emb/Dockerfile
