huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour

Model is downloaded each time I run the container #314

Open · djanito opened this issue 3 months ago

djanito commented 3 months ago

System Info

text-embeddings-inference:1.3.0

Reproduction

  1. Use the Docker script provided (a sketch follows this list)
  2. Run the container so it downloads the model
  3. Re-run the container and observe the model being downloaded again
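
For reference, a minimal sketch of that kind of invocation, following the quick-tour pattern; the model id is illustrative, and the `-v` mount is what should let a second run reuse the already downloaded weights:

```shell
# Sketch only: the model id is illustrative; the image tag matches the version
# reported in this issue. Mounting a host path (or named volume) at /data is
# what allows subsequent runs to reuse the downloaded weights.
model=BAAI/bge-reranker-base
volume=$PWD/data

docker run --gpus all -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-embeddings-inference:86-1.3.0 \
    --model-id $model
```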

Expected behavior

I want the model to be downloaded once and then reused on subsequent runs. However, the model appears to be downloaded each time I run the container (see the attached screenshot of the logs).

Here is my docker-compose configuration:

```yaml
reranker:
  image: ghcr.io/huggingface/text-embeddings-inference:86-1.3.0
  restart: always
  ports:
    # ...

volumes:
  model_cache_huggingface:
```
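
Since the port mappings and the service-level volume mount are missing from the snippet above, one thing worth checking is whether the named volume actually ends up mounted at /data inside the running container, which is the cache path the image expects per the quick tour. A sketch, assuming the compose service is named `reranker` and is currently running:

```shell
# Assumes the compose project is up and the service is named "reranker".
# Prints the container's mounts so you can confirm that model_cache_huggingface
# is really mounted at /data, where the server looks for cached weights.
docker compose ps -q reranker | xargs docker inspect --format '{{json .Mounts}}'
```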

Is this normal? Even when using MODEL_ID=/data/${RERANKER_MODEL:-}, I get an error downloading the model.
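
One possible explanation, offered only as an assumption: when the container itself fills the cache, /data follows the Hugging Face hub cache layout (models--<org>--<name>/snapshots/<revision>), so /data/<org>/<name> is typically not a plain directory of model files, and pointing MODEL_ID at it would fail. A quick way to see what the volume actually contains (the volume name may be prefixed with the compose project name; check `docker volume ls`):

```shell
# List the contents of the named volume with a throwaway container.
# Adjust the volume name if compose prefixed it with the project name.
docker run --rm -v model_cache_huggingface:/data alpine ls -la /data
```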

OlivierDehaene commented 3 months ago

The log message is a bit misleading here: since it "downloaded" the model in less than a millisecond, we can safely assume that the model was not in fact downloaded again but that the cache was re-used. I will change the log message.
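
If you want to double-check that the cache really is being reused, one rough way (a sketch, reusing the named volume from the compose file above) is to compare the modification times of the cached weight files before and after restarting the service; they should not change if nothing was re-downloaded:

```shell
# The weight files' modification times should stay the same across restarts
# if the cache is reused. Adjust the volume name and the file pattern
# (*.safetensors vs *.bin) to match your setup.
docker run --rm -v model_cache_huggingface:/data alpine \
    find /data -name '*.safetensors' -exec stat -c '%y %n' {} \;
```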