Closed tanuki0492 closed 9 months ago
A lot of GPU operations have non deterministic behaviours. Depending on how large the batch is, you won't have exactly the same values.
If determinism is something that you truly care about, you should run TEI with --max-batch-requests=1
.
System Info
docker run -d --gpus all --restart always --name reranker -p 443:5051 -v /models/baai_bge-reranker-large:/model ghcr.io/huggingface/text-embeddings-inference:86-0.6 --model-id /model --port 5051 --hostname XX.XX.XXX.XXX --revision refs/pr/4
Information
Tasks
Reproduction
Expected behavior
Expected behavior
Every run of above code should always return the same scores, i.e. should be deterministic.
Actual behavior
1st run
2nd run
3rd run