huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour
Apache License 2.0

Nondeterministic Reranker Scores with baai/bge-reranker-large #142

Closed tanuki0492 closed 9 months ago

tanuki0492 commented 10 months ago

System Info

docker run -d --gpus all --restart always --name reranker -p 443:5051 -v /models/baai_bge-reranker-large:/model ghcr.io/huggingface/text-embeddings-inference:86-0.6 --model-id /model --port 5051 --hostname XX.XX.XXX.XXX --revision refs/pr/4

Reproduction

import requests

question = "What are the key factors that investors should consider when evaluating a company's financial health?"
answers = ["Investors should analyze various financial ratios and metrics such as liquidity ratios, profitability ratios (return on equity, net profit margin), and leverage ratios (debt-to-equity ratio). These metrics provide insights into the company's ability to meet short-term obligations, generate profits, and manage debt effectively.",
           "Evaluating a company's cash flow is crucial. Positive operating cash flow ensures the company can cover its day-to-day expenses, invest in growth opportunities, and pay dividends. Investors should assess the consistency and sustainability of cash flow to gauge the company's financial stability.",
           "Understanding the company's debt levels and capital structure is vital. High levels of debt may pose risks, but a well-managed debt structure can enhance returns. Investors should assess the proportion of debt in the capital structure and the company's ability to service its debt obligations.",
           "Examining broader market and industry trends is essential. A company's financial health is often influenced by external factors such as economic conditions, industry competition, and technological advancements. Investors should consider how well the company is positioned in its market and its ability to adapt to changing industry dynamics."]

response = requests.post(
    "http://127.0.0.1/rerank",
    json={"query": question, "texts": answers, "raw_scores": False, "truncate": True},
    verify=False,
)

print(response.json())

Expected behavior

Every run of the code above should return the same scores, i.e. the output should be deterministic.

Actual behavior

1st run
[{'index': 3, 'score': 0.9990601},
 {'index': 0, 'score': 0.97085404},
 {'index': 1, 'score': 0.9425068},
 {'index': 2, 'score': 0.93257624}]
2nd run
[{'index': 3, 'score': 0.9990638},
 {'index': 0, 'score': 0.97085404},
 {'index': 1, 'score': 0.9425068},
 {'index': 2, 'score': 0.9323302}]
3rd run
[{'index': 3, 'score': 0.9990638},
 {'index': 0, 'score': 0.97085404},
 {'index': 1, 'score': 0.94240075},
 {'index': 2, 'score': 0.93257624}]
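
To put a number on the drift, the three runs above can be compared per index. This is only a rough sketch: the helper names `score_spread` and `same_ranking` are made up for illustration, and the scores are copied by hand from the output above.

```python
from typing import Dict, List

# Scores copied from the three runs reported above.
runs: List[List[Dict[str, float]]] = [
    [{"index": 3, "score": 0.9990601}, {"index": 0, "score": 0.97085404},
     {"index": 1, "score": 0.9425068}, {"index": 2, "score": 0.93257624}],
    [{"index": 3, "score": 0.9990638}, {"index": 0, "score": 0.97085404},
     {"index": 1, "score": 0.9425068}, {"index": 2, "score": 0.9323302}],
    [{"index": 3, "score": 0.9990638}, {"index": 0, "score": 0.97085404},
     {"index": 1, "score": 0.94240075}, {"index": 2, "score": 0.93257624}],
]


def score_spread(runs: List[List[Dict[str, float]]]) -> Dict[int, float]:
    """Return, for each text index, max score minus min score across runs."""
    by_index: Dict[int, List[float]] = {}
    for run in runs:
        for item in run:
            by_index.setdefault(int(item["index"]), []).append(item["score"])
    return {i: max(scores) - min(scores) for i, scores in by_index.items()}


def same_ranking(runs: List[List[Dict[str, float]]]) -> bool:
    """Return True if sorting by score gives the same index order in every run."""
    orders = [
        [int(item["index"]) for item in sorted(run, key=lambda x: x["score"], reverse=True)]
        for run in runs
    ]
    return all(order == orders[0] for order in orders)


spread = score_spread(runs)
for index, delta in sorted(spread.items()):
    print(f"index {index}: spread {delta:.2e}")
print("ranking stable:", same_ranking(runs))
```

In these three runs the scores drift a little, but the order of the results does not change.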

OlivierDehaene commented 9 months ago

Many GPU operations are non-deterministic. Depending on how large the batch is, the scores will not be exactly the same from run to run. If you truly need determinism, run TEI with --max-batch-requests=1.
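
One commonly cited reason results can depend on batching is that floating-point addition is not associative, so a sum accumulated in a different order can round differently. The thread does not confirm that this is the exact cause here. The sketch below is a CPU-only analogy in plain Python, not TEI or GPU code:

```python
import math

# The same three numbers, added in two different groupings.
values = [0.1, 0.2, 0.3]

left = (values[0] + values[1]) + values[2]
right = values[0] + (values[1] + values[2])

# The two groupings are not bit-identical.
print("bit-identical:", left == right)

# They do agree within a small relative tolerance, which is usually the
# appropriate way to compare scores produced by floating-point hardware.
print("close enough:", math.isclose(left, right, rel_tol=1e-9))
```

If exact reproducibility is not a hard requirement, comparing scores with a tolerance on the client side may be enough.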