michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License
971 stars 72 forks

Scores slightly off/get rounded up to 1.0 #203

Closed ruben-vb closed 1 month ago

ruben-vb commented 2 months ago

System Info

Hey

I am working on a custom component in Haystack right now to build an Infinity Reranker. I might turn it into a Haystack Integration if I get the results I need.

Currently I'm running into an issue with the scores returned by Infinity:

Language: German
Model: svalabs/cross-electra-ms-marco-german-uncased

The model performed very well for me using the TransformersRanker from Haystack (TransformersSimilarityRanker).

However, using the exact same model in Infinity, the scores are slightly off. The highest-scoring documents had scores > 0.9995 with the TransformersRanker, and all of them got (rounded up to?) exactly 1.0 in Infinity. This can cause the best-matching document to not appear first in my list, and therefore not first in the prompt, reducing my output quality.

Also, the scores of the 4th and 5th documents were slightly above the TransformersRanker's (both got 0.998 using the TransformersRanker and 0.999 using Infinity).

If I've seen that correctly, you're calculating the sigmoid function manually with numpy in Infinity, while the reranker included in Haystack uses PyTorch's sigmoid. The PyTorch documentation says the following about the sigmoid function (here / here):

This function provides greater precision than exp(x) - 1 for small values of x.

Maybe that is what's causing the difference, but that's just a guess on my side.

If I find the time later today or this week I might check this out locally and see what difference torch.sigmoid would make. Just wanted to let you know already, since I'm really busy for the rest of the week.
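
In case it's useful, the check I have in mind is roughly this (the logit value is made up, not from the actual model):

```python
import numpy as np
import torch

logit = 8.3  # made-up raw cross-encoder logit, just for the comparison

np_score = 1.0 / (1.0 + np.exp(-logit))                  # manual numpy sigmoid
torch_score = torch.sigmoid(torch.tensor(logit)).item()  # torch.sigmoid
print(np_score, torch_score)  # both ~0.99975 in full precision
```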


Reproduction

It's quite a lot to reproduce 1:1, and some stuff I can't share, but it goes something like this (a rough sketch follows the list):

  1. Run instance of infinity with the default configurations, model as above (svalabs/cross-electra-ms-marco-german-uncased).

  2. Build a Haystack pipeline with the mentioned TransformersSimilarityRanker, or just run the ranker directly with some documents (quicker, I guess).

  3. Compare the scores for identical documents, especially for cases where scores can be very high and/or close to each other.
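
Roughly, the comparison script looks like this (simplified; it assumes a local Infinity instance on the default port 7997 exposing a /rerank route, and the payload/response field names are from memory, so treat them as assumptions):

```python
import requests
from haystack import Document
from haystack.components.rankers import TransformersSimilarityRanker

model_name = "svalabs/cross-electra-ms-marco-german-uncased"
query = "Beispielanfrage"                   # placeholder query
texts = ["Dokument eins", "Dokument zwei"]  # placeholder documents

# Haystack side: score the documents with the cross-encoder directly
ranker = TransformersSimilarityRanker(model=model_name)
ranker.warm_up()
hs_docs = ranker.run(query=query, documents=[Document(content=t) for t in texts])["documents"]
print([(d.content, d.score) for d in hs_docs])

# Infinity side: same query and documents against the running server
resp = requests.post(
    "http://localhost:7997/rerank",
    json={"model": model_name, "query": query, "documents": texts},
)
print(resp.json())  # per-document relevance scores to compare against the above
```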

Expected behavior

More precise scores

michaelfeil commented 2 months ago

@ruben-vb I would expect a +/- 1% error. Yes, there could be a rounding error in the way sigmoid is implemented with e**x. Does it lead to a different ranking of the results, with respect to which one comes first (reranking order)?

ruben-vb commented 2 months ago

It does, at least for this case (and probably similar ones with scores very close to 1.0) @michaelfeil

TransformersSimilarityRanker Top 5 Scores:

  1. 0.9997518658638
  2. 0.9995856881141663
  3. 0.999550998210907
  4. 0.9986598491668701
  5. 0.9986101388931274

All 15 documents (format document_index: score): {5: 0.9997518658638, 0: 0.9995856881141663, 2: 0.999550998210907, 4: 0.9986598491668701, 3: 0.9986101388931274, 6: 0.9980477094650269, 1: 0.9970782995223999, 10: 0.9935359954833984, 14: 0.9911372661590576, 9: 0.9897492527961731, 8: 0.9878960847854614, 7: 0.9821287393569946, 12: 0.9441548585891724, 11: 0.836544930934906, 13: 0.6658778786659241}

InfinityReranker Top 5 Scores:

  1. 1.0
  2. 1.0
  3. 1.0
  4. 0.9990234375
  5. 0.9990234375

All 15 documents (format document_index: score): {0: 1.0, 2: 1.0, 5: 1.0, 3: 0.9990234375, 4: 0.9990234375, 6: 0.998046875, 1: 0.9970703125, 10: 0.9931640625, 14: 0.9912109375, 9: 0.9892578125, 8: 0.98828125, 7: 0.98193359375, 12: 0.94384765625, 11: 0.83740234375, 13: 0.66796875}

ruben-vb commented 2 months ago

@michaelfeil OK, I've experimented a bit today: the issue does not come from the torch.sigmoid/numpy difference. Even though the results do not get rounded up to 1.0 when using torch.sigmoid, they are still slightly off.

It seems to be caused by using float16 instead of float32. I only just noticed the warning about setting INFINITY_DISABLE_HALF, but it did not work for me, and I could not find any place in Infinity where that setting actually impacts the dtype. It works as expected after commenting out the following: self.model.to(dtype=torch.float16)

in infinity_emb/transformer/crossencoder/torch.py, line 69
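
For what it's worth, the collapse is easy to reproduce outside of Infinity. In [0.5, 1.0) float16 only has steps of 2**-11 ≈ 0.0005, and a naive sigmoid in half precision saturates even earlier because 1 + exp(-x) rounds to 1.0. The logits below are made up to roughly match the scores above; this is just an illustration, not the exact Infinity code path:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up logits chosen so the float32 scores land near the values reported above
logits = np.array([8.30, 7.79, 7.71, 6.61, 6.58])

print(sigmoid(logits.astype(np.float32)))  # ~0.99975, 0.99959, 0.99955, 0.99866, 0.99861
print(sigmoid(logits.astype(np.float16)))  # the top three saturate to exactly 1.0
```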

Would it be possible to add float32 to the EngineArgs dtype options, or to make the env setting work? Just out of interest, why the env variable instead of an additional float32 dtype?

Also, another bug I've noticed while looking into this: in the same file/class, CrossEncoderPatched, you set: automodel_args={"trust_remote_code": engine_args.trust_remote_code}

while the CrossEncoder class in sentence-transformers does: self.model = AutoModelForSequenceClassification.from_pretrained( model_name, config=self.config, revision=revision, trust_remote_code=trust_remote_code, **automodel_args )

This causes the following: TypeError: transformers.models.auto.auto_factory._BaseAutoModelClass.from_pretrained() got multiple values for keyword argument 'trust_remote_code'

It works after changing the argument from automodel_args={"trust_remote_code": engine_args.trust_remote_code} to trust_remote_code=engine_args.trust_remote_code.
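
A minimal way to see the clash outside of Infinity (the names mirror the snippets above; this is just an illustration, not the exact CrossEncoderPatched code):

```python
from sentence_transformers import CrossEncoder

model_name = "svalabs/cross-electra-ms-marco-german-uncased"

# Clashes: CrossEncoder already passes trust_remote_code to from_pretrained()
# explicitly, so passing it again via automodel_args raises
# "got multiple values for keyword argument 'trust_remote_code'"
# CrossEncoder(model_name, automodel_args={"trust_remote_code": True})

# Works: use the dedicated keyword argument instead
model = CrossEncoder(model_name, trust_remote_code=True)
```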

michaelfeil commented 1 month ago

@ruben-vb This issue should be fixed with the latest container. Can you confirm?

ruben-vb commented 1 month ago

Yeah, with float32 the scores are identical, thanks :)

michaelfeil commented 1 month ago

Awesome (I would still recommend using fp16 then)