qdrant / fastembed

Fast, Accurate, Lightweight Python library to make State of the Art Embedding
https://qdrant.github.io/fastembed/
Apache License 2.0

[Bug/Model Request]: Question on BM42 performance on Quora dataset #282

Open VoVAllen opened 3 days ago

VoVAllen commented 3 days ago

What happened?

https://github.com/castorini/anserini/blob/5eb46b9f9bd563c34deca85a5c7417c068348972/docs/regressions/regressions-beir-v1.0.0-quora.flat.md

According to Anserini's regression experiments, BM25 reaches an NDCG@10 of 78.8% on Quora, which is much higher than the Precision@10 of 45% reported in https://qdrant.tech/articles/bm42/. Why is BM25 performance in Qdrant so much worse than in Anserini (which is built on Lucene)?
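Note that the two figures above are different metrics (NDCG@10 vs Precision@10), so they are not directly comparable. The minimal sketch below is an illustration only, not the evaluation code used by Anserini or by the Qdrant article. It uses binary relevance and a hypothetical toy ranking (`ranked`, `relevant` are made-up data) to show how one ranking can score very differently on the two metrics. When a query has only one relevant document, Precision@10 can never exceed 0.1, while NDCG@10 reaches 1.0 if that document is ranked first.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant (divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Rank i (0-based) contributes 1 / log2(i + 2) when the document is relevant.
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
        if doc in relevant
    )
    # Ideal ranking: all relevant documents (up to k) placed at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query with a single relevant duplicate question ranked first.
ranked = ["d1", "d7", "d3", "d9", "d2", "d5", "d8", "d4", "d6", "d0"]
relevant = {"d1"}
print(precision_at_k(ranked, relevant))  # 0.1
print(ndcg_at_k(ranked, relevant))       # 1.0
```

This does not by itself explain the gap reported in the issue. It only shows that NDCG@10 and Precision@10 have different scales, so a fair comparison needs both systems evaluated with the same metric.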

What Python version are you on? e.g. python --version

NA

Version

0.2.7 (Latest)

What os are you seeing the problem on?

Linux

Relevant stack traces and/or logs

No response

joein commented 3 days ago

https://discord.com/channels/907569970500743200/1257601658523877449