
quickwit-oss / search-benchmark-game

Search engine benchmark (Tantivy, Lucene, PISA, ...)

https://tantivy-search.github.io/bench/

MIT License


Use same BM25 k1/b parameters across engines. #45

Open jpountz opened 1 year ago

jpountz commented 1 year ago

The k1 and b parameters of BM25 can influence which hits get dynamically pruned, and thus the performance numbers, so it would be good to use the same values across engines. Currently each engine appears to use its own defaults: k1=0.9 and b=0.4 for PISA, and k1=1.2 and b=0.75 for Lucene and Tantivy.
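To make the dependence concrete, here is a minimal Python sketch of the classic BM25 per-term score and of the per-term upper bound that dynamic pruning algorithms such as MaxScore or WAND rely on. The function names, the sample postings and the IDF variant, ln(1 + (N - df + 0.5) / (df + 0.5)), are illustrative assumptions, not any engine's actual code. Engines differ in details, and some drop the constant (k1 + 1) factor because it does not change the ranking.

```python
import math

def bm25_idf(doc_count: int, doc_freq: int) -> float:
    # Non-negative IDF variant (assumption): ln(1 + (N - df + 0.5) / (df + 0.5)).
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    idf: float, k1: float, b: float) -> float:
    # b controls length normalization: b=0 ignores doc length, b=1 applies it fully.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # k1 controls tf saturation: the score tends to idf * (k1 + 1) as tf grows.
    return idf * tf * (k1 + 1) / (tf + norm)

def term_upper_bound(postings, avg_doc_len: float, idf: float,
                     k1: float, b: float) -> float:
    # Largest contribution this term makes to any document in its postings.
    # MaxScore/WAND-style pruning compares sums of such bounds with the
    # current k-th best score to decide whether a document can be skipped.
    return max(bm25_term_score(tf, dl, avg_doc_len, idf, k1, b)
               for tf, dl in postings)

if __name__ == "__main__":
    # Hypothetical postings: (term frequency, document length) pairs.
    postings = [(1, 100), (3, 250), (10, 1000)]
    idf = bm25_idf(doc_count=10_000, doc_freq=50)
    for k1, b in [(1.2, 0.75), (0.9, 0.4)]:
        ub = term_upper_bound(postings, 300.0, idf, k1, b)
        print(f"k1={k1}, b={b}: upper bound {ub:.3f}, cap {idf * (k1 + 1):.3f}")
```

With one occurrence in a document of average length, the score is exactly idf for any k1 and b, while the asymptotic cap is idf * (k1 + 1). Lowering k1 from 1.2 to 0.9 shrinks that ratio from 2.2 to 1.9, which may make pruning bounds tighter. This is one plausible contributor to the speedups measured with 0.9/0.4, not a verified explanation.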

jpountz commented 1 year ago

To get a sense of the influence of these parameters on query performance, I compared Lucene-9.8 with 1.2/0.75 against 0.9/0.4 on the TOP_100 command. I'm getting:

4.6% better latency on average for intersections with 0.9/0.4
4.2% better latency on average for unions with 0.9/0.4

So the difference is not huge, but it is significant and extremely consistent:

7 queries get better latency with 1.2/0.75
2 queries get the same latency
893 queries get better latency with 0.9/0.4
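A per-query tally like the one above could be computed from raw per-query latencies along these lines. This is a hypothetical helper: the comment does not say how "same latency" was decided or how the averages were computed, so the 1% relative tolerance and the mean-of-per-query-ratios definition are assumptions. It would be run separately for intersections and unions.

```python
from statistics import mean

def compare_latencies(lat_a: dict[str, float], lat_b: dict[str, float],
                      rel_tolerance: float = 0.01):
    """Compare config A against config B, query by query.

    Returns (a_faster, same, b_faster, mean_pct_improvement_of_b).
    Assumption: two latencies within rel_tolerance of A count as "same".
    """
    if not lat_a:
        raise ValueError("no queries to compare")
    a_faster = same = b_faster = 0
    improvements = []
    for query, a in lat_a.items():
        b = lat_b[query]  # KeyError if a query is missing from B
        # Positive means B is faster than A for this query.
        improvements.append((a - b) / a * 100.0)
        if abs(a - b) <= rel_tolerance * a:
            same += 1
        elif b < a:
            b_faster += 1
        else:
            a_faster += 1
    return a_faster, same, b_faster, mean(improvements)
```

With the counts reported here, 893 of 902 queries (about 99%) were faster with 0.9/0.4.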