quickwit-oss / search-benchmark-game

Search engine benchmark (Tantivy, Lucene, PISA, ...)
https://tantivy-search.github.io/bench/
MIT License

Use same BM25 k1/b parameters across engines. #45

Open jpountz opened 1 year ago

jpountz commented 1 year ago

The k1 and b parameters of BM25 influence which hits can be dynamically pruned, and thus the performance numbers, so it would be good to use the same values across engines. Currently it looks like engines use their own defaults, which seem to be k1=0.9 and b=0.4 for PISA, and k1=1.2 and b=0.75 for Lucene and Tantivy.
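To make the dependence on k1 and b concrete, here is a minimal sketch of the textbook BM25 per-term score evaluated with both parameter pairs mentioned above. It is not any engine's actual implementation: the function name `bm25_term_score`, the idf variant, and the collection statistics are all assumptions chosen for illustration, and individual engines may use slightly different formulations.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1, b):
    """Textbook BM25 contribution of a single term to a document's score."""
    # Smoothed idf; an assumption, engines may use another variant.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly the document length is normalized.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly the term frequency saturates.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Hypothetical collection statistics, not taken from the benchmark corpus.
stats = dict(doc_len=120, avg_doc_len=100, doc_freq=1_000, num_docs=1_000_000)

for k1, b in [(1.2, 0.75), (0.9, 0.4)]:
    scores = [bm25_term_score(tf, k1=k1, b=b, **stats) for tf in (1, 2, 5, 20)]
    print(f"k1={k1} b={b}:", [round(s, 3) for s in scores])
```

In this formulation the per-term score approaches idf * (k1 + 1) as the term frequency grows, so changing k1 and b shifts both the individual scores and the upper bounds that dynamic pruning algorithms compare against the current top-k threshold.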

jpountz commented 1 year ago

To get a sense of the influence of these parameters on query performance, I compared Lucene-9.8 with 1.2/0.75 against 0.9/0.4 on the TOP_100 command. I'm getting:

So the difference is not huge, but it is significant and extremely consistent: