mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Change how combined field queries are parsed. #318

Closed jpountz closed 1 week ago

jpountz commented 2 weeks ago

Today, a query like CombinedHighMed: his interview +combinedFields=titleTokenized^8.0,body is parsed as a single CombinedFieldQuery that computes scores by summing the frequencies and document lengths across four terms: title:his^8, title:interview^8, body:his, body:interview.
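To make the task-line format concrete, here is a minimal Python sketch that splits such a line into its category, terms, and field weights. It is not luceneutil's actual parsing code: the function name `parse_combined_task` is hypothetical, and treating a field without an explicit `^boost` as weight 1.0 is an assumption.

```python
def parse_combined_task(line):
    """Split 'Category: terms +combinedFields=f1^w1,f2' into its parts.

    Hypothetical helper for illustration; not luceneutil's parser.
    """
    category, _, rest = line.partition(":")
    text, _, fields_spec = rest.partition("+combinedFields=")
    terms = text.split()
    weights = {}
    for item in fields_spec.strip().split(","):
        name, _, boost = item.partition("^")
        # Assumption: a field listed without a boost gets weight 1.0.
        weights[name] = float(boost) if boost else 1.0
    return category.strip(), terms, weights


category, terms, weights = parse_combined_task(
    "CombinedHighMed: his interview +combinedFields=titleTokenized^8.0,body"
)
print(category, terms, weights)
```

With two terms and two fields, the current parsing yields one query over all four (field, term) pairs.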

This is not what I would like it to do. Instead, I would like it to compute the score of his by summing frequencies and lengths across the title^8 and body fields, compute the score of interview likewise, and finally sum these two scores via a BooleanQuery. In other words, his and interview are each scored with BM25F, and their BM25F scores are then summed.
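The difference between the two behaviours can be sketched numerically in Python. This is a toy model under stated assumptions, not Lucene's implementation: the statistics are invented, K1 and B are the common BM25 defaults, the body weight is assumed to be 1, and the merged variant is assumed to use the largest document frequency among its terms.

```python
import math

K1, B = 1.2, 0.75  # common BM25 defaults, assumed here
WEIGHTS = {"title": 8.0, "body": 1.0}  # title^8 and body (body weight assumed 1)

# Toy statistics, invented for illustration only.
DOC_COUNT = 1000
DOC_FREQ = {"his": 400, "interview": 50}
AVG_LEN = {"title": 6.0, "body": 250.0}
DOC = {
    "len": {"title": 5, "body": 200},
    "tf": {
        "title": {"his": 1, "interview": 1},
        "body": {"his": 3, "interview": 2},
    },
}


def bm25f(terms, doc_freq):
    """Score a group of terms as one unit.

    Weighted term frequencies and field lengths are summed across fields
    (and across all terms in the group) before applying the BM25 formula.
    """
    tf = sum(w * DOC["tf"][f].get(t, 0) for f, w in WEIGHTS.items() for t in terms)
    doc_len = sum(w * DOC["len"][f] for f, w in WEIGHTS.items())
    avg_len = sum(w * AVG_LEN[f] for f, w in WEIGHTS.items())
    idf = math.log(1 + (DOC_COUNT - doc_freq + 0.5) / (doc_freq + 0.5))
    return idf * tf / (tf + K1 * (1 - B + B * doc_len / avg_len))


terms = ["his", "interview"]

# Current behaviour: one combined query over all (field, term) pairs.
merged = bm25f(terms, max(DOC_FREQ[t] for t in terms))

# Proposed behaviour: one BM25F score per term, summed as a BooleanQuery would.
per_term = sum(bm25f([t], DOC_FREQ[t]) for t in terms)

print(f"merged: {merged:.3f}  per-term sum: {per_term:.3f}")
```

In this toy model the merged variant puts both terms through a single saturation curve and a single idf, so the rarer term interview gets no extra credit. The per-term variant lets each term saturate and be weighted on its own.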

jpountz commented 2 weeks ago

For reference, this doesn't affect CombinedTerm, which is parsed as the same query as before.