mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

BooleanQuery with no scoring clauses cannot skip documents when running TOP_SCORES mode [LUCENE-8935] #932

Closed by mikemccand 5 years ago

mikemccand commented 5 years ago

Today a boolean query that is composed only of filtering clauses (more than one) cannot skip documents when the search is executed in TOP_SCORES mode. However, since all documents have a score of 0, it should be possible to terminate the query early as soon as enough top hits have been collected. Wrapping the resulting boolean scorer in a constant score scorer should allow early termination in this case and would speed up the retrieval of top hits considerably when the total hit count is not requested.
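To illustrate the idea only, here is a toy model in plain Java. It is not Lucene code: the class `ConstantScoreEarlyTermination`, the `search` method, and the `earlyTerminate` flag are all hypothetical names made up for this sketch. It assumes every matching document scores 0 and that ties are broken by doc ID, so the first `k` matches in doc ID order are already the final top hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model (not Lucene code): all names here are hypothetical.
final class ConstantScoreEarlyTermination {

    /** Toy result: the collected doc IDs and how many docs were visited. */
    static final class Result {
        final List<Integer> hits;
        final int visited;

        Result(List<Integer> hits, int visited) {
            this.hits = hits;
            this.visited = visited;
        }
    }

    /**
     * Visits docs in increasing doc ID order and keeps the first k that pass the filters.
     * Every match is assumed to score 0, so a later match can never beat an earlier one.
     * When earlyTerminate is true, the loop stops as soon as k hits are collected.
     */
    static Result search(int maxDoc, IntPredicate filters, int k, boolean earlyTerminate) {
        List<Integer> hits = new ArrayList<>();
        int visited = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (earlyTerminate && hits.size() >= k) {
                break; // the top k cannot change any more
            }
            visited++;
            if (filters.test(doc) && hits.size() < k) {
                hits.add(doc);
            }
        }
        return new Result(hits, visited);
    }

    public static void main(String[] args) {
        // Two "filter clauses" combined: doc must be even and divisible by 3.
        IntPredicate filters = doc -> doc % 2 == 0 && doc % 3 == 0;
        Result slow = search(1000, filters, 3, false);
        Result fast = search(1000, filters, 3, true);
        System.out.println("without early termination: visited " + slow.visited + ", hits " + slow.hits);
        System.out.println("with early termination:    visited " + fast.visited + ", hits " + fast.hits);
    }
}
```

Both runs return the same hits; only the number of visited documents differs, which is why the saving only applies when the total hit count is not needed.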


Legacy Jira details

LUCENE-8935 by Jim Ferenczi (@jimczi) on Jul 26 2019, resolved Jul 29 2019 Attachments: LUCENE-8935.patch

mikemccand commented 5 years ago

Here is a patch that wraps the boolean scorer in a constant score scorer when there is no scoring clause and the score mode is TOP_SCORES.
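The wrapping described above can be pictured with a small stand-alone sketch. This is not the attached patch and does not use Lucene's classes: `ToyScorer`, `FilterOnlyScorer`, `ConstantScoreWrapper`, and `collectTopK` are hypothetical names, and the collector behaviour (raising the minimum competitive score once `k` hits are collected) is a simplifying assumption about how a top-scores collector might signal that no further hit can be competitive.

```java
// Toy model (not Lucene code, not the attached patch): all names are hypothetical.
final class ConstantScoreWrapperSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal scorer abstraction assumed for this sketch. */
    interface ToyScorer {
        int nextDoc();                               // NO_MORE_DOCS when exhausted
        float score();                               // score of the current doc
        void setMinCompetitiveScore(float minScore); // hint from the collector
    }

    /** Stands in for a boolean scorer made of filter clauses only: it ignores the hint. */
    static final class FilterOnlyScorer implements ToyScorer {
        private final int[] docs;
        private int pos = -1;

        FilterOnlyScorer(int[] docs) {
            this.docs = docs;
        }

        public int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        public float score() {
            return 0f;
        }

        public void setMinCompetitiveScore(float minScore) {
            // no-op: this scorer does not know that its score is constant
        }
    }

    /** Wrapper that knows the score is constant and stops once it is no longer competitive. */
    static final class ConstantScoreWrapper implements ToyScorer {
        private final ToyScorer in;
        private final float constantScore;
        private boolean exhausted;

        ConstantScoreWrapper(ToyScorer in, float constantScore) {
            this.in = in;
            this.constantScore = constantScore;
        }

        public int nextDoc() {
            return exhausted ? NO_MORE_DOCS : in.nextDoc();
        }

        public float score() {
            return constantScore;
        }

        public void setMinCompetitiveScore(float minScore) {
            if (minScore > constantScore) {
                exhausted = true; // no remaining doc can be competitive
            }
        }
    }

    /** Collects k hits, then asks for strictly better scores. Returns the number of docs visited. */
    static int collectTopK(ToyScorer scorer, int k) {
        int collected = 0;
        int visited = 0;
        for (int doc = scorer.nextDoc(); doc != NO_MORE_DOCS; doc = scorer.nextDoc()) {
            visited++;
            if (collected < k) {
                collected++;
                if (collected == k) {
                    // Ties lose to earlier docs, so only a strictly higher score is competitive.
                    scorer.setMinCompetitiveScore(Math.nextUp(scorer.score()));
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] matches = {2, 5, 7, 11, 13, 17};
        int plain = collectTopK(new FilterOnlyScorer(matches), 2);
        int wrapped = collectTopK(new ConstantScoreWrapper(new FilterOnlyScorer(matches), 0f), 2);
        System.out.println("visited without wrapper: " + plain);
        System.out.println("visited with wrapper:    " + wrapped);
    }
}
```

In this model the unwrapped scorer walks every match, while the wrapped one stops right after the second hit because a constant score of 0 can never exceed the raised threshold.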

[Legacy Jira: Jim Ferenczi (@jimczi) on Jul 26 2019]

mikemccand commented 5 years ago

The approach works for me. I'm wondering whether, if we put this logic at the very bottom of Boolean2ScorerSupplier#get instead, we'd also cover the case where there is a SHOULD clause in addition to the FILTER clauses but it produces a null scorer.

[Legacy Jira: Adrien Grand (@jpountz) on Jul 26 2019]

mikemccand commented 5 years ago

The logic is already at the bottom of Boolean2ScorerSupplier#get, but good call on the SHOULD clause that can produce a null scorer.

We can check the number of scoring clauses after the build instead of checking the number of scorer suppliers. I'll work on a fix.

[Legacy Jira: Jim Ferenczi (@jimczi) on Jul 26 2019]

mikemccand commented 5 years ago

Sorry, I misunderstood the logic: the number of scoring clauses is already computed from the pruned list of scorers, so the current patch works. It's the scorer supplier that can be null, but in that case it would not appear in Boolean2ScorerSupplier.
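A rough way to picture this point, as a plain-Java sketch and not the actual Boolean2ScorerSupplier code: `ScoringClauseCount`, `Clause`, `prune`, and `hasNoScoringClause` are hypothetical names, and a supplier is modeled as a simple string that is `null` when the clause has no scorer. The sketch only covers the SHOULD case discussed here; it deliberately ignores that a required clause without a scorer would make the whole query match nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene code): all names are hypothetical.
final class ScoringClauseCount {

    enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

    /** A clause whose supplier is null when it cannot produce a scorer. */
    static final class Clause {
        final Occur occur;
        final String supplier;

        Clause(Occur occur, String supplier) {
            this.occur = occur;
            this.supplier = supplier;
        }
    }

    /** Groups suppliers by occur, dropping clauses that have no supplier. */
    static Map<Occur, List<String>> prune(List<Clause> clauses) {
        Map<Occur, List<String>> pruned = new EnumMap<>(Occur.class);
        for (Occur occur : Occur.values()) {
            pruned.put(occur, new ArrayList<>());
        }
        for (Clause clause : clauses) {
            if (clause.supplier != null) {
                pruned.get(clause.occur).add(clause.supplier);
            }
        }
        return pruned;
    }

    /** Counting on the pruned map: a SHOULD clause without a supplier no longer counts. */
    static boolean hasNoScoringClause(Map<Occur, List<String>> pruned) {
        return pruned.get(Occur.MUST).isEmpty() && pruned.get(Occur.SHOULD).isEmpty();
    }

    public static void main(String[] args) {
        List<Clause> clauses = Arrays.asList(
            new Clause(Occur.FILTER, "filterA"),
            new Clause(Occur.FILTER, "filterB"),
            new Clause(Occur.SHOULD, null)); // SHOULD clause with no scorer
        System.out.println("no scoring clause: " + hasNoScoringClause(prune(clauses)));
    }
}
```

Because the check runs on the pruned map, a query with two FILTER clauses and one SHOULD clause that has no supplier is treated the same as a query with filter clauses only.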

[Legacy Jira: Jim Ferenczi (@jimczi) on Jul 26 2019]

mikemccand commented 5 years ago

Whoops, indeed you are right. +1 to the attached patch!

[Legacy Jira: Adrien Grand (@jpountz) on Jul 26 2019]

mikemccand commented 5 years ago

Commit b8289abeebb23b10ea02b8a27d6b6c07deaa9e50 in lucene-solr's branch refs/heads/master from jimczi https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b8289ab

LUCENE-8935: BooleanQuery with no scoring clause can now early terminate the query when the total hits is not requested.

[Legacy Jira: ASF subversion and git services on Jul 29 2019]

mikemccand commented 5 years ago

Commit c557e4323daaff43d041d0599b254d94f1b8d792 in lucene-solr's branch refs/heads/branch_8x from jimczi https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c557e43

LUCENE-8935: BooleanQuery with no scoring clause can now early terminate the query when the total hits is not requested.

[Legacy Jira: ASF subversion and git services on Jul 29 2019]

mikemccand commented 2 years ago

Closing after the 9.0.0 release

[Legacy Jira: Adrien Grand (@jpountz) on Dec 08 2021]