Add an option to disable BMW optimization for benchmarks

shubhamvishu commented 7 months ago

Description

Currently, there is no straightforward way to disable BMW in the Lucene benchmarks(?), which could be required when testing some optimizations like #258. It'd be great if we could add an option/argument to disable BMW while benchmarking.
One idea could be to increase TOTAL_HITS_THRESHOLD in IndexSearcher.java to Integer.MAX_VALUE. Maybe we could add a setter for it?

Looking for more ideas on this!
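
To picture why maxing out the threshold would switch pruning off, here is a toy model in Python. It is not Lucene's code: the function name, the block layout, and the scores are made up for illustration. It only assumes the general idea that hits are counted exactly up to the threshold, and that blocks whose maximum score cannot beat the current top-k may be skipped after that.

```python
import heapq
import sys


def top_k_visited(blocks, k, total_hits_threshold):
    """Toy model (hypothetical, not Lucene's implementation).

    Returns (top-k scores in descending order, number of docs scored).
    """
    heap = []      # min-heap holding the k best scores seen so far
    visited = 0    # docs actually scored; doubles as the exact hit count
    for block in blocks:
        # Pruning is only allowed once more hits than the threshold were counted
        # and the top-k heap is full.
        pruning_on = visited > total_hits_threshold and len(heap) == k
        if pruning_on and max(block, default=float("-inf")) <= heap[0]:
            continue  # no doc in this block can enter the top k, skip it
        for score in block:
            visited += 1
            if len(heap) < k:
                heapq.heappush(heap, score)
            elif score > heap[0]:
                heapq.heapreplace(heap, score)
    return sorted(heap, reverse=True), visited


# Made-up per-block scores, purely for illustration.
blocks = [[9.0, 8.0], [1.0, 2.0], [3.0, 0.5], [10.0, 0.1]]
pruned = top_k_visited(blocks, k=2, total_hits_threshold=1)
exhaustive = top_k_visited(blocks, k=2, total_hits_threshold=sys.maxsize)
```

In this model both calls return the same top two scores, but the run with the huge threshold scores every doc, while the low-threshold run skips the two middle blocks. That is the effect the Integer.MAX_VALUE idea is after.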

jpountz commented 7 months ago

Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

mikemccand commented 7 months ago

Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

+1, that's a nice approach. Though I think even Lucene's count() API has some nice optimizations (sub-linear implementations) that bypass visiting all postings?

jpountz commented 7 months ago

Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?

shubhamvishu commented 7 months ago

Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

Do you mean to wrap the clauses with "count( )", e.g. as in https://github.com/mikemccand/luceneutil/blob/master/tasks/countOnly.tasks, so that we check the performance but avoid BMW? I like this idea if I understand correctly. But I'm not sure if we could make it an option with benchmarks straightforwardly.

Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?

I'm not sure what you mean by using some cheap faceting here. Maybe you could elaborate on this idea? Also, since we want to enable it via benchmarks, does this also fit well in that picture?

mikemccand commented 7 months ago

Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?

I'm not sure what you mean by using some cheap faceting here. Maybe you could elaborate on this idea? Also, since we want to enable it via benchmarks, does this also fit well in that picture?

I think @jpountz is referring to enabling faceting on each task. luceneutil's TaskParser supports this with e.g. +facets:Date.sortedset. Because facets require counting all hits, it forces Lucene to disable BMW. The problem is, it also adds some cost (I think that's why @jpountz suggested finding a "cheap" one heh), which is not great because it dilutes what you are trying to measure (a change in postings decode / visit time).

Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

Do you mean to wrap the clauses with "count( )", e.g. as in https://github.com/mikemccand/luceneutil/blob/master/tasks/countOnly.tasks, so that we check the performance but avoid BMW? I like this idea if I understand correctly. But I'm not sure if we could make it an option with benchmarks straightforwardly.

luceneutil supports count tasks with syntax like count(+a +b). This is parsed to use IndexSearcher's count API. I think that may be a quick workaround for benchmarking https://github.com/mikemccand/luceneutil/pull/258
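
As a rough sketch of how the two workarounds above could be applied to an existing tasks file, the Python helpers below rewrite a single task line. The `Category: query # comment` line layout, the helper names, and the sample category are assumptions made for illustration. Only the `count(...)` wrapper and the `+facets:Date.sortedset` directive come from this thread, and where exactly TaskParser expects the facet directive should be checked against luceneutil itself.

```python
def split_task(line):
    """Split an assumed 'Category: query # comment' line into its parts."""
    category, _, rest = line.partition(":")
    query, _, comment = rest.partition("#")
    return category.strip(), query.strip(), comment.strip()


def as_count_task(line):
    """Wrap the query in count(...) so it is run as a counting task."""
    category, query, _ = split_task(line)
    return f"{category}: count({query})"


def with_facet(line, facet="Date.sortedset"):
    """Append a facet request (assumed placement: end of the query text)."""
    category, query, _ = split_task(line)
    return f"{category}: {query} +facets:{facet}"


# Hypothetical task line; the category name and comment are illustrative only.
sample = "AndHighHigh: +a +b # illustrative comment"
count_line = as_count_task(sample)
facet_line = with_facet(sample)
```

Both helpers drop the trailing comment for simplicity. The count variant avoids the extra faceting cost mentioned above, at the price of possibly hitting count()'s own shortcuts.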

shubhamvishu commented 7 months ago

Thanks for the explanation, Mike! I'll try benchmarking the change using count tasks and share the results. Btw, if the above-mentioned approach of maxing out IndexSearcher.TOTAL_HITS_THRESHOLD also makes sense, then I have already shared the results for it over here.

mikemccand / luceneutil

Add an option to disable BMW optimization for benchmarks #265
