shubhamvishu opened this issue 7 months ago
Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?
> Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

+1, that's a nice approach. Though even Lucene's `count()` API has some nice optimizations to bypass visiting all postings / sub-linear implementations, I think?
Indeed, `IndexSearcher#count` has some optimizations to bypass postings. But it was mostly an example; some cheap faceting should work too?
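To make the concern about count shortcuts concrete, here is a toy model in plain Java (not Lucene's implementation; the class and method names are hypothetical). As I understand it, recent Lucene versions can answer a single-term count from the term's document frequency when a segment has no deletions, so such a count task might not decode postings at all, whereas a conjunction like `+a +b` still has to intersect the postings lists.

```java
/**
 * Toy model, not Lucene code: compares the work needed to count hits for a
 * single term (docFreq shortcut) and for a two-term conjunction (intersection).
 * Postings are sorted arrays of doc ids.
 */
final class CountShortcutModel {

  /** A modeled count result: number of hits and number of postings entries read. */
  static final class Result {
    final int count;
    final int postingsRead;

    Result(int count, int postingsRead) {
      this.count = count;
      this.postingsRead = postingsRead;
    }
  }

  /**
   * Single-term count. When nothing is deleted (deleted == null) the answer is
   * the postings length, i.e. the term's docFreq, and no postings are read.
   */
  static Result countTerm(int[] postings, boolean[] deleted) {
    if (deleted == null) {
      return new Result(postings.length, 0);
    }
    int count = 0;
    for (int doc : postings) {
      if (!deleted[doc]) {
        count++;
      }
    }
    return new Result(count, postings.length);
  }

  /** Two-term conjunction count: one read is counted per pointer advance. */
  static Result countConjunction(int[] a, int[] b) {
    int i = 0, j = 0, count = 0, read = 0;
    while (i < a.length && j < b.length) {
      if (a[i] == b[j]) {
        count++;
        i++;
        j++;
        read += 2;
      } else if (a[i] < b[j]) {
        i++;
        read++;
      } else {
        j++;
        read++;
      }
    }
    return new Result(count, read);
  }
}
```

So for measuring a postings-decoding change, count tasks over conjunctions (or other queries without such a shortcut) are probably the more informative ones.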
> Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

Do you mean wrapping the clauses with `count( )`, as in e.g. https://github.com/mikemccand/luceneutil/blob/master/tasks/countOnly.tasks, so that we measure the performance without BMW kicking in? I like this idea if I understand it correctly, but I'm not sure we could make it a benchmark option straightforwardly.
> Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?

I'm not sure what you mean by using some cheap faceting here. Could you elaborate on this idea? Also, since we want to enable it via benchmarks, does this fit well in that picture?
> > Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?
>
> I'm not sure what you mean by using some cheap faceting here. Maybe you could elaborate on this idea? Also, since we want to enable it via benchmarks, does this also fit well in that picture?

I think @jpountz is referring to enabling faceting on each task. luceneutil's `TaskParser` supports this with e.g. `+facets:Date.sortedset`. Because facets require counting all hits, it forces Lucene to disable BMW. The problem is, it also adds some cost (I think that's why @jpountz suggested finding a "cheap" one heh), which is not great because it dilutes what you are trying to measure (a change in postings decode / visit time).
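The dilution is simple arithmetic. Here is a sketch with purely illustrative numbers (not measurements; the class name is hypothetical): if only the postings part of a query gets faster, a fixed per-query faceting cost shrinks the relative speedup the benchmark reports.

```java
/**
 * Illustrative arithmetic only (hypothetical numbers, not measurements):
 * a fixed faceting cost per query dilutes the relative speedup that a
 * benchmark reports when only the postings part of the query gets faster.
 */
final class FacetDilution {

  /**
   * Relative speedup of the whole query, which simplifies to
   * postingsSavedMs / (postingsMs + facetMs).
   */
  static double observedSpeedup(double postingsMs, double facetMs, double postingsSavedMs) {
    double before = postingsMs + facetMs;
    double after = (postingsMs - postingsSavedMs) + facetMs;
    return (before - after) / before;
  }
}
```

For example, saving 10 ms out of 100 ms of postings work shows up as a 10% speedup on its own, but only as 5% once 100 ms of faceting is added, which is why a cheap facet matters.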
> > Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?
>
> Do you mean to wrap the clauses with "count( )" like eg https://github.com/mikemccand/luceneutil/blob/master/tasks/countOnly.tasks so that we check the performance but avoid BMW? I like this idea if I understand correctly. But not sure if we could make it an option with benchmarks straightforwardly.

luceneutil supports count tasks with syntax like `count(+a +b)`. This is parsed to use `IndexSearcher`'s `count` API. I think that may be a quick workaround for benchmarking https://github.com/mikemccand/luceneutil/pull/258.
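On making this a benchmark option: one possible approach is to rewrite every query in an existing task file into the `count(...)` form. The helper below is a hypothetical sketch of that rewrite and of recognizing the wrapper; it is not luceneutil's actual `TaskParser`, and dispatching the inner query to `IndexSearcher#count` would still be up to the benchmark harness.

```java
/**
 * Hypothetical helper, not luceneutil's TaskParser: wraps plain query text
 * in the count(...) syntax and recognizes/unwraps it again.
 */
final class CountTaskSyntax {
  private static final String PREFIX = "count(";

  /** True if the text looks like count( ... ); inner parentheses are not checked. */
  static boolean isCountTask(String text) {
    String t = text.trim();
    return t.startsWith(PREFIX) && t.endsWith(")");
  }

  /** Returns the query text inside count( ... ). */
  static String innerQuery(String text) {
    if (!isCountTask(text)) {
      throw new IllegalArgumentException("not a count task: " + text);
    }
    String t = text.trim();
    return t.substring(PREFIX.length(), t.length() - 1).trim();
  }

  /** Rewrites a regular query into a count-only one, e.g. "+a +b" becomes "count(+a +b)". */
  static String toCountTask(String query) {
    String q = query.trim();
    return isCountTask(q) ? q : PREFIX + q + ")";
  }
}
```

It only inspects the outer wrapper, so already-wrapped queries are left unchanged rather than double-wrapped.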
Thanks for the explanation, Mike! I'll try benchmarking the change using count tasks and share the results. Btw, if the above-mentioned approach of maxing out `IndexSearcher.TOTAL_HITS_THRESHOLD` also makes sense, I had already shared the results for it over here.
Description
Currently, there is no straightforward way to disable BMW (Block-Max WAND) in Lucene benchmarks(?), which could be required when testing some optimizations like #258. It'd be great if we could add an option/argument to disable BMW while benchmarking.
One idea could be to increase `TOTAL_HITS_THRESHOLD` in `IndexSearcher.java` to `Integer.MAX_VALUE`. Maybe we could add a setter for it? Looking for more ideas on this!
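Here is a toy model of why raising the threshold turns off dynamic pruning (plain Java, not Lucene's code; all names are hypothetical). Documents are grouped into blocks with a precomputed maximum score, standing in for block-max metadata. Once more than `totalHitsThreshold` hits have been counted and the top-k queue is full, blocks whose maximum cannot beat the current minimum competitive score are skipped. With `Integer.MAX_VALUE` the threshold is never exceeded, so every document is scored and the hit count stays exact, which is the behavior wanted here for benchmarking postings.

```java
import java.util.PriorityQueue;

/**
 * Toy model, not Lucene code: top-k collection with block-max style pruning
 * that only kicks in after more than totalHitsThreshold hits were counted.
 * Every doc matches; scores[d] is doc d's score.
 */
final class ThresholdPruningModel {

  /** Outcome of one modeled search. */
  static final class Outcome {
    final int docsScored;        // docs actually visited and scored
    final boolean exactHitCount; // false if any block was skipped
    final float[] topScores;     // top-k scores in descending order

    Outcome(int docsScored, boolean exactHitCount, float[] topScores) {
      this.docsScored = docsScored;
      this.exactHitCount = exactHitCount;
      this.topScores = topScores;
    }
  }

  /** Per-block maximum scores, standing in for precomputed index-time metadata. */
  static float[] blockMaxes(float[] scores, int blockSize) {
    int numBlocks = (scores.length + blockSize - 1) / blockSize;
    float[] maxes = new float[numBlocks];
    for (int b = 0; b < numBlocks; b++) {
      float max = Float.NEGATIVE_INFINITY;
      for (int d = b * blockSize; d < Math.min((b + 1) * blockSize, scores.length); d++) {
        max = Math.max(max, scores[d]);
      }
      maxes[b] = max;
    }
    return maxes;
  }

  static Outcome search(float[] scores, int blockSize, int k, int totalHitsThreshold) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be >= 1");
    }
    float[] blockMax = blockMaxes(scores, blockSize); // modeled as computed at index time
    PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap holding the current top-k
    int hitsCounted = 0;
    boolean skippedAny = false;
    for (int b = 0; b < blockMax.length; b++) {
      // Pruning is only allowed once the hit-count threshold has been exceeded.
      boolean canPrune = hitsCounted > totalHitsThreshold && heap.size() == k;
      if (canPrune && blockMax[b] <= heap.peek()) {
        skippedAny = true; // no doc in this block can enter the top-k
        continue;
      }
      for (int d = b * blockSize; d < Math.min((b + 1) * blockSize, scores.length); d++) {
        hitsCounted++;
        if (heap.size() < k) {
          heap.add(scores[d]);
        } else if (scores[d] > heap.peek()) {
          heap.poll();
          heap.add(scores[d]);
        }
      }
    }
    float[] top = new float[heap.size()];
    for (int i = top.length - 1; i >= 0; i--) {
      top[i] = heap.poll(); // poll() returns the smallest remaining score
    }
    return new Outcome(hitsCounted, !skippedAny, top);
  }
}
```

In this model both settings return the same top-k, but only the `Integer.MAX_VALUE` run visits every document. A setter or benchmark flag, as proposed above, would essentially feed a different value into this one parameter.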