mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Minimal proof for Bulk DocIdSetIterator for Lucene PR 13149 #257

Open antonha opened 6 months ago

antonha commented 6 months ago

This PR is the smallest change I could make to demonstrate that the changes in apache/lucene/pull/13149 work. The one exception is the number of LongNrq queries, which could probably be reduced.

I aimed to reproduce the result on wikimediumall. The benchmark needs to be run with optimize = True for indexing and commitPoint = 'single' for the competition; otherwise it is hard to see the performance difference. The reason is that, without a single merged segment, each segment holds too few documents, so the BkdTree IntsWriter picks a more compact encoding for the doc IDs and the code path changed in the Lucene PR is rarely exercised.
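To illustrate why segment size matters here, below is a rough sketch of how the number of bits needed per doc ID grows with the number of documents in a segment. This is a simplified model, not Lucene's actual encoding logic: the helper `bits_per_doc_id` and both segment sizes are hypothetical and chosen only for illustration.

```python
def bits_per_doc_id(max_doc: int) -> int:
    """Return the bits needed to represent any doc ID in [0, max_doc).

    Simplified model: a real writer may pick among a few fixed encodings,
    but the trend (fewer docs per segment -> fewer bits) is the same.
    """
    if max_doc < 1:
        raise ValueError("max_doc must be at least 1")
    # The largest doc ID in the segment is max_doc - 1.
    return max(1, (max_doc - 1).bit_length())


# Hypothetical segment sizes, for illustration only.
small_segment_docs = 100_000       # one of many small segments
merged_segment_docs = 30_000_000   # a single force-merged segment

print(bits_per_doc_id(small_segment_docs))   # 17
print(bits_per_doc_id(merged_segment_docs))  # 25
```

Under this model, doc IDs in a small segment fit in far fewer bits than in one large merged segment, which is why the index has to be optimized down to a single segment before the wider encodings are used at all.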

I'm not sure whether this should be merged; the PR is mostly here for reference.