Currently, sampling is handled by calculating a single chunk size over the full document count (maxDoc) and only updating counts for the documents that are set in the result bitmap. This means that the number of sampled documents can vary a great deal due to chance.
An alternative would be an adaptive strategy: instead of basing the chunk size on maxDoc, it can be calculated as hitCount/indexMaxDoc * segmentMaxDoc * samplingFactor / numChunks. The first chunk is finished when the number of matching documents reaches the chunk size. The skip to the next chunk is then calculated based on the current position in the result bitmap.
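A minimal sketch of the proposed chunk-size calculation, assuming illustrative names (none of these are existing APIs): hitCount/indexMaxDoc estimates the match density over the whole index, which is then scaled to the segment, reduced by the sampling factor, and spread over the desired number of chunks.

```java
public class AdaptiveChunkSize {
    /**
     * Estimated number of matching documents to collect per chunk for one
     * segment. All parameter names are hypothetical, mirroring the formula
     * above: hitCount/indexMaxDoc * segmentMaxDoc * samplingFactor / numChunks.
     */
    static long chunkSize(long hitCount, long indexMaxDoc,
                          long segmentMaxDoc, double samplingFactor,
                          int numChunks) {
        double matchDensity = (double) hitCount / indexMaxDoc;
        // Round and clamp to at least 1 so a chunk can always finish.
        return Math.max(1, Math.round(
                matchDensity * segmentMaxDoc * samplingFactor / numChunks));
    }

    public static void main(String[] args) {
        // e.g. 1M hits in a 10M-doc index, a 2M-doc segment,
        // 10% sampling, spread over 8 chunks -> 2500 docs per chunk
        System.out.println(chunkSize(1_000_000, 10_000_000,
                                     2_000_000, 0.1, 8));
    }
}
```

The per-chunk target is adaptive in the sense that it tracks the query's actual hit density rather than maxDoc, so sparse and dense result sets yield comparably sized samples.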