Automatic over-provisioning for heuristic faceting

The first test of heuristic faceting is described in "Dubious guesses, counted correctly". For the corpus used in that test, the top-10 could be reliably determined with heuristic faceting by requesting the top-25 and using only the first 10 of the returned terms. This over-provisioning should happen under the hood, so that asking for top-10 gives the correct top-10 (with high probability).

Vanilla Solr distributed faceting determines its over-provisioning from a (configurable) factor and a constant, which gives a great deal of flexibility. The same scheme seems suitable for heuristic faceting.
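
A minimal sketch of the factor-plus-constant idea, in Python for brevity rather than as a patch against the Solr code. The names `over_provisioned_limit`, `top_terms` and `fetch` are hypothetical, and the defaults (factor 1.5, constant 10) are illustrative assumptions, chosen because they turn a request for top-10 into an internal top-25, matching the test above.

```python
from typing import Callable, List


def over_provisioned_limit(requested: int, factor: float = 1.5, constant: int = 10) -> int:
    """Return the number of terms to request internally.

    Assumption: factor and constant would be configurable; the defaults
    here are illustrative only.
    """
    if requested <= 0:
        # Nothing sensible to over-provision (e.g. "no limit" requests).
        return requested
    # Never request fewer terms than the caller asked for.
    return max(requested, int(requested * factor) + constant)


def top_terms(fetch: Callable[[int], List[str]], requested: int,
              factor: float = 1.5, constant: int = 10) -> List[str]:
    """Fetch an over-provisioned term list and trim it to the requested size.

    `fetch` is a hypothetical stand-in for the heuristic faceting call;
    it takes a limit and returns terms ordered by count.
    """
    internal_limit = over_provisioned_limit(requested, factor, constant)
    return fetch(internal_limit)[:requested]
```

With these assumed defaults, `over_provisioned_limit(10)` evaluates to 25, and the caller only ever sees the first 10 terms. Other corpora may need a different factor or constant, which is the argument for keeping both configurable.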

tokee / lucene-solr

Automatic over-provisioning for heuristic faceting #42