tokee / lucene-solr

High cardinality faceting (SOLR-5894)
http://tokee.github.io/lucene-solr/

Automatic over-provisioning for heuristic faceting #42

Open tokee opened 9 years ago

tokee commented 9 years ago

The first test of heuristic faceting is described in "Dubious guesses, counted correctly". For the corpus used in that test, the top-10 could be determined reliably with heuristic faceting by requesting the top-25 and keeping only the first 10 of the returned terms. This over-provisioning should happen under the hood, so that asking for top-10 returns the correct top-10 (with high probability).

Vanilla Solr distributed faceting determines its over-provisioning from a (configurable) factor and a constant. Using the same pair of parameters for heuristic faceting would give a great deal of flexibility.
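
A minimal sketch of the idea, not the code in the branch: the internal request size is computed as `limit * factor + constant`, and the result is trimmed back to the requested limit. The class and method names are hypothetical. The values 1.5 and 10 are illustrative assumptions. As far as I know they match the defaults of vanilla Solr's `facet.overrequest.ratio` and `facet.overrequest.count`, and they happen to turn a top-10 request into the top-25 used in the test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Solr or the sparse branch.
final class OverProvisioning {

    // Number of terms to request internally for a caller-visible limit.
    // factor and constant are the two (configurable) over-provisioning knobs.
    public static int overProvisionedLimit(int limit, double factor, int constant) {
        if (limit <= 0) {
            // Assumption: non-positive limits carry special meaning; pass them through.
            return limit;
        }
        long raw = (long) (limit * factor) + constant;
        // Never request fewer than the caller asked for, and avoid int overflow.
        return (int) Math.min(Integer.MAX_VALUE, Math.max(limit, raw));
    }

    // Keep only the first 'limit' terms of an already sorted, over-provisioned result.
    public static <T> List<T> trim(List<T> sortedTerms, int limit) {
        int end = Math.min(limit, sortedTerms.size());
        return new ArrayList<>(sortedTerms.subList(0, end));
    }
}
```

With these values, `overProvisionedLimit(10, 1.5, 10)` gives 25, and `trim(terms, 10)` hands the first 10 of those 25 terms back to the caller. Setting the factor to 1.0 and the constant to 0 disables over-provisioning.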

tokee commented 9 years ago

This has been implemented in the 4_10_sparse branch but has not yet been tested.