tokee / lucene-solr

High cardinality faceting (SOLR-5894)
http://tokee.github.io/lucene-solr/

Prune facet results server side #26

Closed tokee closed 9 years ago

tokee commented 9 years ago

Faceting on outgoing links in netarchive.dk returns a lot of garbage and self-links. We should add an optional pruning step. This could be regexp-based: for example, facet.f.links.prune=http://[^/]*example.com/ would ensure that no links to example.com are returned.
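A minimal sketch of what such a pruning step could look like, assuming the facet terms are available as plain strings and that a term is dropped when the regexp is found anywhere in it. The class and method names are hypothetical and not taken from the Solr code base; only the regexp comes from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not the Solr implementation.
class FacetPruneSketch {

    /** Returns the terms that do NOT contain a match for the prune pattern. */
    static List<String> prune(List<String> terms, Pattern prunePattern) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            // Assumption: a partial match (find) is enough to prune a term.
            if (!prunePattern.matcher(term).find()) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Regexp copied from the example; note that the unescaped '.'
        // matches any character, not just a literal dot.
        Pattern prunePattern = Pattern.compile("http://[^/]*example.com/");
        List<String> links = Arrays.asList(
                "http://www.example.com/a",
                "http://other.org/b",
                "http://example.com/");
        System.out.println(prune(links, prunePattern)); // [http://other.org/b]
    }
}
```

Whether pruning should require a full match or a partial match is a design choice the proposal leaves open; the sketch uses a partial match so that the example regexp works without a trailing `.*`.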

tokee commented 9 years ago

This has been implemented as blacklists & whitelists in the very experimental branch "pack". It will likely be merged into the 4.8-sparse & 4.9-sparse branches soon.

See https://sbdevel.wordpress.com/2015/04/10/facet-filtering/ for details.

tokee commented 9 years ago

The implementation creates a new Matcher for each check. It would probably be faster to reuse the Matchers by resetting them with the new input.
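The reuse idea can be sketched as follows, using the standard `java.util.regex.Matcher.reset(CharSequence)` method. The wrapper class and its method name are hypothetical; this is an illustration under those assumptions, not the code from the branch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of Matcher reuse, not the Solr implementation.
class MatcherReuseSketch {
    // One Matcher per pattern, created once with empty input.
    private final Matcher matcher;

    MatcherReuseSketch(String regexp) {
        this.matcher = Pattern.compile(regexp).matcher("");
    }

    /** Checks a term by resetting the existing Matcher instead of allocating a new one. */
    boolean matches(CharSequence term) {
        // reset(CharSequence) swaps in the new input and returns the same Matcher.
        return matcher.reset(term).find();
    }
}
```

A Matcher is not safe for concurrent use, so a reused instance would have to be confined to one thread (or kept per thread) if faceting runs in parallel.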

tokee commented 9 years ago

Closed, although the feature only works for DocValues.

A shift to Solr 4.10.x or 5.x is anticipated in the near future. In those versions, the non-DocValues String faceting implementations are presented similarly to DocValues in the API.