tokee / lucene-solr

High cardinality faceting (SOLR-5894)
http://tokee.github.io/lucene-solr/
7 stars 1 forks source link

Speed up ordinal mapping #50

Open tokee opened 9 years ago

tokee commented 9 years ago

Currently the entries in the faceting counters are 1:1 with global term ordinals. In the counting phase, segment-local ordinals are mapped to global and the corresponding counter increased: counter.inc(getGlobalOrdinal(localOrdinal)). This is a performance problem as the same local ordinal might occur many times in the same segment, each time triggering a lookup for its global ordinal.

It would be faster to calculate all the local counts first and then map them to global. Vanilla Solr does this, but sparse does not. Unfortunately this is very memory-costly for segments with a high number of unique values in the facet, so again the strategy-pattern should be used to select whether or not to do the mapping up front or delayed.

Note: A fully optimized index does not have this problem as the getGlobalOrdinal in that case is an extremely fast check for null.

dsmiley commented 9 years ago

Why both; is the current strategy (map to global up-front) ever faster? If there aren't many unique values, it probably doesn't matter what you do.

tokee commented 9 years ago

The up-front takes less memory (relevant for many uniques/segment) and is better with sparse sets (although one could make the temporary counter structure sparse too).

The speed difference between the two strategies scale primarily with the number of references from docs to values, not so much with the number of unique values.