tumblr / collins

groovy kind of love
tumblr.github.com/collins
Apache License 2.0

Tune Solr for realtime searching #526

Closed (byxorna closed this issue 7 years ago)

byxorna commented 7 years ago

Our Solr setup is essentially the default configuration, which causes several problems at scale: indexing falls behind, and searches don't reflect recent changes to assets.

1) Solr indexers fall behind during document update storms (e.g. setting an attribute on every asset).
2) Commits must be written to disk before documents become searchable, which increases the lag between a write and its visibility in reads.
3) Asset updates may be triggering a manual commit on each change instead of relying on autoCommit to batch commits properly.
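To illustrate points 1 and 3, here is a minimal client-side sketch, assuming asset updates can be buffered before being sent to Solr. The class name `AssetUpdateBatcher` and the `max_batch` parameter are hypothetical and not part of Collins. Repeated updates to the same asset during a storm collapse into one document, and the flush produces a single JSON array body with no explicit commit, leaving visibility to Solr's own commit policy.

```python
import json
from typing import Any, Dict, List


class AssetUpdateBatcher:
    """Hypothetical buffer: collects asset documents and emits them as one
    Solr JSON update body instead of committing after every update."""

    def __init__(self, max_batch: int = 500) -> None:
        self.max_batch = max_batch
        # Keyed by document id, so repeated updates to the same asset
        # during an update storm collapse into the latest version.
        self._pending: Dict[str, Dict[str, Any]] = {}

    def add(self, doc: Dict[str, Any]) -> bool:
        """Buffer a document; return True once the batch is full and should be flushed."""
        self._pending[doc["id"]] = doc
        return len(self._pending) >= self.max_batch

    def drain(self) -> str:
        """Return pending documents as a JSON array body and clear the buffer.
        No commit is issued here; visibility is left to autoCommit,
        autoSoftCommit or commitWithin on the Solr side."""
        docs: List[Dict[str, Any]] = list(self._pending.values())
        self._pending.clear()
        return json.dumps(docs)
```

For example, with `max_batch=2`, adding two updates for `asset-1` and one for `asset-2` yields a single body holding only the latest `asset-1` document plus `asset-2`. The trade-off is a small delay before updates reach Solr, in exchange for far fewer update requests and no per-change commits.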

We should tune autoCommit and look at autoSoftCommit as well. Using Solr's commitWithin facility may also provide some relief.
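On the server side, the relevant knobs live in the `<updateHandler>` section of solrconfig.xml: `<autoCommit>` (typically with a `<maxTime>` and `<openSearcher>false</openSearcher>` for durability) and `<autoSoftCommit>` (with a shorter `<maxTime>` for search visibility). On the client side, `commitWithin` is a request parameter on Solr's update handler. The sketch below assumes an external Solr reachable over HTTP; the base URL, the core name `collins`, the helper name `build_update_request` and the default delay are all hypothetical. It only builds the request (it does not send it) so the shape is easy to inspect.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(
    solr_base: str,
    core: str,
    docs: List[Dict[str, Any]],
    commit_within_ms: int = 1000,
) -> Request:
    """Build (but do not send) a JSON update request that asks Solr to make
    the documents searchable within commit_within_ms milliseconds, instead
    of the client forcing an explicit commit."""
    query = urlencode({"commitWithin": commit_within_ms})
    url = f"{solr_base.rstrip('/')}/{core}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of `urllib.request.urlopen(req)`. Because Solr schedules the commit itself, many clients issuing updates within the same window share a single commit rather than each triggering their own.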

https://wiki.apache.org/solr/SolrConfigXml?#Update_Handler_Section
https://wiki.apache.org/solr/NearRealtimeSearch