tumblr / collins

groovy kind of love
tumblr.github.com/collins
Apache License 2.0

Tuning for Solr to improve indexing latency #529

Closed byxorna closed 7 years ago

byxorna commented 7 years ago

Fixes #526

This PR has a bunch of fixes for how solr behaves to improve indexing latency for assets after write.

1) Removes explicit commit() calls to the Solr server. This reduces the load on Solr, given the default 10ms window for batching asset updates (only a handful of assets are reindexed per interval during mass tag updates).
2) Increases assetBatchUpdateWindowMs from 10ms to 30ms, to catch more assets in each batch added to Solr.
3) Adds a commitWithinMs tunable (default 50ms), which lets Solr decide on the best time to commit updates to the index, instead of us forcing commits (of possibly very small batches of docs). With the defaults, an asset will be available for search within assetBatchUpdateWindowMs + commitWithinMs = 80ms of the update.
4) Moves away from hard commits in Solr to softAutoCommit. This means a document is searchable before Solr performs an fsync on the index, which reduces the IO load on Solr and shortens the delay between a doc update and its searchability.
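To make the arithmetic in item 3 concrete, here is a minimal sketch. The helper names are hypothetical and are not Collins code; the only values taken from this PR are the old and new defaults. It assumes Solr honors commitWithinMs and ignores network and indexing time.

```ruby
# Hypothetical helpers, not part of Collins: the parameter names simply
# mirror the two tunables described above.
def worst_case_visibility_ms(asset_batch_update_window_ms:, commit_within_ms:)
  # An update can wait up to one full batch window before it is sent to Solr,
  # then up to commitWithinMs before Solr makes it searchable.
  asset_batch_update_window_ms + commit_within_ms
end

# Upper bound on batches sent per second if updates arrive continuously.
def max_batches_per_second(asset_batch_update_window_ms:)
  1000.0 / asset_batch_update_window_ms
end

puts worst_case_visibility_ms(asset_batch_update_window_ms: 30, commit_within_ms: 50) # 80
puts max_batches_per_second(asset_batch_update_window_ms: 10).round # 100
puts max_batches_per_second(asset_batch_update_window_ms: 30).round # 33
```

Under sustained updates, the wider window cuts the number of batches Solr has to absorb to roughly a third, at the cost of a slightly longer wait before each update is sent.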

These tunables can be twiddled to balance throughput against search-after-write latency for new attributes (e.g. collins.set_attribute!(tag, :foo, :bar) followed by collins.find(foo: :bar)). Using the softAutoCommit feature will reduce (but not entirely remove!) the latency between updating assets and being able to search for them.

@evanelias @bobpattersonjr @defect @roymarantz would love to hear any suggestions! This isn't a perfect fix for the jetpants issue, but it's a start.

byxorna commented 7 years ago

@defect @roymarantz RFR again?

roymarantz commented 7 years ago

👍

byxorna commented 7 years ago

@michaeljs1990 wanna give this a once-over before I land?

byxorna commented 7 years ago

also @defect, :+1: ?

michaeljs1990 commented 7 years ago

:+1: on this. I'm curious about the jetpants issue mentioned in the summary, though; I didn't see any open issues about it on that repo.

byxorna commented 7 years ago

@michaeljs1990 if you set FOO=xyz on an asset, then immediately query for FOO=xyz, the Solr index update from the write may not yet be visible to queries, so the query will return an "older" set of assets. A workaround is to either slow down how quickly jetpants writes to and queries collins, or add some logic to set unique marker tags (or perhaps use modification time) when updating assets and retry queries for those marker tags until a non-empty set is returned.
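The marker-tag workaround could look roughly like the sketch below. Everything here is illustrative: LaggyFakeClient is a made-up stand-in that only mimics the set_attribute!/find call shapes used earlier in this thread, and set_and_wait, update_marker, and the retry defaults are hypothetical names and values, not jetpants or Collins code.

```ruby
require 'securerandom'

# Made-up stand-in for a Collins client: writes only become searchable
# after a fixed number of find calls, to imitate index lag.
class LaggyFakeClient
  def initialize(stale_reads)
    @stale_reads = stale_reads # finds that miss a fresh write
    @index = {}                # tag => attributes visible to search
    @pending = {}              # tag => attributes not yet searchable
  end

  def set_attribute!(tag, key, value)
    (@pending[tag] ||= {})[key] = value
    @reads_until_visible = @stale_reads
  end

  def find(query)
    flush_if_due
    @index.select { |_tag, attrs| query.all? { |k, v| attrs[k] == v } }.keys
  end

  private

  def flush_if_due
    return if @pending.empty?
    if @reads_until_visible > 0
      @reads_until_visible -= 1
    else
      @pending.each { |tag, attrs| (@index[tag] ||= {}).merge!(attrs) }
      @pending.clear
    end
  end
end

# Hypothetical helper: write the attribute plus a unique marker, then retry
# the marker query until it returns a non-empty set or attempts run out.
def set_and_wait(client, tag, key, value, attempts: 10, delay: 0.05)
  marker = SecureRandom.hex(8)
  client.set_attribute!(tag, key, value)
  client.set_attribute!(tag, :update_marker, marker)
  attempts.times do
    return true unless client.find(update_marker: marker).empty?
    sleep delay
  end
  false
end

client = LaggyFakeClient.new(2)
puts set_and_wait(client, "web-01", :foo, "xyz", delay: 0.01) # true
puts client.find(foo: "xyz").inspect                          # ["web-01"]
```

The unique marker matters because a plain query for FOO=xyz could match assets that already had that value before the write, which would end the retry loop too early.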

@evanelias @roymarantz @defect @bobpattersonjr I'll land this in a day unless I hear any vocal objections?

grahamc commented 7 years ago

I just updated the container used in some jetpants integration testing, and wanted to report this made a direct improvement.

To improve read-after-write (RaW) consistency, we've added a read loop after each write, busy-waiting until we can read back what we just wrote. Even with that loop in place, we would sometimes update the secondary role on an asset from A to B, then search for assets where the secondary role is A, and the asset would still be returned. Since updating the container, the asset is no longer returned in that situation (with the RaW busy loop still in place).
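For readers unfamiliar with the pattern, a bounded version of such a busy-wait might be sketched as follows. This is not the actual jetpants integration code: wait_until is a hypothetical helper, and the search results are simulated with a plain array rather than a real Collins query.

```ruby
# Hypothetical helper: poll the block until it returns true or the timeout
# elapses. Uses the monotonic clock so wall-clock jumps do not matter.
def wait_until(timeout: 1.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Simulated results of successive "secondary role = A" searches after the
# asset was moved from A to B: stale twice, then up to date.
simulated_results = [["db-01"], ["db-01"], []]

polls = 0
ok = wait_until(timeout: 1.0, interval: 0.001) do
  polls += 1
  result = simulated_results.shift || []
  !result.include?("db-01") # done once the old role no longer matches
end

puts ok    # true
puts polls # 3
```

Polling on the old value disappearing, and not only on the new value appearing, targets the specific stale result described above.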

Thank you!

byxorna commented 7 years ago

@grahamc glad to hear it! This should be a good first (faux) step towards a truly consistent RW interface :)

komapa commented 7 years ago

Oh, I cc'ed @evanelias but it seems like @grahamc is already basking in the benefits of this :)