publiclab / plots2

a collaborative knowledge-exchange platform in Rails; we welcome first-time contributors! :balloon:
https://publiclab.org
GNU General Public License v3.0

Solr search performance issue with commit rate #1784

Closed: jywarren closed this issue 6 years ago

jywarren commented 6 years ago

When deploying #1537, we saw this error on many requests, along with an overall site slowdown:

RSolr::Error::Http (RSolr::Error::Http - 503 Service Unavailable
Error: 'Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.','code'=>503}}

Error explained here: https://stackoverflow.com/questions/7512945/how-to-fix-exceeded-limit-of-maxwarmingsearchers
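
As we understand the linked explanation: each commit opens a new searcher, which has to warm up (autowarm caches, run warming queries) before it replaces the current one. If commits arrive faster than searchers finish warming, warming searchers pile up, and once `maxWarmingSearchers` of them are already warming, Solr refuses to open another and returns this 503. The following is a deliberately simplified model, not Solr's code: it assumes a fixed warm-up time per searcher, and `simulate_commits` is a hypothetical name.

```ruby
# Simplified model (assumption): every commit starts a new searcher that warms
# for a fixed `warm_seconds`. A commit is rejected if `max_warming` searchers
# are already warming at that moment.
def simulate_commits(commit_times, warm_seconds:, max_warming: 2)
  warming_until = [] # finish times of searchers still warming
  rejected = []
  commit_times.sort.each do |t|
    # Searchers that finished warming by time t no longer count.
    warming_until.reject! { |finish| finish <= t }
    if warming_until.size >= max_warming
      rejected << t # would surface as "exceeded limit of maxWarmingSearchers"
    else
      warming_until << t + warm_seconds
    end
  end
  rejected
end

# One commit per second, each searcher taking 5 seconds to warm.
p simulate_commits((0..6).to_a, warm_seconds: 5)
```

In this model, with commits every second and a five-second warm-up, the commits at t=2, 3 and 4 are rejected. Raising the limit to 4 still rejects the one at t=4, while spacing commits ten seconds apart avoids rejections entirely, which suggests reducing the commit rate matters more than the limit itself.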

@icarito says:

i.e. it looks like we could skip committing to Solr and let Solr autocommit every few minutes

Let's investigate this. That would mean we update the index but don't issue an explicit commit?
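
One way to picture "update the index but don't commit every time" is to coalesce commit requests so that a burst of saves produces at most one commit per interval. Solr's own `autoCommit`/`autoSoftCommit` settings in solrconfig.xml do this on the server side, which is what the suggestion above relies on. A minimal app-side sketch, assuming the real commit is passed in as a block (in plots2 that might be `Sunspot.commit`, which is an assumption about where it would be wired in); `CommitThrottler` is a hypothetical class, not part of Sunspot. Sunspot's Rails integration also appears to have an `auto_commit_after_request` option in sunspot.yml that may be worth checking before writing anything custom.

```ruby
# Hypothetical helper: coalesces many commit requests into at most one real
# commit per `interval` seconds. The block performs the actual commit;
# `clock` is injectable so the behaviour can be tested without sleeping.
class CommitThrottler
  def initialize(interval:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &commit)
    @interval = interval
    @clock = clock
    @commit = commit
    @last_commit_at = nil
    @dirty = false
  end

  # Call after each index update instead of committing immediately.
  def request_commit
    @dirty = true
    flush_if_due
  end

  # Commits only if there are pending changes and the interval has elapsed.
  # Should also be called periodically (e.g. from a timer or background job)
  # so that the last updates of a burst do not stay uncommitted.
  def flush_if_due
    return false unless @dirty

    now = @clock.call
    return false if @last_commit_at && now - @last_commit_at < @interval

    @commit.call
    @last_commit_at = now
    @dirty = false
    true
  end
end
```

With a 60-second interval, ten saves in quick succession would produce one commit, and the remaining pending changes would be committed by the first `flush_if_due` call after the interval has passed.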

Also note: we've manually disabled Solr on production in /config/sunspot.yml until we solve this issue.

jywarren commented 6 years ago

Hi, @icarito -- any updates on this? Thank you and hope you're well!

icarito commented 6 years ago

I've not yet tried to figure out when commits happen with Sunspot or how to batch them up or limit them. I'll explore this and update this issue once I can figure out / recommend a fix. Perhaps this is the same thing that is slowing down staging ( #1604 ).

jywarren commented 6 years ago

Thanks; yes, that could be, I think! Staging has Solr enabled, right?

icarito commented 6 years ago

Yes, Solr is enabled on staging, following the production entry of sunspot.yml in the stable branch.

icarito commented 6 years ago

I'm still trying to figure out how to tweak https://github.com/publiclab/plots2/blob/cf22ee7f002be15e8cf630e15da7687ff5a59a57/config/solr/solrconfig.xml correctly. Here's the manual for our version of Solr (5.3), which we chose to match the Sunspot gem.

I'm not clear on what our commit strategy is, so I'll try tweaks directly on staging by changing solrconfig.xml and restarting the Solr container. The problem is that it's hard to determine objectively when the issue has been reproduced or averted, and I can't think of a simple test.

This reference also looks instructive. I'm concerned that the solution may involve some Ruby coding; @jywarren, if that's the case, I hope you can help me.

icarito commented 6 years ago

From the manual:

maxWarmingSearchers
This parameter sets the maximum number of searchers that may be warming up in the background at any given time. Exceeding this limit will raise an error. For read-only slaves, a value of two is reasonable. Masters should probably be set a little higher.
<maxWarmingSearchers>2</maxWarmingSearchers>

I'll try setting this to 4.
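
To try different values quickly on staging, a tiny helper could rewrite the setting before restarting the Solr container. This is a hypothetical sketch (`set_max_warming_searchers` is not an existing tool). It uses a regex because the element holds a single flat value, and it replaces only the first match, so a commented-out copy earlier in the file would be changed instead.

```ruby
# Hypothetical helper: returns a copy of a solrconfig.xml string with the
# <maxWarmingSearchers> value replaced. Raises if the element is missing.
def set_max_warming_searchers(xml, value)
  pattern = %r{<maxWarmingSearchers>\s*\d+\s*</maxWarmingSearchers>}
  raise ArgumentError, 'maxWarmingSearchers not found' unless xml.match?(pattern)

  # Integer() rejects non-numeric input instead of writing it into the config.
  xml.sub(pattern, "<maxWarmingSearchers>#{Integer(value)}</maxWarmingSearchers>")
end
```

Usage would be along the lines of `path = 'config/solr/solrconfig.xml'` followed by `File.write(path, set_max_warming_searchers(File.read(path), 4))`.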

icarito commented 6 years ago

I just set maxWarmingSearchers to 4 and dropped / rebuilt the core. Now it's reindexing.

icarito commented 6 years ago

Staging shows this for the homepage:

web_1   | Completed 200 OK in 1410.3ms (Views: 762.4ms | ActiveRecord: 645.0ms | Solr: 56246.0ms)
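
Since it's hard to tell objectively whether the problem has been reproduced or averted, one option is to compare these timings numerically across runs. A hypothetical helper (`parse_timings` is illustrative, not part of Rails) that extracts the numbers from a `Completed ...` log line like the one above:

```ruby
# Hypothetical helper: extracts the total and per-component timings (in ms)
# from a Rails "Completed ..." log line. Returns nil for other lines.
def parse_timings(line)
  total = line[/Completed \d+ .*? in ([\d.]+)ms/, 1]
  return nil unless total

  timings = { 'Total' => total.to_f }
  # Matches segments such as "Views: 762.4ms", "ActiveRecord: 645.0ms", "Solr: 56246.0ms".
  line.scan(/(\w+): ([\d.]+)ms/) { |name, ms| timings[name] = ms.to_f }
  timings
end

line = 'web_1   | Completed 200 OK in 1410.3ms (Views: 762.4ms | ActiveRecord: 645.0ms | Solr: 56246.0ms)'
p parse_timings(line)
```

For the line above, it yields `'Total' => 1410.3`, `'Views' => 762.4`, `'ActiveRecord' => 645.0` and `'Solr' => 56246.0`.
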
icarito commented 6 years ago

I'll add the DEBUG flag to sunspot.yml.

jywarren commented 6 years ago

It seems pretty fast on staging, or at least faster than before?

jywarren commented 6 years ago

Could we run a lot of requests from a script to simulate a high commit rate, then see if we hit the limit?
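
A rough sketch of such a script, assuming it is pointed at staging. Plain page views presumably don't trigger commits, so to exercise the commit path the request block would need to create or update content. The URL and the `hammer` helper are hypothetical. By default it just GETs the URL and tallies status codes, and a count of `503` (or `500`) responses gives a crude signal that the limit was hit.

```ruby
require 'net/http'
require 'uri'

# Hypothetical load script: sends `count` requests spread over `threads`
# threads and returns a tally of response codes. Pass a block to customize
# the request (e.g. a POST that creates a note and so triggers a commit).
def hammer(url, count:, threads: 4, &request)
  request ||= ->(uri) { Net::HTTP.get_response(uri).code }
  uri = URI(url)
  jobs = Queue.new
  count.times { jobs << true }
  tally = Hash.new(0)
  lock = Mutex.new

  workers = Array.new(threads) do
    Thread.new do
      loop do
        begin
          jobs.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break # no requests left
        end
        code = request.call(uri).to_s
        lock.synchronize { tally[code] += 1 }
      end
    end
  end
  workers.each(&:join)
  tally
end
```

For example, `hammer('https://stable.publiclab.org/', count: 200, threads: 8)` would issue 200 GETs and return something like a hash of status code to count; the actual numbers depend entirely on the server.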

jywarren commented 6 years ago

https://stackoverflow.com/questions/29522275/soft-commit-and-hard-commit-in-solr

https://blog.bigbinary.com/2012/10/11/solr-sunspot-websolr-delayed-job.html

https://stackoverflow.com/questions/17654266/solr-autocommit-vs-autosoftcommit#19682078

jywarren commented 6 years ago

Working on this here: https://github.com/publiclab/plots2/pull/1819

jywarren commented 6 years ago

Pretty stuck here. Any ideas, or does anyone know someone really familiar with optimizing Solr?

jywarren commented 6 years ago

@ujithaperera maybe?

jywarren commented 6 years ago

I found something interesting. On stable.publiclab.org this has been running fine without the error we saw on production. But I just tried posting a note at stable.publiclab.org/post, and that triggered the same 500 errors on all pages, with the same error we see at the top of this issue:

RSolr::Error::Http (RSolr::Error::Http - 503 Service Unavailable
Error: 'Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.','code'=>503}}

So we can now reproduce the error on staging! This is likely because posting a note creates a commit. I think we can probably reproduce this as a test, then, by adding node creation to the Solr tests at: https://github.com/publiclab/plots2/blob/master/test/solr/

  test 'create a node' do
    # in testing, uid and id should be matched, although this is not yet true in production db
    node = Node.new(uid: users(:bob).id,
                    type: 'note',
                    title: 'My new node for node creation testing')
    assert node.save
  end
jywarren commented 6 years ago

Tried this in a PR at #1836

jywarren commented 6 years ago

BTW, I'm disabling it on staging for now.

jywarren commented 6 years ago

Oops, I need to make a controller/functional test, not a unit test. Changing...