pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License

Max number of sidekiq threads for ts_delta indexer #1251

Open atomical opened 1 year ago

atomical commented 1 year ago

Hi Pat,

We have a lot of record updates coming in at the same time. When that happens, delta indexing fails with:

    using config file '/var/www/shared/config/qa.sphinx.conf'...
    indexing index 'schedule_delta'...
    FATAL: failed to lock /var/www/shared/db/sphinx/schedule_delta.tmp.spl: Resource temporarily unavailable, will not index. Try --rotate option.

It's followed by the worker exiting.

    2023-08-15T20:59:11.182Z 3184402 TID-20bse WARN: SystemExit: exit
    2023-08-15T20:59:11.193Z 3184402 TID-20bse WARN: /var/www/shared/bundle/ruby/3.1.0/gems/thinking-sphinx-5.4.0/lib/thinking_sphinx/commands/base.rb:41:in `exit'
    /var/www/shared/bundle/ruby/3.1.0/gems/thinking-sphinx-5.4.0/lib/thinking_sphinx/commands/base.rb:41:in `handle_failure'

We would like to avoid setting the number of threads to 1. Currently it is at 5. Have you seen this before?

nsennickov commented 11 months ago

Hello @atomical :wave: I'm facing the same issue. I use Sidekiq as the worker for delta indexing, and my Sidekiq config uses the relatively new capsules feature, which lets me restrict delta indexing to 1 thread so that only one job is executed at a time. The problem, though, is that having a separate capsule for delta indexing makes all the delta indexing jobs run xxx times slower. Here is the data to compare:

I've run out of ideas for how to handle it properly. In my case the job fails with:

    tid=igxb class=ThinkingSphinx::Deltas::SidekiqDelta::DeltaJob jid=700414f64806a88d7d48fc0e WARN:   Sphinx  Guard file for index user_delta exists, not indexing: /bla/bla/blabla/shared/db/sphinx/production/ts-user_delta.tmp.

I'd appreciate any help figuring it out.
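
For readers unfamiliar with capsules, a single-threaded capsule like the one described above might be configured roughly as follows. This is a minimal sketch, not the actual config from this comment: the capsule name `ts-delta` is a hypothetical choice, and the queue name `ts_delta` is assumed from the issue title.

```ruby
# config/initializers/sidekiq.rb
# Sketch only: the capsule name "ts-delta" is hypothetical, and the
# queue name ts_delta is assumed from the issue title.
require "sidekiq"

Sidekiq.configure_server do |config|
  # A capsule has its own thread pool, separate from the default one.
  config.capsule("ts-delta") do |cap|
    cap.concurrency = 1        # one thread: one delta job at a time in this process
    cap.queues = %w[ts_delta]  # this capsule only pulls jobs from the delta queue
  end
end
```

Note that a capsule only limits concurrency inside one Sidekiq process. If several Sidekiq processes run with the same config, each has its own single-threaded capsule, so two delta jobs for the same index can still overlap.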

pat commented 1 month ago

Hey folks, very slow response here, but I just wanted to provide something in reply to your messages.

And unfortunately, the short answer is: Sphinx can only have one process updating a given index at once. So I think the options are either:

Beyond that, I'm not sure how else to work around this (well, short of using real-time indices instead of SQL-backed indices, and thus avoiding the need for deltas at all).
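
For context, switching to a real-time index might look roughly like the sketch below. The `Schedule` model is only guessed from the `schedule_delta` index name in the first log, and the `title` and `updated_at` columns are hypothetical placeholders, so treat this as an outline under those assumptions rather than a drop-in replacement.

```ruby
# app/indices/schedule_index.rb
# Sketch only: the Schedule model and its columns are assumptions.
ThinkingSphinx::Index.define :schedule, :with => :real_time do
  indexes title                        # hypothetical text column
  has updated_at, :type => :timestamp  # real-time attributes need an explicit type
end

# app/models/schedule.rb
class Schedule < ApplicationRecord
  # Writes each saved record straight into the real-time index,
  # so no delta index (and no indexer lock) is involved.
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:real_time])
end
```

With this setup, updates go to the running search daemon per record instead of through the `indexer` command, which is why the one-process-per-index lock from the logs above no longer comes into play.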