pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License

Deploys and shutting down ts_delta Sidekiq worker #1123

Closed: atomical closed this issue 5 years ago

atomical commented 5 years ago

Hi Pat,

We have a delta index that takes a while to rebuild because of a large number of new records between deploys. During our deploy we quiet the Sidekiq workers and then kill them off. I'm thinking that our issue is that the indexer hasn't finished with the delta index before we kill it. Thoughts?

pat commented 5 years ago

If the delta jobs are slow, yeah, I'd expect it to be because there's a stack of records to process.

How often are you running a full ts:index? And just to check: are there any foreign keys in the delta query that don't have database indices? (i.e. can the query be optimised to improve processing times?)

atomical commented 5 years ago

We're doing it every night now. We've added all the indexes we can. Our data set has grown pretty large.

pat commented 5 years ago

I'm curious: how large is your dataset?

And it might be possible to shard your data across many Sphinx indices, which could help indexing times, at least for deltas. Perhaps by the year of the created_at column, or by ids per million? Though maybe something less time-based is better, so the delta load is shared across multiple indices rather than falling entirely on the latest one.
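To illustrate the difference between the two sharding approaches, here is a minimal plain-Ruby sketch. It does not use Thinking Sphinx at all: `SHARD_COUNT`, `shard_by_modulo` and `shard_by_range` are hypothetical names invented for this example, and the ids are made up. It only shows which shard a batch of newly created records would land in under each scheme.

```ruby
# Hypothetical helpers for illustration only; not part of Thinking Sphinx.
SHARD_COUNT = 4

# Non-time-based sharding: spread ids evenly across a fixed number of shards.
def shard_by_modulo(id, shard_count = SHARD_COUNT)
  id % shard_count
end

# Range-based sharding ("ids per million"): ids 1..1_000_000 go to shard 0,
# the next million to shard 1, and so on.
def shard_by_range(id, range_size = 1_000_000)
  (id - 1) / range_size
end

# A made-up batch of the newest records, i.e. the ones a delta would pick up.
recent_ids = (5_000_001..5_000_008).to_a

# Count how many of the new records fall into each shard under each scheme.
modulo_spread = recent_ids.group_by { |id| shard_by_modulo(id) }.transform_values(&:size)
range_spread  = recent_ids.group_by { |id| shard_by_range(id) }.transform_values(&:size)

puts "modulo: #{modulo_spread.inspect}"
puts "range:  #{range_spread.inspect}"
```

With range-based (or year-based) sharding, every new record lands in the latest shard, so that one delta carries the whole load. With the modulo scheme the same records are split evenly across all four shards.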

If you go down that path, then it might be possible to alter the callback behaviour so that it only fires off a delta processing job for the one delta index that the record is tied to, instead of for all deltas for that model.
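A rough sketch of that idea in plain Ruby, under stated assumptions: the index naming pattern (`article_<n>_delta`), the `Record` struct, the modulo-by-id sharding, and the `enqueued` array standing in for a job queue are all hypothetical. None of this is Thinking Sphinx's actual callback API; it only shows the selection logic such a callback would need.

```ruby
# Hypothetical names throughout; this is not Thinking Sphinx API.
SHARDS = 4
Record = Struct.new(:id)

# Name of the single delta index that a given record belongs to,
# assuming records are sharded by id modulo the shard count.
def delta_index_for(record, model_name: "article", shards: SHARDS)
  "#{model_name}_#{record.id % shards}_delta"
end

# What processing every delta for the model would touch instead.
def all_delta_indices(model_name: "article", shards: SHARDS)
  (0...shards).map { |n| "#{model_name}_#{n}_delta" }
end

# Stand-in for a job queue: the callback enqueues one index name,
# rather than one job per delta index for the model.
enqueued = []
after_commit = ->(record) { enqueued << delta_index_for(record) }

after_commit.call(Record.new(10))

puts "enqueued: #{enqueued.inspect}"
puts "skipped:  #{(all_delta_indices - enqueued).inspect}"
```

In this sketch a change to one record queues a single job, and the other three delta indices are left alone.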