Closed lunaru closed 4 years ago
Hi @lunaru
This is a great point, and you're right that the existing solutions aren't ideal. I was thinking about whether a custom IndexSet class could help, or a subclass of ThinkingSphinx::Deltas::SidekiqDelta
, but neither is going to provide the ability to filter indices based on the current model instance, just the model.
If there are ideas for PRs, please do share, because I'd love your experience (especially given I'm not using a sharding approach for any of my apps). I'm currently wondering if changing IndexSet filtering to accept an instance (much like it accepts a model class, see line 46 of that delta callback file), and then that would allow picking and choosing specific shards based on that instance. Other ideas are welcome too!
Also: I guess another way is to use real-time indices rather than deltas, and then you've got a touch more control over the callbacks in the model, and can invoke them a little more specifically. But that is a significant change if you're comfortable with SQL-backed indices.
@pat thanks for sharing your thoughts. The idea that we have to solve this would be to allow the model to optionally implement ts_index_names
.
Each of the Callback classes would then check instance.respond_to? :ts_index_names
and if it does, it will select
the indices
against those names to narrow the indices that get update/delete/etc calls.
If you think this approach is agreeable, we'll take a stab at the PR ASAP.
Hi @lunaru - sorry for the delay in getting back to you, but I've just merged some changes into the develop
branch that should help solve this issue: I've reworked IndexSet so it accepts specific instances, and that's what the callbacks pass through. Which means that something like the following should help:
class IndexSet < ThinkingSphinx::IndexSet
private
def indices
return super if instances.empty?
super.select do |index|
# this block below will return all core and delta indices for the
# instance's model, filtered by the shard id for the instance.
instances.any? { |instance| index.name[/_#{instance.shard_id}_$/] }
end
end
end
# This line would go in an initialiser:
ThinkingSphinx::Configuration.instance.index_set_class = IndexSet
The shard_id
method in the example could be replaced with your MOD
logic, and thus returns the integer of the index an instance is tied to. If you'd like to give this a shot and confirm if it works for you, that'd be great! Then I can look into getting a new gem release published.
@pat awesome, we’ll give this a try this week (hopefully Monday) and let you know. I take it that your refactor is already live on the master
branch and we’d just need to pin our gemfile to master
to use it?
master
is for released gem versions, so it's only in develop
for now, but that's considered stable so there shouldn't be any issues using that branch as the source.
@pat Good news: It looks like this works brilliantly. The only difference with the index_set_class
call had to be done differently because it looks like the configuration was getting wiped upon initialization. Here's how we ended up doing it:
class ShardedIndexSet < ThinkingSphinx::IndexSet
private
def indices
return super if instances.empty?
super.select do |index|
# this block below will return all core and delta indices for the
# instance's model, filtered by the shard id for the instance.
instances.any? { |instance| !instance.respond_to?(:shard_id) || instance.shard_id.blank? || index.name[/_#{instance.shard_id}_/] }
end
end
end
# this needs to be set after initializers for the thinking-sphinx gem are run
Rails.application.config.to_prepare do
ThinkingSphinx::Configuration.instance.index_set_class = ShardedIndexSet
end
Specifically Rails.application.config.to_prepare
runs late enough to let us override the default index set class.
I'll go ahead and close this issue and await the next release version
Thanks for the confirmation, and the note about the initialiser. I'll try to get the new gem release out soon!
I know I've already noted this in #1173, but wanted also to mention that the newly released v5.0 includes the change discussed here with instances
being available in custom IndexSet subclasses :)
@pat I wanted to create this issue to start this discussion to make sure we've exhausted all of the documented solutions.
Right now, we have a model with enough records that a single index is not practical, so instead, we have something like this:
However, this as the problem that when a record is modified, it ends up creating N delta jobs, one for each index. If N is large enough, this becomes a problem.
It looks like the offending code is here: https://github.com/pat/thinking-sphinx/blob/1adcf8159fdbcb96a70fb01f5d8a15c2070090e6/lib/thinking_sphinx/active_record/callbacks/delta_callbacks.rb#L16
So the question is: Is there a better way to do sharding that avoids this landmine? Or conversely, if this is the best way to shard, then would a PR to help solve this problem be welcome?