Proposal: no more automatic callbacks inserted into models.

pat commented 4 years ago

Writing this out here to compose my thoughts, but perhaps those watching this repo could chime in with their thoughts? It's been sparked by the conversations in #1158 with @jdelStrother and #1166 with @lunaru.

Currently, Thinking Sphinx automatically inserts four callbacks into ActiveRecord::Base:

on before_save to set the delta flag on new/updated records.
on after_commit to run the delta indexing.
on after_update to update attributes in corresponding Sphinx records (a feature which is disabled by default).
on after_destroy to mark corresponding Sphinx records as deleted.

The first two of these only really need to be in place for models that are indexed with delta indexing enabled. The third should only be in place if attribute updates are enabled (and I'm guessing that's a very rare situation - only attributes get updated, not fields, so it's a stop-gap solution for anyone not using deltas or real-time indices). The fourth callback is useful for all indexed models.

But these callbacks are being invoked for all models, whether there's corresponding Sphinx indices or not. And sure, the overhead might be minor, but it's still far more involved than it should be.

So, given I'm gearing up to release Thinking Sphinx v5.0.0 - which is mostly just dropping support for old Ruby/Rails/Sphinx versions (see #1170) - I'm thinking this might be a good time to make all of these callbacks opt-in, much like the real-time callbacks are. This would also allow people to customise the callbacks more easily.

The usage could be along the lines of this… (methods/classes will almost certainly change):

class Article < ApplicationRecord
  # use defaults
  ThinkingSphinx::Callbacks.append(self, :deltas, :deletion)
  # could be the same as this
  ThinkingSphinx::Callbacks.append(self, :defaults, :deltas)
  # because knowing about 'deletion' feels too specific, but deltas is more of a known quantity.
end

class User < ApplicationRecord
  # or if customising behaviour...
  before_save :set_sphinx_delta_flag

  private

  def set_sphinx_delta_flag
    return if skipping_deltas? # custom logic

    # invoke TS's default callback
    ThinkingSphinx::Callbacks::Deltas.before_save(self)
  end
end

I have looked into whether we could still load the callbacks automatically, but only for the impacted models - but it's hard, not just because different callbacks are used in different index types (real-time vs SQL-backed vs SQL-backed-with-deltas), but also because things like STI get in the way - we can't just match index references to model names.

And I think I prefer the more explicit approach instead. Yes, it means using Thinking Sphinx requires a touch more work, but the points of interaction are more clear. And the bigger win being that we avoid adding callbacks to all the models that aren't indexed by Sphinx. I'm wishing I'd taken this approach to begin with, really, but the questions are: do people think this is worthwhile? And are you all okay with having to explicitly add callbacks?

jdelStrother commented 4 years ago

I'd be in favour of making people use explicit callbacks, but it's possible we're an edge-case.

We used to use deltas, but found that the delta updates became too heavyweight when working with tables with millions of rows.

We now rely entirely on asynchronous realtime callbacks - when a sphinx-indexed ActiveRecord model is updated, it gets pushed into a redis queue. A background job is continuously working through that queue to send updates to Sphinx.

Given all that, I think the only built-in Thinking Sphinx callback we use is the after_destroy one, so making them opt-in would work great for us...

lunaru commented 4 years ago

I concur with this change. There are some performance benefits as already pointed out, but even in a simple use case, having explicit callbacks help with new ThinkingSphinx users in understanding the concept of deltas without relying heavily on "magic". It's actually pretty important to understand why and when delta callbacks are triggered and what they do.

I think it's beneficial to have users exposed to this explicitly early on even at the cost of extra burden in needing more edits on existing code bases.

I think of it in this way -- the implementation and operation of TS/Sphinx is roughly across 3 major parts:

Creating the index -- this is explicit. Users need to define their indices.
Searching the index -- this is also explicit. Users need to make their own Model.search calls.
Maintaining the index -- this right now is too implicit and having it be explicit allows for better understanding and finer controllers. This is especially true since this part of TS/Sphinx is actually the part that requires the most fine-tuned adjustments as code bases grow and tables scale to higher volumes.

pat commented 4 years ago

Thank you both for the feedback :)

I've just pushed a commit to the rework-callbacks branch, which introduces the following syntax:

class Article < ApplicationRecord
  # if you're using SQL-backed indices:
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql])

  # if you're using SQL-backed indices with deltas:
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql, :deltas])

  # if you're using real-time indices
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:real_time])
  # this replaces:
  # after_save ThinkingSphinx::RealTime.callback_for(:article)

  # if you're using real-time indices with namespaced models:
  ThinkingSphinx::Callbacks.append(self, 'admin/article', :behaviours => [:real_time])
  # this replaces:
  # after_save ThinkingSphinx::RealTime.callback_for('admin/article')

  # if you're firing real-time callbacks for associated models,
  # e.g. from within a Comment model which belongs_to :article:
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:real_time], :path => [:article])
  # this replaces:
  # after_save ThinkingSphinx::RealTime.callback_for(:article, [:article])

  # Blocks passed to the `.append` call are treated like blocks passed to
  # the `.callback_for` call, as per existing behaviour for real-time indices:
  # https://freelancing-gods.com/thinking-sphinx/v4/indexing.html#callbacks

  # If anyone's got the `attribute_updates` setting enabled in their
  # config/thinking_sphinx.yml file, they'll want to include the callbacks for
  # that as well:
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql, :updates])
  # Though given this feature isn't enabled by default, I suspect not many
  # people will need to do this.
end

It didn't actually require too many changes under the hood, just a different way of attaching things, which makes me pretty confident there won't be many adverse side-effects. 🤞 We'll see what the CI build says, but certainly if you've any thoughts about this, do let me know!

pat commented 4 years ago

Closing this issue, because v5.0.0 is now released with this new approach to callbacks - thanks again to both of you for your feedback!

lunaru commented 4 years ago

@pat that's great news! We've been using it for the past few weeks pointing directly to the commit and it's been going well. Looking forward to switching back to a release branch.

pat / thinking-sphinx

Proposal: no more automatic callbacks inserted into models. #1173