pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License
1.63k stars 469 forks source link

after_commit should delete instances out of scope (soft-deleting) #1216

Closed akostadinov closed 2 years ago

akostadinov commented 2 years ago

In #1057 a use case for soft-deleting documents is presented. In my project soft-deleting is implemented in a different way (state machine). But I think it is possible to cover all cases. Just not sure about the performance impact.

The idea is to check whether object is part of index scope in the after_commit callback. If it is not, then delete it instead of indexing it.

The approach can be something like:

      if index.scope.find_by(primary_key => id).exists?
        reindex(index, self)
      else
        delete_from_index(index, id)
      end
pat commented 2 years ago

Hi Aleksander

Sorry for not responding sooner to this. I'm not quite sure if I understand your question properly, so let me confirm:

You are wanting to mark records as deleted within Sphinx indices when your records are in certain 'soft-deleted' states - and then when those records are later updated, you want to ensure the record is still kept out of Sphinx (unless it's no longer considered 'soft-deleted'). Is that correct?

If so, I would recommend looking at the state of your record rather than querying against Sphinx (or your database) to determine whether it should be updated vs deleted. Is that possible in your situation?

akostadinov commented 2 years ago

It is indeed possible to look at object state and decide that. But it requires handling models on a case by case basis.

We define a scope for the index anyway. On indexing no objects out of scope are selected. This makes it a universal way to check whether an object needs to be in the index or removed from it.

In my particular case we do that processing in a background job (#1215) so a database query is necessary anyway.

I see that it introduces some performance penalty for simple use cases, on the other hand it may save server RAM and I expect queries on the primary key to be fast. I don't know how often and with what use cases one approach or the other will have a noticeable difference. I only know it affects us :)

pat commented 2 years ago

If the scoped Sphinx query approach is working for you, wonderful :)

I guess I would have looked into having a method with the same name in all indexed models which determines whether a record should be indexed or not, and then call that - to save the search query - but not a big deal either way.

Are there any questions remaining for this issue? Or can it be closed?

akostadinov commented 2 years ago

Well, the idea is to make this a default behavior - remove indexed models that are out of scope when scope is provided.

But if you don't want such behavior, then you can close I guess. Thank you for the attention in either case.

pat commented 2 years ago

Coming back to this after a few months, I don't think I'm going to make this change, but appreciate the suggestion. Certainly if you or others find more reasons for it, I'm happy to discuss further!