mrkamel / search_cop

Search engine like fulltext query support for ActiveRecord
MIT License
825 stars 39 forks source link

Feedback requested: Pass through options to scope block #75

Open bmulholland opened 2 years ago

bmulholland commented 2 years ago

Soliciting feedback on this feature, hence draft. Our search is scoped per-user, and so the search scope must include conditions that the data belongs to this user.

Basic scenario is to add these to a where that's part of the scope. An error-prone workaround to that is to apply the where after the search, outside of search_cop

More advanced scenario is building custom subqueries to optimize complex joins for searching. For those, the where clause cannot be built outside of the scope because it is necessarily part of the scope (think .joins(Model.join(...).where().to_sql). In those scenarios, this feature is required to use the scope at all -- and possibly to use search_cop at all.

Thoughts on this approach? Other ideas? Is this feature welcomed? What else (tests, I assume) would need to be done to get this merged in?

mrkamel commented 2 years ago

Hi, thanks for opening this! However, i think i don't fully understand what you try to achieve. If you want to scope your search queries to a particular user, doing it outside of search cop is definitely the way to go. You have a thousand options to do so. Otherwise please describe again why you think this is "error prone" and also please describe how you want the search cop DSL to look like with your desired approach. Thanks in advance!

bmulholland commented 2 years ago

If you want to scope your search queries to a particular user, doing it outside of search cop is definitely the way to go.

Honest question: is it? Why?

Here's my thoughts:

  1. There's an argument called "scope" for each search_cop. Making it user-specific is, literally, "scoping" -- IMHO more so deserving of the name "scope" than what the currently-named "scope" is actually intended for, which is doing joins and preloading.
  2. Without the where conditions in the search declaration, the where needs to be applied anywhere this is used. Meaning: you can't just write search(input), you have to copy and paste the search().where(....) from somewhere else.
  3. This means the where conditions are in a very different place from the declaration of any associations. Any changes to the associations in the search_scope scope could require a change to all the where statements, meaning we'd need to go through all uses of the code and change the where everywhere. Also, we'd need to remember to do this, etc etc. Hence: error prone.
  4. We could write a basic wrapper around search(), but that's exactly what scope is already doing, so why not use it?
  5. As I say, in more complex scenarios, it is a requirement to have the user information. In writing this, I'm taking a break from a bugfix that requires exactly this: we search two optional joins, and there's a bug that can only be fixed by changing the LEFT JOINS to include user_id in the ON. (I can go into detail here if you care.) Having user_id is similarly helpful when optimizing search with subqueries.
  6. scope must be used for setting up the associations; otherwise, search_cop tries to generate a preloading scope based on associations that don't exist, throwing errors. (ref: https://github.com/mrkamel/search_cop/blob/master/lib/search_cop.rb#L67) Plus, my understanding is that the intent is that scope is how associations are set up, so if extra information is necessary to set that up, then doing so follows the intent of the configuration.
  7. I tried faking that out with a scope { self } but that didn't work -- maybe I messed that up.
  8. Even if I could get that working, pulling all the association work out of the search_cop definition is doing even more of what I write above.
  9. More so than where, the associations are coupled to the attributes declared as searchable by search_cop. For example, changing a joins could likely mean that table name changes, which means the name of the searchable attribute changes. It's convenient to have all this in one place, without having to juggle multiple files for the scope, attribute names, etc.

I'm both sick and tired, so please let me know if any of this isn't making sense.

In terms of API, I don't really care. All I'm looking for is a central place for all of this coupled code, that works, and that allows me to fix bugs and delivery features that require associations that require user information. Some options would be search(query, scope_options: user) or search(query, scope_options: {user:}) or search(query, user) or search(query, for: user)

mrkamel commented 2 years ago

Thanks for the exhaustive answer.

Regarding "why outside of search cop": similar to unix philsophy, do one thing and do it good, all the scoping activerecord is already perfectly good at.

Regarding "the meaning of scope": A search scope in search cop terminology is the set of fields the search is scoped to.

Besides, before i try to cover each point individually please tell me what would be wrong with:

class MyModel
  # ...

  search_scope :internal_search do
     # ...
  end

  def self.search(q, user:)
    where(user: user).internal_search(q)
  end
end

Again, unix philosophy, do one thing good and then mix together the good things. Moreover, everything is at the same place. You can add all neccessary joins, etc. It even matches the same interface if done right.

bmulholland commented 2 years ago

I'm trying to understand what you're saying vs what the README says. Specifically, from the README, "If you [...] need to perform special operations, specify a scope." "You can as well use scope together with aliases to perform arbitrarily complex joins and search in the joined models/tables." This is what I'm trying to do, it's just that my complex join requires access to the user ID (pretty arbitrary, right? hah :) ).

In this thread, I believe you're actually making an argument that search_cop should not have a scope method at all. That would be the unix philosophy, leaving the scope part to ActiveRecord and out of the search_cop's responsibility. Specifically, in the example code you include, the self.search method is replacing the scope, with the only difference that your specific example takes in the user argument.

Am I understanding you correctly that you'd actually like to remove the scope from the gem? If not, how does your argument apply specifically to the case when additional data is required (e.g. user ID) vs the currently-supported scope, without any args? Aligning on this part will help me understand the rest of your thoughts, including how to respond to the code snippet you included.

Thanks for the feedback and discussion!

bmulholland commented 1 year ago

Hey -- following up on this :). Would love to get off our custom fork, but do need a way to pass in the user to the search scope. I spent days debugging that and it's the only way to get a performant search for our product. Is there anything I could do to move it along?

mrkamel commented 1 year ago

hey, @bmulholland thanks for the heads up. Maybe you can share some actual code which shows how you use this. I have a bit of a hard time following. Thanks in advance.

bmulholland commented 1 year ago

Sure!

Here's a very simplified version of the configuration, including the detailed comments I made about it:

  search_scope :search do
    attributes name: ["records.title", "user_records.title"]

    scope do |user|
      self
        # user_records are the user's manually set name for
        # the record. Note that this will search ALL user's names
        # for this record unless additional scope is put to limit
        # the search to where user_records.user is the current user
        # or nil (for when they haven't renamed the item)
        #
        # WARNING: MUST use user_id in the join condition -- otherwise the left
        # join won't work as intended. We want blank rows for on the right of
        # the joins if there are no matches for this user, and without the
        # user_id constraint then that won't happen because there are results
        # for *other* users that fill in the gaps. Then those users would get
        # filtered out by the where condition, resulting in missed results.
        #
        # NOTE: By manually writing the joins and including user_id and
        # tenant_id, we hint to Postgres that it can efficiently do the join
        # without checking tenant_id. This shaves off 10s for some joins!
        # Some detail about this approach here: https://towardsdatascience.com/how-we-optimized-postgresql-queries-100x-ff52555eabe
        .joins("LEFT OUTER JOIN user_records ON "\
               "user_records.record_id = records.id AND "\
               "user_records.tenant_id = records.tenant_id AND "\
               "user_records.user_id = #{user.id}")
        # These are filtered in the `where` -- though...
        # TOOD: Try filtering the table_for_permissions join too ON user_id
        .joins(:table_for_permission)
        .where("table_for_permission.user": user)
    end
  end

Usage is: Record.search(filter_text, {}, user)

mrkamel commented 1 year ago

Thanks! this made things much clearer. Generally, i'd then be ok with passing those options to the scope as shown in your proposal. I also think we could change it to keyword arguments without introducing a breaking change, as query_options currently should only contain :default_operator, such that we might be able to change it to

send(:define_singleton_method, name) { |query, default_operator: nil, scope_options: nil| ... }