wvanbergen / scoped_search

Easily search you ActiveRecord models with a simple query language that converts to SQL.
MIT License
265 stars 78 forks source link

Really odd error report from an innocuous search string #222

Open walterdavis opened 4 months ago

walterdavis commented 4 months ago

Environment:

I have a scoped_search set on a single column, and the user entered a long-ish query string:

action_dispatch.request.parameters: {"p"=>"every fact which has a beginning has a cause", "controller"=>"people", "action"=>"quoted_search"}

Which resulted in this error report:

A ScopedSearch::QueryNotSupported occurred in people#quoted_search:

 Field 'a' not recognized for searching!
 app/controllers/people_controller.rb:158:in `quoted_search'

That might be coming from https://github.com/wvanbergen/scoped_search/blob/master/lib/scoped_search/query_builder.rb#L476 or https://github.com/wvanbergen/scoped_search/blob/master/lib/scoped_search/query_builder.rb#L509, but I can't see why either would be relevant. Is there anything about the words in the request such that a might be interpreted as being a column name?

The model has this definition:

  scoped_search on: %i[search_name]

(search_name is just a downcased and simplified version of the name, created with parameterize.gsub('-', ' '))

The controller uses it here:

  def quoted_search
    @term = params.fetch(:p, '')
    @people = []
    @people = Person.quote_authors
                    .distinct
                    .search_for(@term).limit(40) if @term.length > 2
    render :quoted, formats: [ :turbo_stream ]
  end

I just can't see why Scoped Search was considering part of the query string as a mention of a column. There were no control characters or quotes in the input. Can anyone see something I have missed here?

walterdavis commented 4 months ago

Follow-up: running this in development, I can see that the error is coming from https://github.com/wvanbergen/scoped_search/blob/master/lib/scoped_search/query_builder.rb#L476 definitively. Screwing around with the input, I can determine that the breaking point is when the word has is followed by the word a. I wonder why that is being interpreted as a query statement, when it is coming in from the user.

walterdavis commented 4 months ago

I was able to work around this by adding a before_action to load and process the query term:

  def search_term
    @term = params.fetch(:p, '')
    @term = "\"#{@term}\"" if @term.match? /\s/
  end

I searched through the Wiki and read through a lot of the code, and while I was able to find the keywords in the Tokenizer module, I was not able to find any way to configure Scoped Search to not act on those. It seems determined to interpret has a as a literal command meaning table_name.a IS NOT NULL, when the user just meant the literal words.

Wrapping the user input as a phrase works around this, but it feels more as though the casual use of this gem would not be to start slinging SQL operations around in a text search field. Let that be an expert option that you enable by choice, so you know to explain to your users that they must know what all of the fieldnames are before they start using the common English word 'has' followed by literally any other word as their search term.

adamruzicka commented 4 months ago

You're mostly right about where this comes from. This gem provides a query language (and enables you to search using this language), has as a keyword is an unary operator in the language.

and while I was able to find the keywords in the Tokenizer module, I was not able to find any way to configure Scoped Search to not act on those

Correct, there's no way to disable it as that would mean disabling a part of the query language.

It seems determined to interpret has a as a literal command meaning table_name.a IS NOT NULL, when the user just meant the literal words.

Correct, has is an alias for set?. The user should be aware of the language, not just type in anything and hope that it will do what they mean.

Wrapping the user input as a phrase works around this

Just bear in mind that the semantics of searching for a b c and "a b c" are different. The first searches for anything containing ain any searchable field and containing b in any searchable field and containing c in any searchable field, while the latter searches for things containing a b c literally.

Let that be an expert option that you enable by choice, so you know to explain to your users that they must know what all of the fieldnames are before they start using the common English word 'has' followed by literally any other word as their search term.

If you really only want to search for exactly what the user passed in, then scoped_search might be too big of a hammer.

walterdavis commented 4 months ago

Thanks very much. I totally agree about the hammer bit. I reached for it because I was in a hurry, and I didn't want to be completely irresponsible by just allowing a SQL injection. I will do some more searching for a simpler solution. I really do like what you've built here, and can see it has great value where all of its features are needed.

I do realize that the phrase search is limiting the possible matches. Since the implementation here actually didn't anticipate someone entering an entire phrase (it was used for a search by name feature, triggered on keyup after the third character) the resulting error came out of left field from my perspective. But given an infinite universe of users, after four or five months someone decided to paste a phrase into the field...

Thanks very much for your feedback and for making this gem.

walterdavis commented 4 months ago

One more suggestion here: would it be possible to use ActiveRecord's column_names list to flavor which tokens are considered to mean that you are in the presence of an expert user, rather than an unknowing use of the keywords? That way has a would be skipped for evaluation as a to_null_sql target, rather than going down the error path (unless there actually was an a column available for evaluation), while has name would be interpreted as intending the logical operation. This could mean that instead of an error, you would get a working query. It might not give you the answer you sought, but it wouldn't just throw a 500 error.

One last thought: could there be a specific error that means "tried to search on a column, but there's no matching column"? That way an end-user could trap that error and decide what to do. The current QueryNotSupported is not unique to this case, and the only way to discern the source of the error state would be by filtering on the message text.

adamruzicka commented 4 months ago

One more suggestion here: would it be possible to use ActiveRecord's column_names list to flavor which tokens are considered to mean that you are in the presence of an expert user, rather than an unknowing use of the keywords?

In the situation where foo is a searchable column and bar is not, I'm a little uneasy about the idea of searching for has foo and has bar doing different things. How about instead of doing this, we'd rename the has operator to has? to signify it is something special? It would be more in line with other prefix unary operators (set? and null?) and hopefully it would be harder to trigger by accident.

One last thought: could there be a specific error that means "tried to search on a column, but there's no matching column"?

Sure, that's a good idea. In case you do stick with scoped_search in the end, would you be willing to take a stab at it?

walterdavis commented 4 months ago

Yes and yes. I will post here when I have something for either of these angles. I do like the gem very much.

Walter

On Jul 10, 2024, at 5:03 AM, Adam Růžička @.***> wrote:

One more suggestion here: would it be possible to use ActiveRecord's column_names list to flavor which tokens are considered to mean that you are in the presence of an expert user, rather than an unknowing use of the keywords?

In the situation where foo is a searchable column and bar is not, I'm a little uneasy about the idea of searching for has foo and has bar doing different things. How about instead of doing this, we'd rename the has operator to has? to signify it is something special? It would be more in line with other prefix unary operators (set? and null?) and hopefully it would be harder to trigger by accident.

One last thought: could there be a specific error that means "tried to search on a column, but there's no matching column"?

Sure, that's a good idea. In case you do stick with scoped_search in the end, would you be willing to take a stab at it?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.

walterdavis commented 3 months ago

@adamruzicka could I get a little hint on the test setup? I'm trying to tack on to the existing query builder spec like so:

  context 'with interpreted null sql' do
    before do
      field = double('field')
      field.stub(:field).and_return(:test_field)
      field.stub(:key_field).and_return(nil)
      field.stub(:to_sql).and_return('')
      @definition.stub(:field_by_name).and_return(field)
    end
    it 'should be happy if the target exists' do
      ScopedSearch::QueryBuilder.build_query(@definition, 'has test_field').should eq(conditions: ['? IS NOT NULL', 'test_field'])
    end
  end

but I am getting

  1) ScopedSearch::QueryBuilder with interpreted null sql should be happy if the target exists
     Failure/Error: ScopedSearch::QueryBuilder.build_query(@definition, 'has test_field').should eq(conditions: ['? IS NOT NULL', 'test_field'])

       expected: {:conditions=>["? IS NOT NULL", "test_field"]}
            got: {:conditions=>["( IS NOT NULL)"]}
etc.

I've tried a number of different variations on this setup, with and without the ? placeholder, and while I can see that I am getting the correct AST type from the definition, nothing I do seems to give it the rhs value in the right place for it to be useful. (I always see a blank space in place of that value in the test output.) Do I need to mock that explicitly? Are there other tests that do? (Couldn't find any.)

I am certain this is down to the way I am setting up the stubs here, as I am just cribbing from the other tests in this file as best I can understand them. Any suggestions would be appreciated.

Walter

adamruzicka commented 3 months ago

Oof, this is tricky. I tried explaining that some time ago in here https://github.com/wvanbergen/scoped_search/pull/217#discussion_r1209441940 , maybe it will give you a bit of an insight about what's going on.

I am certain this is down to the way I am setting up the stubs here

Spot on, by stubbing to_sql on the field, we bypass part of the behaviour that sets up the placeholders and so on. We can lean into the dynamic nature of ruby and define the missing method ourselves, but i'll leave it up to you decide if such a test actually tests anything apart from itself.

  context 'with interpreted null sql' do
    before do
      field = double('field')
      field.stub(:field).and_return(:test_field)
      field.stub(:key_field).and_return(nil)
      def field.to_sql(builder)
        yield :parameter, builder.ast.rhs.value
        '?'
      end
      @definition.stub(:field_by_name).and_return(field)
    end
    it 'should be happy if the target exists' do
      ScopedSearch::QueryBuilder.build_query(@definition, 'has test_field').should eq(conditions: ['(? IS NOT NULL)', 'test_field'])
    end
  end

Another option would be having this with less stubbing as an integration test and then making assertions about the sql query that gets generated by calling klass.search_for("has something").to_sql

walterdavis commented 3 months ago

@adamruzicka when you have a chance, https://github.com/wvanbergen/scoped_search/pull/223 needs a button pressed from you. My first run through exposed a cross-db issue in my test, which I believe the latest push has resolved.