opensearch-project / opensearch-php

Official PHP Client for OpenSearch
Other
109 stars 58 forks source link

Adds a query sanitization helper function #106

Closed dacgray closed 1 year ago

dacgray commented 1 year ago

Couldn't find a common function to sanitize untrusted input. Hoping this will be useful.

shyim commented 1 year ago

Hey,

What is the use case for that kind function? Do you have an example?

dacgray commented 1 year ago

Sanitizing untrusted search query input against reserved chars.

EG, if a customer searches ABC* we don't want to process the wildcard:

Not escaped: image

Escaped: image

dblock commented 1 year ago

Looks like this is a query sanitizer. I think this is very specific to your use-case @dacgray, and belongs in your application, I've seen other examples where passing through * is completely valid. Furthermore I think you don't want to just swallow search terms from users, but to fail visibly if someone types something that you cannot search for, or users will get confused by the results.

I propose we close this.

dacgray commented 1 year ago

No worries ^^

The argument for is that it would be helpful and safer to have a common, accepted way to handle reserved chars. It's easy to make a very expensive mistake here. We may want user text search queries to process reserved chars, we may want to fail visibly if there are reserved chars, and as in my case we may want to just escape them.

Could change to:

SanitizationHelper
public static function escapeReservedChars(string $string): string
public static function hasReservedChar(string $string): bool
shyim commented 1 year ago

For me, it looks like you are using the wrong query for your user input. When the user should not know how the Query syntax is, you should use more one of the normal Full-text queries where you can pass your user input without doing anything with it.

https://opensearch.org/docs/latest/opensearch/query-dsl/full-text/#match

Here is an example how we do it in our application: https://github.com/shopware/platform/blob/trunk/src/Elasticsearch/Framework/AbstractElasticsearchDefinition.php#L64-L69

(We save on each document a fullText and fullTextBoosted property with all the search words and search afterwards in it. Words in boosted field have a better ranking)

(It uses https://github.com/shyim/opensearch-php-dsl to generate the query by using PHP objects)

dacgray commented 1 year ago

Alright - that makes sense.

Thanks for your suggestion!