Closed dacgray closed 1 year ago
Hey,
What is the use case for that kind function? Do you have an example?
Sanitizing untrusted search query input against reserved chars.
EG, if a customer searches ABC*
we don't want to process the wildcard:
Not escaped:
Escaped:
Looks like this is a query sanitizer. I think this is very specific to your use-case @dacgray, and belongs in your application, I've seen other examples where passing through *
is completely valid. Furthermore I think you don't want to just swallow search terms from users, but to fail visibly if someone types something that you cannot search for, or users will get confused by the results.
I propose we close this.
No worries ^^
The argument for is that it would be helpful and safer to have a common, accepted way to handle reserved chars. It's easy to make a very expensive mistake here. We may want user text search queries to process reserved chars, we may want to fail visibly if there are reserved chars, and as in my case we may want to just escape them.
Could change to:
SanitizationHelper
public static function escapeReservedChars(string $string): string
public static function hasReservedChar(string $string): bool
For me, it looks like you are using the wrong query for your user input. When the user should not know how the Query syntax is, you should use more one of the normal Full-text queries where you can pass your user input without doing anything with it.
https://opensearch.org/docs/latest/opensearch/query-dsl/full-text/#match
Here is an example how we do it in our application: https://github.com/shopware/platform/blob/trunk/src/Elasticsearch/Framework/AbstractElasticsearchDefinition.php#L64-L69
(We save on each document a fullText
and fullTextBoosted
property with all the search words and search afterwards in it. Words in boosted field have a better ranking)
(It uses https://github.com/shyim/opensearch-php-dsl to generate the query by using PHP objects)
Alright - that makes sense.
Thanks for your suggestion!
Couldn't find a common function to sanitize untrusted input. Hoping this will be useful.