Closed: deltamualpha closed this issue 10 years ago
I think it's fair that we put some safeguards in place. Are there any best practices for sanitizing data for Elasticsearch?
There's nothing in particular in the Elasticsearch docs about this (I looked). http://elasticsearch-users.115913.n3.nabble.com/Illegal-Characters-td1407302.html suggests that the same invalid characters as Lucene should be escaped in a query (see also http://elasticsearch-users.115913.n3.nabble.com/Exclamation-point-breaks-field-queries-td3678504.html).
Where would it make sense to do this work? src/elasticsearch/Searcher.php?
yep
Note that http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_reserved_characters now exists. Working on implementing.
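For reference, the escaping the reserved-characters docs describe boils down to backslash-escaping each special character, with the caveat that `<` and `>` can't be escaped at all and have to be removed. The plugin itself is PHP, but here's a minimal sketch of that rule in Python (function name is mine, not from the plugin):

```python
import re

def escape_query(query: str) -> str:
    """Escape the reserved characters from the query_string docs."""
    # '<' and '>' cannot be escaped at all per the docs, so drop them.
    query = query.replace('<', '').replace('>', '')
    # Backslash-escape the remaining reserved characters.
    return re.sub(r'([+\-=&|!(){}\[\]^"~*?:\\/])', r'\\\1', query)
```

The two-character operators `&&` and `||` are covered here by escaping each character individually, which the parser also accepts.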
After reviewing that link, I don't think blindly escaping is appropriate. If I want to use boolean searches, escaping would turn them off completely. Perhaps it could be an option to auto-escape in the admin? Or maybe just have more complicated regex rules...
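One shape the "more complicated rules" idea could take: escape token by token, but leave bare `AND`/`OR`/`NOT` operators alone so boolean searches keep working. A rough Python sketch (everything here is illustrative, not the plugin's actual code):

```python
import re

# Reserved characters from the query_string docs; '<'/'>' are dropped.
RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')
BOOL_OPERATORS = {'AND', 'OR', 'NOT'}

def selective_escape(query: str) -> str:
    """Escape reserved characters, preserving bare boolean operators."""
    escaped = []
    for token in query.split():
        if token in BOOL_OPERATORS:
            escaped.append(token)
        else:
            token = token.replace('<', '').replace('>', '')
            escaped.append(RESERVED.sub(r'\\\1', token))
    return ' '.join(escaped)
```

This still breaks quoted phrases and grouping parentheses, which is part of why an admin opt-in toggle might be the simpler answer.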
I had asked a question related to this in #48, but figured I'd open it up a little more now: should it be the business of fantastic-elasticsearch to prevent things it 'knows' are invalid queries from being sent to the elasticsearch server? I see searches for things like " " and "faaad!!!!!!!!!!!!!!!!!!!@@@@@@@@@@@@@@@######################$#$$$$Q%%%%%%%%%%%^^^^^^^^^" (unfocused attack attempt?) in my elasticsearch error logs, and worry both that carefully-crafted requests could exploit an opening here and that these queries cause simple performance degradation on the elasticsearch side. (The errors are Unexpected EOF and "Encountered " "! "", respectively.)
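The kind of pre-flight check described above could be a cheap gate before anything is sent to the cluster: reject empty/whitespace-only strings and pure punctuation floods outright. A sketch in Python (thresholds and the function name are illustrative assumptions, not plugin behavior):

```python
import re

def is_sane_query(query: str, max_len: int = 200) -> bool:
    """Reject obviously-invalid queries before they reach Elasticsearch."""
    stripped = query.strip()
    # Empty/whitespace-only queries produce 'Unexpected EOF' errors.
    if not stripped or len(stripped) > max_len:
        return False
    # Require at least one alphanumeric character, so punctuation
    # floods like the one in the error log never hit the cluster.
    return bool(re.search(r'[A-Za-z0-9]', stripped))
```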