inveniosoftware / troubleshooting

DEPRECATED - Use the forum instead:
https://invenio-talk.web.cern.ch

Sanitise the ES query string #43

Open qburst-jineesh opened 5 years ago

qburst-jineesh commented 5 years ago

Do we have to sanitize the query string to prevent query injection in Elasticsearch?

lnielsen commented 5 years ago

Can you elaborate a bit on which parts are dangerous for Elasticsearch? Currently, the query string is passed to ES for parsing. So, for example, you cannot have fields in ES that are not searchable. To restrict that, you'll need to parse the query string in Invenio using something like the luqum package: https://pypi.org/project/luqum/

qburst-jineesh commented 5 years ago

@lnielsen Could you please clarify the two scenarios below?

  1. Will special characters be treated as part of the search keywords, or will they carry special meaning in the query? For example, if I search for the keyword `*`, will I get all records (whether or not they contain it) or only the records that contain `*`?
  2. Will AND, OR and NOT be treated as logical operators, or just like any other word in the search keywords?

If the results are based on the special meaning of these characters and operators (`*` means all records, AND means logical AND) rather than treating them as normal keywords, we will have to handle the query string on our side. Please confirm.

lnielsen commented 5 years ago

If you use the built-in Elasticsearch query parser (the default for Invenio), then Elasticsearch parses your query string according to its query string syntax.

From the documentation you'll see that:

You can see more examples of what is possible with the query syntax here: http://help.zenodo.org/guides/search/
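
If you do want user input treated as plain keywords rather than query syntax, one option is to escape it before it reaches Elasticsearch. The following is only a minimal sketch, not something Invenio provides: `escape_query_string` is a hypothetical helper name, and the character list follows the reserved characters listed in the Elasticsearch query string documentation. It assumes that lowercasing AND, OR and NOT is enough to stop them acting as operators, which you should verify against your Elasticsearch version.

```python
import re

# Reserved characters in the query_string syntax that can be backslash-escaped.
_ESCAPABLE = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/]|&&|\|\|)')
# "<" and ">" cannot be escaped at all, so the only option is to remove them.
_UNESCAPABLE = re.compile(r'[<>]')
# Boolean operators are only recognised in upper case.
_OPERATORS = re.compile(r'\b(AND|OR|NOT)\b')


def escape_query_string(text):
    """Return text with query_string syntax neutralised (hypothetical helper)."""
    text = _UNESCAPABLE.sub('', text)          # drop range operators
    text = _ESCAPABLE.sub(r'\\\1', text)       # prefix reserved tokens with "\"
    text = _OPERATORS.sub(lambda m: m.group(1).lower(), text)  # assumption: lowercase = plain word
    return text


if __name__ == '__main__':
    for raw in ['*', 'cats AND dogs', 'title:foo', 'a<b']:
        print(repr(raw), '->', repr(escape_query_string(raw)))
```

With this approach users lose wildcards, field searches and boolean operators entirely, so it only makes sense if you want a keyword-only search box.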