jurismarches / luqum

A Lucene query parser generating Elasticsearch queries and more!

Escape & invalid syntax #28

Closed: fabiopedrosa closed this issue 5 years ago

fabiopedrosa commented 6 years ago

I was under the impression luqum would be able to catch syntax issues, but is that not the case?

from luqum.parser import parser
from luqum.elasticsearch import ElasticsearchQueryBuilder

test_query = '''http://crazy.c'"om OR a"teste"'''
tree = parser.parse('content: ({})'.format(test_query))
print(str(tree))
es_builder = ElasticsearchQueryBuilder(not_analyzed_fields=["published", "tag"])
query = es_builder(tree)
print(query)

just prints:

content:(http\:\/\/crazy.c'"om OR a"teste")

{'bool': {'should': [{'match': {'content.http\\': {'query': '\\/\\/crazy.c\'"om', 'zero_terms_query': 'none'}}}, {'match': {'content': {'query': 'a"teste"', 'zero_terms_query': 'none'}}}]}}

which is not accepted syntax for ES.
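One way to spot this kind of problem after the fact is to inspect the generated query instead of the input string. Below is a minimal sketch, assuming the query is the plain nested dict/list structure printed above. `iter_match_fields` and `suspicious_fields` are hypothetical helpers, not part of luqum. They walk the dict and report `match` field names that contain a backslash, as `content.http\` does here.

```python
from typing import Any, Iterator, List


def iter_match_fields(query: Any) -> Iterator[str]:
    """Yield every field name used directly under a "match" clause, at any depth."""
    if isinstance(query, dict):
        for key, value in query.items():
            if key == "match" and isinstance(value, dict):
                yield from value.keys()
            yield from iter_match_fields(value)
    elif isinstance(query, list):
        for item in query:
            yield from iter_match_fields(item)


def suspicious_fields(query: Any) -> List[str]:
    """Return match field names containing a backslash (a hypothetical heuristic)."""
    return [field for field in iter_match_fields(query) if "\\" in field]


# The query printed in the issue, pasted back as a Python literal.
query = {'bool': {'should': [
    {'match': {'content.http\\': {'query': '\\/\\/crazy.c\'"om', 'zero_terms_query': 'none'}}},
    {'match': {'content': {'query': 'a"teste"', 'zero_terms_query': 'none'}}},
]}}

print(suspicious_fields(query))  # ['content.http\\']
```

This only catches one symptom. The `a"teste"` clause looks harmless to it, because the stray quotes end up in the query text rather than in a field name.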

alexgarel commented 6 years ago

Hi @fabiopedrosa

So far we thought it was better to be tolerant, and to let people check the parse tree if they want to (there is the start of something for that in luqum.check). This may allow more freedom in extending the language (which is one of luqum's goals).

That said, your case is admittedly a bit weird and could be a good candidate to catch. If you want to fix it, a PR (with tests) is welcome.

alexgarel commented 5 years ago

Just a wrap-up, after looking more closely.

The first problem is the colon after http, which causes http to be interpreted as a field you want to search in. This is perfectly normal Lucene syntax, and we can't correct that.
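Because the colon is valid field syntax, the fix has to happen on the caller's side: escape user input before embedding it in a query string. Below is a minimal sketch, assuming the whole input should be searched as literal text. `escape_lucene` is a hypothetical helper, not a luqum function. Its character set follows the characters escaped by Lucene's classic `QueryParser.escape`. Whether a given luqum version turns each escaped term into the intended Elasticsearch query is worth checking separately.

```python
# Characters escaped by Lucene's classic QueryParser.escape(); '&' and '|' are
# escaped one by one, which also neutralises the '&&' and '||' operators.
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(text: str) -> str:
    """Backslash-escape every Lucene special character in user-supplied text."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in text)


user_input = "http://crazy.com"
query_string = "content:({})".format(escape_lucene(user_input))
print(query_string)  # content:(http\:\/\/crazy.com)
```

Whitespace is deliberately left unescaped, so multi-word input still produces several terms.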

The second is that we ignore single and double quotes when they are in the middle of an expression; this is perhaps where we are too liberal.
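A stricter behaviour could flag quote characters that sit inside a word. Below is a minimal sketch that works on the raw query string, not on the parse tree. `mid_word_quotes` and the `BOUNDARY` set are hypothetical choices, not luqum behaviour.

```python
from typing import List, Tuple

QUOTES = "'\""
# Characters treated as word boundaries around a quote (a hypothetical choice):
# whitespace plus a few Lucene operators that legitimately touch quotes,
# e.g. field:"a b", "a b"~2, "a b"^2, +"a b", -"a b".
BOUNDARY = set(" \t\r\n():~^+-")


def mid_word_quotes(query: str) -> List[Tuple[int, str]]:
    """Return (index, char) for each unescaped quote with a non-boundary char on both sides."""
    found = []
    escaped = False
    for i, ch in enumerate(query):
        if escaped:
            # This character follows a backslash, so it is taken literally.
            escaped = False
            continue
        if ch == "\\":
            escaped = True
            continue
        if ch in QUOTES:
            # The start and end of the string count as boundaries.
            before = query[i - 1] if i > 0 else " "
            after = query[i + 1] if i + 1 < len(query) else " "
            if before not in BOUNDARY and after not in BOUNDARY:
                found.append((i, ch))
    return found


test_query = '''http://crazy.c'"om OR a"teste"'''
print(mid_word_quotes(test_query))  # [(14, "'"), (15, '"'), (23, '"')]
```

On the query from this issue, the function reports the `'` and `"` in `c'"om` and the opening `"` in `a"teste"`. An apostrophe in an ordinary word such as `it's` would be reported too, so a real check would have to decide whether single quotes should count.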

However, I'm closing this for now. @fabiopedrosa, feel free to reopen it.