jurismarches / luqum

A lucene query parser generating ElasticSearch queries and more !

IPV6 parsing failure #30

Closed fritzb closed 6 years ago

fritzb commented 6 years ago

The latest ES6 supports IPv6. When I try the following query, luqum is unable to parse it properly.

srcIp: 1::1

or

srcIp: 1\:\:1

Any suggestion?

alexgarel commented 6 years ago

Hi,

short answer: You have to put ipv6 addresses in quotes.

Luqum tries to follow the queryString syntax (and provides you with tools to extend its meaning). I did a quick experiment (see the gist), and in ES6 you have to quote IPv6 addresses or use backslashes to escape the colons in them. So that's the way luqum will go. (Handling IPv6 natively would also clutter the parser too much without providing any real benefit.)

So I'll close this ticket, do not hesitate to reopen it if I missed something or you need more explanations.
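The two options mentioned above (quoting the address, or backslash-escaping its colons) can be sketched as plain string helpers. This is only an illustration: `quoted_term` and `escaped_term` are hypothetical names, not part of luqum, and they only build the query text; they assume colons are the only characters that need escaping in the address.

```python
import ipaddress


def quoted_term(field: str, address: str) -> str:
    """Build field:"address", after checking the address is valid IPv6."""
    ipaddress.IPv6Address(address)  # raises ValueError on invalid input
    return f'{field}:"{address}"'


def escaped_term(field: str, address: str) -> str:
    """Build field:address with every colon of the address backslash-escaped."""
    ipaddress.IPv6Address(address)  # raises ValueError on invalid input
    return f"{field}:" + address.replace(":", "\\:")


print(quoted_term("srcIp", "1::1"))   # srcIp:"1::1"
print(escaped_term("srcIp", "1::1"))  # srcIp:1\:\:1
```

Either resulting string could then be handed to the parser; the quoted form is the simpler of the two.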

fritzb commented 6 years ago

@alexgarel Could you try the following:

    >>> tree = parser.parse('srcIp: 1000\:1000\:1000\:1000')
    >>> print(repr(tree))
    SearchField('srcIp', SearchField('1000\', SearchField('1000\', SearchField('1000\', Word('1000')))))
    >>>

Please reopen the issue if you think this is a valid case.

alexgarel commented 6 years ago

@fritzb I think this is indeed a valid case, since escaping is supposed to work!

fritzb commented 6 years ago

Another test case would be IPV6 with subnet, 1000:1000::1/24

alexgarel commented 6 years ago

I added two tests and could not reproduce the behaviour; see commit a32bd037f9776e076745c64dc98aa6b3f2b993bc

In fact, in your comment above, if you want to use a single \, you should add the r prefix to your strings. See raw strings here: https://docs.python.org/3.7/tutorial/introduction.html#strings

That is, you should have written:

    >>> parser.parse(r'srcIp: 1000\:1000\:1000\:1000')
    SearchField('srcIp', Word('1000\\:1000\\:1000\\:1000'))

Or, without the r prefix, you have to escape the backslash itself:

    >>> parser.parse('srcIp: 1000\\:1000\\:1000\\:1000')
    SearchField('srcIp', Word('1000\\:1000\\:1000\\:1000'))

fritzb commented 6 years ago

@alexgarel Looks like raw string escaped the \, as a result the string got modified:

    >>> str(parser.parse(r'srcIp: 1000\:1000\:1000\:1000'))
    'srcIp:1000\\:1000\\:1000\\:1000'

fritzb commented 6 years ago

Let me try passing that string to ES. @alexgarel Another question: the above test didn't work with the released luqum 0.73 from pip, but it worked fine with the top of the git tree. Perhaps there were recent fixes? Any idea when you will release 0.74?

alexgarel commented 6 years ago

Released !

alexgarel commented 6 years ago

@fritzb

> Looks like raw string escaped the \, as a result the string got modified

This is what the r prefix is for! When Python displays '\\' in a repr, the string really contains only one \ (Python escapes it in the display to show that it is a literal backslash, not the start of an escape sequence).

Consider that '\n' means a newline character, while r'\n' means a \ followed by an n, which can also be written '\\n'.
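A quick way to check this in an interpreter, using only built-ins (the variable names are just for illustration):

```python
raw = r'srcIp: 1000\:1000'       # raw string: the backslash is kept as typed
escaped = 'srcIp: 1000\\:1000'   # regular string: the backslash is escaped

# Both literals produce exactly the same string, with ONE backslash.
assert raw == escaped
assert raw.count('\\') == 1

print(len('\n'))    # 1 (a single newline character)
print(len(r'\n'))   # 2 (a backslash followed by the letter n)

print(raw)          # srcIp: 1000\:1000
print(repr(raw))    # 'srcIp: 1000\\:1000'
```

So `print` shows the actual content, while `repr` (what the interactive prompt displays) doubles the backslash without changing the string.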