jurismarches / luqum

A lucene query parser generating ElasticSearch queries and more !

Parse fails on word commas (eg "hi , bye") #79

Closed cdrini closed 1 year ago

cdrini commented 1 year ago

For example:

from luqum.parser import parser
parser.parse('hi , bye')
# IllegalCharacterError: Illegal character ', bye' at position 3

parser.parse('hi, bye')
# UnknownOperation(Word('hi,'), Word('bye'))

Instead of an error, I expect to get:

UnknownOperation(Word('hi'), Word(','), Word('bye'))

alexgarel commented 1 year ago

@cdrini are you sure this is a valid Lucene query?

cdrini commented 1 year ago

Hi @alexgarel! Yep, I ran it in my Solr instance and it was handled like any other query. I don't believe commas are special characters anywhere in Lucene, so they shouldn't need escaping.
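
To make the point about special characters concrete, here is a minimal, self-contained sketch. It does not use luqum and is not how luqum's grammar is written: `LUCENE_SPECIAL`, `TERM_RE`, and `split_terms` are hypothetical names made up for illustration. The sketch only assumes the set of special characters listed in the Lucene classic query syntax documentation, and treats every other run of non-whitespace characters as a plain term.

```python
import re

# Characters the Lucene classic query syntax documents as special
# (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /). The comma is not among them.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

# Hypothetical term pattern (an assumption, not luqum's actual grammar):
# a term is any run of characters that are neither whitespace nor special.
TERM_RE = re.compile(
    r'[^\s' + re.escape(''.join(sorted(LUCENE_SPECIAL))) + r']+'
)


def split_terms(query):
    """Split a query into plain terms, ignoring operators and grouping."""
    return TERM_RE.findall(query)


print(split_terms('hi , bye'))
print(split_terms('hi, bye'))
```

Under this simplified rule a standalone comma is an ordinary term, and a comma attached to a word stays part of that word, which matches the behaviour requested in this issue.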

cdrini commented 1 year ago

(Btw this package is fantastic! I recently started using it on https://github.com/internetarchive/openlibrary and it's made everything soooo much easier and enabled a whole new class of features that were simply too complicated before! Thank you so much! 😊)

cdrini commented 1 year ago

Woohoo, that was fast! Awesome, thank you @alexgarel! And @mmoriniere!