jurismarches / luqum

A lucene query parser generating ElasticSearch queries and more !

Parsing error in multithreading #72

Closed guozizi closed 1 year ago

guozizi commented 2 years ago
import _thread

from luqum.parser import parser

def run():
    qs1 = '(title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR ' \
          'title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick ' \
          'fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox'
    qs2 = '(title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR ' \
          'title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick ' \
          'fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox'

    parser.parse(qs1)
    parser.parse(qs2)

# The larger the range, the more likely the error is to occur
for i in range(100):
    _thread.start_new_thread(run, ())

# Running in a single thread works properly
# for i in range(1000):
#     run()

Running this raises: luqum.exceptions.ParseSyntaxError: Syntax error in input : unexpected end of expression (maybe due to unmatched parenthesis) at the end!
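When reproducing this, note that `_thread.start_new_thread` gives no way to wait for the workers, so the main thread may exit before they run, and exceptions are only printed. A minimal sketch of a harness that joins the threads and collects the exceptions follows; `run_in_threads` and `work` are hypothetical names, not part of luqum or PLY.

```python
import threading


def run_in_threads(work, n_threads=100):
    """Run work() in n_threads threads and return the exceptions raised."""
    errors = []
    errors_lock = threading.Lock()

    def guarded():
        try:
            work()
        except Exception as exc:
            # Collect the exception instead of letting the thread print it and die.
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=guarded) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        # Wait for every worker, so the main thread cannot exit early.
        t.join()
    return errors
```

With the `run` function above, `errors = run_in_threads(run)` would return a list that is non-empty whenever at least one thread hit the parse error.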

alexgarel commented 2 years ago

Yes, sadly the error messages are not that good, but getting them right would take a fair amount of work (PLY does not help much with that).

alexgarel commented 1 year ago

Oh sorry @guozizi, yes PLY is not thread safe … see https://github.com/dabeaz/ply/issues/268

alexgarel commented 1 year ago

See also this comment: https://github.com/dabeaz/ply/blob/master/ply/yacc.py#L42

I tried a copy.deepcopy of the parser, but it did not seem to resolve the problem…
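To illustrate the kind of failure a shared, stateful lexer can cause, here is a deterministic toy sketch. `ToyLexer` is a hypothetical stand-in that keeps its input and position on the instance; it is not PLY's code, and the interleaving is simulated by hand rather than with real threads.

```python
class ToyLexer:
    """Toy whitespace tokenizer that stores its input and position on the instance."""

    def __init__(self):
        self.words = []
        self.index = 0

    def input(self, text):
        self.words = text.split()
        self.index = 0

    def token(self):
        if self.index >= len(self.words):
            return None
        word = self.words[self.index]
        self.index += 1
        return word


def tokens_seen_by_a(lexer_a, lexer_b):
    """Parse A sets its input, then parse B sets its own before A reads any token."""
    lexer_a.input("title:fox AND body:quick")
    lexer_b.input("(a OR b)")  # overwrites A's input when lexer_b is lexer_a
    return list(iter(lexer_a.token, None))


shared = ToyLexer()
broken = tokens_seen_by_a(shared, shared)        # one lexer shared by both parses
fixed = tokens_seen_by_a(ToyLexer(), ToyLexer())  # one lexer per parse
print(broken)  # ['(a', 'OR', 'b)']
print(fixed)   # ['title:fox', 'AND', 'body:quick']
```

In the shared case, parse A ends up reading tokens from B's input, which is the sort of corruption that can surface as an unexpected syntax error.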

alexgarel commented 1 year ago

@guozizi this works !

import _thread

from luqum.parser import parser
from ply import lex

def run():
    qs1 = '(title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR ' \
          'title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick ' \
          'fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox'
    qs2 = '(title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR ' \
          'title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox AND (title:"foo bar" AND body:"quick ' \
          'fox") OR title:fox AND (title:"foo bar" AND body:"quick fox") OR title:fox'
    thread_lexer = lex.lexer.clone()
    parser.parse(qs1, lexer=thread_lexer)
    parser.parse(qs2, lexer=thread_lexer)

# The larger the range, the more likely the error is to occur
for i in range(100):
    _thread.start_new_thread(run, ())

alexgarel commented 1 year ago

@guozizi I added a helper function for that, as I think this might be quite a common scenario. See my pull request.
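One way such a helper can be structured is to keep one lexer per thread in a `threading.local()` and clone it on first use. The sketch below uses a hypothetical `ToyLexer` so that it runs with the standard library only; it is not the helper from the pull request. With luqum, the per-thread object would be `lex.lexer.clone()`, passed to `parser.parse` via `lexer=`, as in the snippet above.

```python
import threading


class ToyLexer:
    """Toy stateful lexer (hypothetical stand-in for a PLY lexer)."""

    def __init__(self):
        self.words = []
        self.index = 0

    def input(self, text):
        self.words = text.split()
        self.index = 0

    def token(self):
        if self.index >= len(self.words):
            return None
        word = self.words[self.index]
        self.index += 1
        return word

    def clone(self):
        # A fresh lexer with its own state.
        return ToyLexer()


_base_lexer = ToyLexer()
_local = threading.local()


def thread_lexer():
    """Return this thread's private lexer, cloning the base lexer on first use."""
    if not hasattr(_local, "lexer"):
        _local.lexer = _base_lexer.clone()
    return _local.lexer


def tokenize(text):
    """Tokenize text using only the calling thread's lexer."""
    lexer = thread_lexer()
    lexer.input(text)
    return list(iter(lexer.token, None))
```

Each thread pays the cloning cost once and then reuses its own lexer, so callers do not need to pass a lexer around explicitly.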

alexgarel commented 1 year ago

@mmoriniere I wonder whether you might be running into this on your server.

mmoriniere commented 1 year ago

@alexgarel I merged your PR about the thread-safe parse function, so I'm closing this issue.

guozizi commented 1 year ago

Great, thanks @alexgarel!