kerighan / eldar

Boolean text search in Python
MIT License

local variable 'right_part' referenced before assignment #16

Closed: tomasohara closed this issue 2 years ago

tomasohara commented 2 years ago

I ran into an UnboundLocalError when processing a complex boolean expression:

((navy 0416) OR "acoustic intelligence specialist" OR (navy 0416) OR "radar and sonar technicians")

Here's a simple way to reproduce the error:

In [12]: eldar.Query("(a OR b OR c").filter(['a', 'b', 'Z'])
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-12-0944f4380b98> in <module>
----> 1 eldar.Query("(a OR b OR c").filter(['a', 'b', 'Z'])

/usr/local/misc/programs/anaconda3/envs/osr-py3-8/lib/python3.8/site-packages/eldar/query.py in __init__(self, query, ignore_case, ignore_accent, match_word)
     17         self.ignore_accent = ignore_accent
     18         self.match_word = match_word
---> 19         self.query = parse_query(query, ignore_case, ignore_accent)
     20 
     21     def preprocess(self, doc):

/usr/local/misc/programs/anaconda3/envs/osr-py3-8/lib/python3.8/site-packages/eldar/query.py in parse_query(query, ignore_case, ignore_accent)
     84         if operator == "or":
     85             return OR(
---> 86                 parse_query(left_part, ignore_case, ignore_accent),
     87                 parse_query(right_part, ignore_case, ignore_accent)
     88             )

/usr/local/misc/programs/anaconda3/envs/osr-py3-8/lib/python3.8/site-packages/eldar/query.py in parse_query(query, ignore_case, ignore_accent)
     85             return OR(
     86                 parse_query(left_part, ignore_case, ignore_accent),
---> 87                 parse_query(right_part, ignore_case, ignore_accent)
     88             )
     89         elif operator == "and":

UnboundLocalError: local variable 'right_part' referenced before assignment

As a quick workaround, I applied the following fix:

¢ diff --context=1 query.py.original query.py
*** query.py.original   2022-05-22 14:50:45.629882709 -0500
--- query.py    2022-05-22 18:10:50.902445658 -0500
***************
*** 72,73 ****
--- 72,75 ----
      if match_len != 0:
+         # TPO hack: make sure both parts defined
+         left_part = right_part = ""
          # stop at first balanced operation

kerighan commented 2 years ago

The (navy 0416) part is not valid query syntax. Use "navy 0416" with quotes, or ("navy" OR "0416"), to fix it.

tomasohara commented 2 years ago

Two points to consider regarding this issue:

  1. If the parser raised an invalid-query exception instead of an UnboundLocalError, this would not be a problem. That would allow more graceful handling of malformed input, including the simpler query that triggers the same error (i.e., "(a OR b OR c").

  2. The simple fix doesn't change the way normal queries are handled.
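
To illustrate point 1, here is a minimal sketch of a caller-side pre-check. The names `InvalidQueryError` and `validate_query` are hypothetical and not part of eldar. It only verifies that parentheses and double quotes are balanced, which is enough to reject "(a OR b OR c" with a clear error before the string reaches the parser.

```python
class InvalidQueryError(ValueError):
    """Hypothetical exception for malformed boolean queries (not part of eldar)."""


def validate_query(query: str) -> str:
    """Return the query unchanged, or raise InvalidQueryError if it is malformed.

    Assumption: this only checks that parentheses and double quotes are
    balanced; it does not validate operators or terms.
    """
    depth = 0
    in_quotes = False
    for position, char in enumerate(query):
        if char == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # parentheses inside a quoted phrase are literal text
        elif char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise InvalidQueryError(f"unmatched ')' at position {position}")
    if in_quotes:
        raise InvalidQueryError("unterminated double quote")
    if depth > 0:
        raise InvalidQueryError(f"{depth} unclosed '(' in query")
    return query
```

A caller could then write something like eldar.Query(validate_query(text)) inside a try/except block. Note that this check would not flag the (navy 0416) case, since its parentheses are balanced; catching that would need validation inside the parser itself.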

tomasohara commented 2 years ago

Won't Fix would be a better resolution.