kerighan / eldar

Boolean text search in Python
MIT License

string index out of range when parsing query #17

Closed: tomasohara closed this issue 2 years ago

tomasohara commented 2 years ago

In the course of processing complex queries such as the following, I ran into IndexError exceptions in parse_query:

((navy 0416) OR "acoustic intelligence specialist" OR (navy 0416) OR "radar and sonar technicians")

This can be reproduced as follows:

In [27]: eldar.Query('').filter(['a'])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-27-7d7708cdc373> in <module>
----> 1 eldar.Query('').filter(['a'])

/usr/local/misc/programs/anaconda3/envs/osr-py3-8/lib/python3.8/site-packages/eldar/query.py in __init__(self, query, ignore_case, ignore_accent, match_word)
     17         self.ignore_accent = ignore_accent
     18         self.match_word = match_word
---> 19         self.query = parse_query(query, ignore_case, ignore_accent)
     20 
     21     def preprocess(self, doc):

/usr/local/misc/programs/anaconda3/envs/osr-py3-8/lib/python3.8/site-packages/eldar/query.py in parse_query(query, ignore_case, ignore_accent)
     49 def parse_query(query, ignore_case=True, ignore_accent=True):
     50     # remove brackets around query
---> 51     if query[0] == '(' and query[-1] == ')':
     52         query = strip_brackets(query)
     53     # if there are quotes around query, make an entry

IndexError: string index out of range
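The failure comes from indexing an empty string: query[0] raises IndexError when query is '', whereas str.startswith and str.endswith simply return False on empty input. A minimal illustration follows; the helper names first_char and has_outer_brackets are hypothetical and not part of eldar.

```python
def first_char(s: str) -> str:
    # Mirrors the query[0] access in parse_query: fails on ''.
    return s[0]


def has_outer_brackets(s: str) -> bool:
    # startswith/endswith return False for '', so this check cannot raise.
    return s.startswith("(") and s.endswith(")")


try:
    first_char("")
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range

print(has_outer_brackets(""))        # False
print(has_outer_brackets("(navy)"))  # True
```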

This can be fixed by guarding against empty input, as in the following patch to entry.py:

¢ diff --context=1 entry.py.original entry.py
*** entry.py.original   2022-05-25 19:17:43.032805538 -0500
--- entry.py    2022-05-25 20:45:29.844490042 -0500
***************
*** 47,48 ****
--- 47,50 ----
  def strip_quotes(query):
+     if not query:
+         return query
      if query[0] == '"' and query[-1] == '"':
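For reference, a patched strip_quotes might read as below. This is a sketch under assumptions: only the empty-input guard and the quote test come from the patch above; the return values (query[1:-1] for a quoted query, the query unchanged otherwise) are guesses at the rest of the function, which the diff does not show.

```python
def strip_quotes(query: str) -> str:
    """Remove one pair of enclosing double quotes, if present.

    Assumed body: only the guard and the quote test come from the patch.
    """
    # Guard added by the patch: an empty query has no query[0] to inspect.
    if not query:
        return query
    if query[0] == '"' and query[-1] == '"':
        return query[1:-1]
    return query


print(repr(strip_quotes("")))                               # ''
print(repr(strip_quotes('"radar and sonar technicians"')))  # 'radar and sonar technicians'
print(repr(strip_quotes("navy")))                           # 'navy'
```

Because the guard returns before any indexing, normal (non-empty) queries take exactly the same path as before.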

tomasohara commented 2 years ago

As with #16, this was noticed when processing complex queries; otherwise, the errors probably would have been reported sooner.

kerighan commented 2 years ago

This is not an issue with Eldar; your query is malformed. "navy 0416" without quotes does not mean anything.

tomasohara commented 2 years ago

An invalid query should not lead to IndexError exceptions. In addition, making strip_quotes more robust seems beneficial and won't affect normal queries.
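One way to reconcile both positions is to validate the query up front and raise a descriptive error instead of an IndexError. The sketch below is hypothetical: check_query and QueryError are not part of eldar, and it assumes uppercase AND/OR/NOT operators with double-quoted phrases. It rejects empty input, unbalanced brackets or quotes, and two terms in a row with no operator between them, such as the unquoted navy 0416 the maintainer objected to.

```python
import re
from typing import List

# Assumed operator set; eldar's actual grammar may differ.
OPERATORS = {"AND", "OR", "NOT"}
# A token is a quoted phrase, a single bracket, or a bare word.
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


class QueryError(ValueError):
    """Hypothetical exception for malformed boolean queries."""


def check_query(query: str) -> List[str]:
    """Return the query's tokens, or raise QueryError with a readable message."""
    if not query.strip():
        raise QueryError("empty query")
    if query.count('"') % 2:
        raise QueryError("unterminated double quote")
    depth = 0
    prev_is_term = False  # True after a word, a phrase, or a closed group
    tokens = TOKEN_RE.findall(query)
    for tok in tokens:
        if tok == "(":
            if prev_is_term:
                raise QueryError("'(' follows a term without an operator")
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise QueryError("unmatched ')'")
            prev_is_term = True
        elif tok in OPERATORS:
            prev_is_term = False
        else:
            if prev_is_term:
                raise QueryError(
                    f"{tok!r} follows another term without an operator; "
                    "quote multi-word phrases"
                )
            prev_is_term = True
    if depth:
        raise QueryError("unmatched '('")
    return tokens
```

Making QueryError a subclass of ValueError means callers that already catch ValueError keep working, while the message points at the actual problem in the query.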

tomasohara commented 2 years ago

Won't Fix would be a better resolution.