inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

Crash with particular query #572

Closed kaplun closed 8 years ago

kaplun commented 8 years ago

When searching for "Inelastic strong interactions at high-energies: annual progress report for the periods june", the system crashes with:

Traceback (most recent call last):
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask_restful/__init__.py", line 270, in error_router
    return original_handler(e)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask_restful/__init__.py", line 270, in error_router
    return original_handler(e)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/invenio_base/wrappers.py", line 132, in handle_user_exception
    return super(Flask, self).handle_user_exception(e)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/flask_debugtoolbar/__init__.py", line 125, in dispatch_request
    return view_func(**req.view_args)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/invenio_base/decorators.py", line 200, in decorator
    return f(*args, **argd)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/invenio_collections/decorators.py", line 73, in decorated
    return method(collection, *args, **kwargs)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/invenio_search/views/search.py", line 205, in search
    response = Query(p).search(collection=collection.name)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/invenio_search/api.py", line 62, in search
    query = self.query
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/werkzeug/utils.py", line 73, in __get__
    value = self.func(obj)
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/invenio_search/api.py", line 53, in query
    tree = pypeg2.parse(self._query, parser(), whitespace="")
  File "/home/skaplun/.virtualenvs/labs/lib/python2.7/site-packages/pypeg2/__init__.py", line 669, in parse
    raise parser.last_error
SyntaxError: expecting one of [<class 'invenio_query_parser.parser.NotQuery'>, <class 'invenio_query_parser.parser.AndQuery'>, <class 'invenio_query_parser.parser.OrQuery'>, <class 'invenio_query_parser.parser.ImplicitAndQuery'>] (line 1)

kaplun commented 8 years ago

@Panos512 can you have a look?

Panos512 commented 8 years ago

Looks like the problem is the '-' character. The parser tries to find keywords automatically, identifying as a keyword every word that is followed by ':'. The problem is that it tries to match only alphanumeric characters, so '-' breaks it.
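A minimal illustration of the failure mode described above. The two regular expressions are hypothetical stand-ins, not the actual patterns in invenio_query_parser (whose grammar is defined with pypeg2 and may differ); they only show why a keyword rule limited to alphanumeric characters rejects 'high-energies:'.

```python
import re

# Hypothetical patterns for illustration only; not taken from invenio_query_parser.
ALNUM_KEYWORD = re.compile(r"^([A-Za-z0-9]+):$")    # alphanumeric characters only
HYPHEN_KEYWORD = re.compile(r"^([A-Za-z0-9-]+):$")  # additionally allows '-'


def find_keyword(pattern, token):
    """Return the keyword part if the whole token looks like 'keyword:', else None."""
    match = pattern.match(token)
    return match.group(1) if match else None


print(find_keyword(ALNUM_KEYWORD, "title:"))           # title
print(find_keyword(ALNUM_KEYWORD, "high-energies:"))   # None
print(find_keyword(HYPHEN_KEYWORD, "high-energies:"))  # high-energies
```

With the stricter pattern, a token that ends in ':' is neither a valid keyword nor a plain value, which is consistent with the SyntaxError in the traceback.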

We could try enabling special characters in keywords, but I think this approach is wrong, because we would still have to identify queries without keywords to enable the "google like" search. E.g. in this particular query, 'high-energies:' shouldn't be identified as a keyword but as part of a value (the whole sentence).

I will continue investigating and come back with more.

Panos512 commented 8 years ago

After a three-day fight with the parser, it finally won. It seems that accepting keywords only from a specific list, and parsing every other word ending with the : character as a value, is not supported by the parser's logic at the moment. What we can do (apart from creating a new parser or radically redesigning this one) is exactly what I didn't want to do: let the parser accept special characters inside keywords.

E.g. high-energies: annual will be parsed as:

KeywordOp(Keyword('high-energies'), Value('annual')) 

and the whole sentence will be parsed as:

AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(ValueQuery(Value('Inelastic')), ValueQuery(Value('strong'))), ValueQuery(Value('interactions'))), ValueQuery(Value('at'))), KeywordOp(Keyword('high-energies'), Value('annual'))), ValueQuery(Value('progress'))), ValueQuery(Value('report'))), ValueQuery(Value('for'))), ValueQuery(Value('the'))), ValueQuery(Value('periods'))), ValueQuery(Value('june')))
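The nesting in the output above reads as a left-to-right fold of the individual terms under an implicit AND. The sketch below reproduces that shape with stand-in node types that only mimic the names printed above; they are not the real invenio_query_parser AST classes, and whether the parser builds its tree this way internally is an assumption.

```python
from collections import namedtuple
from functools import reduce

# Stand-in node types (assumption): same names as in the parser output,
# but defined here purely for illustration.
Value = namedtuple("Value", "value")
Keyword = namedtuple("Keyword", "value")
ValueQuery = namedtuple("ValueQuery", "op")
KeywordOp = namedtuple("KeywordOp", "left right")
AndOp = namedtuple("AndOp", "left right")


def fold_implicit_and(terms):
    """Combine terms left to right: a b c -> AndOp(AndOp(a, b), c)."""
    return reduce(AndOp, terms)


terms = [
    ValueQuery(Value("at")),
    KeywordOp(Keyword("high-energies"), Value("annual")),
    ValueQuery(Value("progress")),
]
tree = fold_implicit_and(terms)
print(tree)
```

Note that 'annual' ends up attached to the 'high-energies' keyword instead of being an independent search term, which is the drawback of this workaround.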

Regarding the "google like" search, what we can do (as discussed IRL with @kaplun) is try to identify whether the query contains any keywords before starting the parsing procedure. If it doesn't, we can pass it to a nice Elasticsearch query that will do the magic. Otherwise we keep the current behavior.
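A rough sketch of that pre-check, under stated assumptions: the keyword set, the field names and the function names are all hypothetical, since the thread does not list the real search keywords or index fields. Only the multi_match query type is standard Elasticsearch query DSL.

```python
import re

# Hypothetical keyword list; the real set of supported keywords is not given here.
KNOWN_KEYWORDS = {"author", "title", "year"}

# Any run of non-space, non-colon characters directly followed by ':'.
CANDIDATE = re.compile(r"([^\s:]+):")


def contains_known_keyword(query, keywords=KNOWN_KEYWORDS):
    """True if at least one 'word:' in the query is a known keyword."""
    return any(m.group(1).lower() in keywords for m in CANDIDATE.finditer(query))


def route_query(query):
    """Decide which backend should handle the query (illustrative only)."""
    if contains_known_keyword(query):
        # Keep the current behavior: hand the query to the existing parser.
        return ("parser", query)
    # No known keyword: treat the whole string as free text.
    # The field names below are assumptions.
    body = {"query": {"multi_match": {"query": query, "fields": ["title", "abstract"]}}}
    return ("elasticsearch", body)


print(route_query("author:ellis title:higgs")[0])
print(route_query("Inelastic strong interactions at high-energies: annual progress report")[0])
```

Here 'high-energies:' is followed by a colon but is not in the keyword list, so the whole query is treated as free text and never reaches the parser.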

If anyone has a better, cleaner solution, I would love to hear it.

kaplun commented 8 years ago

@Panos512 just to be fully sure I understand: what do you mean by the second example?

Panos512 commented 8 years ago

It's the parser's output for:

Inelastic strong interactions at high-energies: annual progress report for the periods june

kaplun commented 8 years ago

As discussed IRL, there is still one more approach (too complex to explain on GitHub :dancer:) to try in order to fix this bug.