kaplun closed this issue 8 years ago
@Panos512 can you have a look?
It looks like the problem is the '-' character. The parser tries to find keywords automatically, treating every word that is followed by ':' as a keyword. However, it only matches alphanumeric characters, so '-' breaks it.
We could try allowing special characters in keywords, but I think this approach is wrong, because we will need to recognize queries without keywords in order to enable the "Google-like" search. E.g. in this particular query, 'high-energies:' shouldn't be identified as a keyword but as part of a value (the whole sentence).
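A rough sketch of the failure mode, using a plain regular expression rather than the parser's actual grammar (the pattern and the `find_keywords` helper are hypothetical, for illustration only):

```python
import re

# Assumption: a keyword is a run of alphanumeric characters
# immediately followed by ':'. This is NOT the parser's real grammar.
KEYWORD_RE = re.compile(r"[A-Za-z0-9]+(?=:)")


def find_keywords(query):
    """Return every alphanumeric run that is directly followed by ':'."""
    return KEYWORD_RE.findall(query)


query = "Inelastic strong interactions at high-energies: annual progress report"
print(find_keywords(query))  # ['energies']
```

Because '-' is not in the allowed character class, only 'energies' is picked up and the 'high-' prefix is left dangling, which is the kind of input the parser chokes on.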
I will continue investigating and come back for more.
After a three-day fight with the parser, it finally won. It seems that accepting keywords only from a specific list, and parsing every other word ending with the ':' character as a value, is not supported by the parser's logic at the moment. What we can do (short of creating a new parser or radically redesigning this one) is exactly what I didn't want to do: let the parser accept special characters inside keywords.
E.g. high-energies: annual will be parsed as:
KeywordOp(Keyword('high-energies'), Value('annual'))
and the whole sentence will be parsed as:
AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(AndOp(ValueQuery(Value('Inelastic')), ValueQuery(Value('strong'))), ValueQuery(Value('interactions'))), ValueQuery(Value('at'))), KeywordOp(Keyword('high-energies'), Value('annual'))), ValueQuery(Value('progress'))), ValueQuery(Value('report'))), ValueQuery(Value('for'))), ValueQuery(Value('the'))), ValueQuery(Value('periods'))), ValueQuery(Value('june')))
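The effect of allowing '-' inside keywords can be imitated with a small regex-based tokenizer. This is only a sketch: `TERM_RE` and `parse_terms` are hypothetical names, and the tuples merely mimic the shape of the `KeywordOp` / `ValueQuery` nodes shown above; the real parser builds its tree differently.

```python
import re

# Assumption: a term is an optional "keyword:" prefix, where the keyword
# may contain word characters and '-', followed by a single value word.
TERM_RE = re.compile(r"(?:(?P<keyword>[\w-]+):\s*)?(?P<value>[^\s:]+)")


def parse_terms(query):
    """Label each term as a keyword/value pair or as a bare value."""
    terms = []
    for match in TERM_RE.finditer(query):
        if match.group("keyword"):
            terms.append(("KeywordOp", match.group("keyword"), match.group("value")))
        else:
            terms.append(("ValueQuery", match.group("value")))
    return terms


print(parse_terms("at high-energies: annual progress"))
```

With the hyphen permitted, 'high-energies' is consumed as one keyword and 'annual' becomes its value, while the surrounding words stay plain values.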
Regarding the "Google-like" search, what we can do (as discussed IRL with @kaplun) is try to identify whether the query contains any keywords before starting the parsing procedure. If it doesn't, we can pass it to a nice Elasticsearch query that will do the magic. Otherwise, we keep the current behavior.
If anyone has a better, cleaner solution, I would love to hear it.
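A minimal sketch of that pre-check, assuming a fixed list of known keywords (the `KNOWN_KEYWORDS` set and the function names are made up for illustration, and the actual Elasticsearch query is deliberately left out):

```python
import re

# Assumption: the real list of supported keywords lives elsewhere;
# these three entries are placeholders.
KNOWN_KEYWORDS = {"author", "title", "year"}

# Any word (hyphens allowed) directly followed by ':' is a candidate.
CANDIDATE_RE = re.compile(r"([\w-]+):")


def contains_known_keyword(query):
    """True if at least one 'word:' candidate is a known keyword."""
    return any(
        match.group(1).lower() in KNOWN_KEYWORDS
        for match in CANDIDATE_RE.finditer(query)
    )


def route(query):
    """Decide which search path should handle the query."""
    return "parser" if contains_known_keyword(query) else "free_text"


print(route("Inelastic strong interactions at high-energies: annual progress report"))
print(route("author: ellis title: higgs"))
```

The first query contains no known keyword, so it would go to the free-text path; the second one would go through the parser as it does today.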
@Panos512 just to be fully sure I am understanding: what do you mean by the second example?
It's the parser's output for:
Inelastic strong interactions at high-energies: annual progress report for the periods june
As discussed IRL, there is still one more attempt (too complex to explain on GitHub :dancer:) to try in order to fix this bug.
When searching for
Inelastic strong interactions at high-energies: annual progress report for the periods june
the system crashes with: