Partially a query-parsing issue, but likely also an indexing issue.
Especially in technical fields, or when doing digital-humanities-style queries, many valid queries include meta characters. It's not clear how to represent many of these in the Lucene query syntax, or how to escape out to a simpler syntax. It's also not clear how many of them the query engine can handle at all. Some examples:
A* search in computer science ("A star" algorithm)
Identifiers used in biomedicine. Users may query by prefix, suffix, or sub-pattern, and dashes, periods, spaces, or other characters sometimes carry meaning.
Math. Even simple things, like searching for exponentiation, or symbols like β (\beta in LaTeX), which appear in titles, abstracts, body text, citations, etc. Do we flatten these down (in a Unicode-aware way) to, e.g., "b" for indexing? Expand them to "beta"? Other issues: function syntax, arrows, primes, dots, set inclusion, real numbers ("R"), integers ("N"), dot product, etc.
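On the query-parsing side, one option is to backslash-escape Lucene's meta characters before handing user input to the parser. This is a minimal sketch assuming the classic Lucene QueryParser: the character set mirrors what that parser's `QueryParser.escape()` escapes, other parsers may reserve a different set, and `escape_lucene` is a hypothetical helper, not part of any library.

```python
# Characters the classic Lucene QueryParser treats as syntax (assumption:
# mirrors QueryParser.escape(); other query parsers may differ).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(text: str) -> str:
    """Hypothetical helper: backslash-escape every Lucene meta character."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


if __name__ == "__main__":
    for query in ["A*", "IL-2", "f(x)"]:
        print(query, "->", escape_lucene(query))
```

Escaping only fixes parsing. With a typical analyzer such as Lucene's StandardAnalyzer, the escaped `A\*` is most likely still tokenized to just `a`, so the `*` never reaches the index; that is the indexing half of the problem.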
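For identifiers, one possible index-time approach is to emit several variants per identifier: the exact form, a separator-free form, and the individual parts, so prefix, suffix, and sub-pattern queries have something to match. This is a sketch only; the separator set and the `identifier_variants` helper are hypothetical choices for illustration.

```python
import re

# Hypothetical separator set: dash, period, underscore, colon, slash, whitespace.
SEPARATORS = re.compile(r"[-._:/\s]+")


def identifier_variants(ident: str) -> list[str]:
    """Hypothetical expansion of one identifier into index tokens."""
    lowered = ident.lower()
    parts = [p for p in SEPARATORS.split(lowered) if p]
    # Exact (lowercased) form, separator-free form, then each part.
    variants = [lowered, "".join(parts)] + parts
    # De-duplicate while preserving order.
    return list(dict.fromkeys(variants))


if __name__ == "__main__":
    for ident in ["IL-2", "NM_000546.6", "BRCA1"]:
        print(ident, "->", identifier_variants(ident))
```

The trade-off: splitting an accession-style identifier like `NM_000546.6` also emits the bare version suffix `6`, which is noise on its own. Keeping the exact form and the split parts in separate fields would let the query side decide when separators matter.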
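On the math question, Unicode normalization alone answers part of it. A minimal sketch with Python's standard `unicodedata` module: NFKD compatibility folding (then dropping combining marks) maps `ℝ` to `R` and `x²` to `x2`, but leaves `β` untouched, so flattening β to "b" would need a separate transliteration table. Expanding Greek letters to their Unicode names is one alternative; `fold_nfkd` and `expand_greek` are hypothetical helpers.

```python
import unicodedata

GREEK_PREFIXES = ("GREEK SMALL LETTER ", "GREEK CAPITAL LETTER ")


def fold_nfkd(text: str) -> str:
    """Hypothetical 'flattening': NFKD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_greek(text: str) -> str:
    """Hypothetical expansion: replace Greek letters with their names (β -> beta)."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        replacement = ch
        for prefix in GREEK_PREFIXES:
            if name.startswith(prefix):
                letter = name[len(prefix):]
                # Skip multi-word names such as "ALPHA WITH TONOS" or "FINAL SIGMA".
                if " " not in letter:
                    replacement = letter.lower()
        out.append(replacement)
    return "".join(out)


if __name__ == "__main__":
    for s in ["\u03b2", "\u211d", "x\u00b2", "caf\u00e9"]:
        print(repr(s), "->", repr(fold_nfkd(s)), repr(expand_greek(s)))
```

NFKD folding makes `x²` and `x2` identical in the index, which may or may not be what an exponentiation search wants; expansion lowercases capitals, so `Ω` and `ω` both become `omega`.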