JeffreyBenjaminBrown opened this issue 7 years ago.
Probably not possible using Lucene syntax. In regular expressions, that would be a character set like [^A-Za-z], but I don't think Lucene supports regex. We could add support at the filter level if the use cases are compelling enough to justify breaking from Lucene.
"support at the filter level"? Does that mean rewriting Lucene?
No, it means defining a new, SmSn-specific query syntax, and mapping expressions in that syntax to Lucene syntax (then filtering on the results). Since Lucene syntax is so well-known, the pros (more expressive queries) would have to be pretty significant to outweigh the cons (a syntax everyone has to learn from scratch, and an implementation we have to develop, test, and maintain).
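To make the "map to Lucene, then filter" idea concrete, here is a rough sketch. It is not SmSn code. It assumes a hypothetical SmSn-only marker "=word" meaning "word must appear as a whole word". The marker is rewritten to the Lucene wildcard *word*, which over-matches. The hits are then filtered so that the word has no letter or digit directly on either side, which is the "non-alphanumeric character" test from this thread applied after the search. The Lucene call itself is not shown; the hit list is a stand-in.

```python
import re

# Hypothetical SmSn-only marker: "=word" means "word must appear as a whole word".
WHOLE_WORD = re.compile(r"=(\w+)")

def to_lucene(query: str) -> str:
    """Rewrite each =word as the Lucene wildcard *word*, which deliberately over-matches."""
    return WHOLE_WORD.sub(lambda m: f"*{m.group(1)}*", query)

def post_filter(query: str, hits: list[str]) -> list[str]:
    """Keep hits where every =word has no letter or digit immediately on either side."""
    patterns = [
        re.compile(rf"(?<![A-Za-z0-9]){re.escape(w)}(?![A-Za-z0-9])", re.IGNORECASE)
        for w in WHOLE_WORD.findall(query)
    ]
    return [h for h in hits if all(p.search(h) for p in patterns)]

query = "=sum"
print(to_lucene(query))  # *sum*
# Stand-in for whatever Lucene returns for *sum*:
lucene_hits = ["(sum of parts)", "sumo wrestling", "assume nothing", "the sum."]
print(post_filter(query, lucene_hits))  # ['(sum of parts)', 'the sum.']
```

The lookarounds also succeed at the very start or end of a note, so a note that begins with "sum" still counts as a whole-word match.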
I'm not (yet) suggesting any radical changes to Lucene, just a symbol for non-alphanumeric characters.
But while we're on the subject ...
I have nodes that look like this: "practical & alone. {easy money}". The trailing period indicates that "practical & alone" is a complete thought. Then the bracketed expression provides an example of the content of the category. It would be valuable to me if I could search only in the first sentence, or search the whole note but rank results by the length of the first sentence rather than the length of the whole note.
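A minimal sketch of the two behaviors described above, done in plain Python over a list of note strings rather than inside Lucene. It assumes the "first sentence" is everything up to and including the first period. The function names are hypothetical, and the match is a plain case-insensitive substring test for brevity.

```python
def first_sentence(note: str) -> str:
    """Text up to and including the first period; the whole note if there is none."""
    head, dot, _ = note.partition(".")
    return (head + dot).strip()

def search_first_sentence(notes: list[str], term: str) -> list[str]:
    """Notes whose first sentence contains term, shortest first sentence first."""
    needle = term.lower()
    hits = [n for n in notes if needle in first_sentence(n).lower()]
    return sorted(hits, key=lambda n: len(first_sentence(n)))

def rank_by_first_sentence(notes: list[str], term: str) -> list[str]:
    """Notes containing term anywhere, ranked by first-sentence length, not note length."""
    needle = term.lower()
    hits = [n for n in notes if needle in n.lower()]
    return sorted(hits, key=lambda n: len(first_sentence(n)))

notes = [
    "practical & alone. {easy money}",
    "practical. {a long example about being practical & alone}",
    "money matters. {easy money}",
]
print(search_first_sentence(notes, "alone"))
# ['practical & alone. {easy money}']
print(rank_by_first_sentence(notes, "alone"))
# ['practical. {a long example about being practical & alone}', 'practical & alone. {easy money}']
```

The second note is longer overall, but its first sentence is shorter, so ranking by first-sentence length puts it first.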
You suggested you want more use cases. Consider (I did this recently) searching for notes about yourself by searching for (probably among other words) the word "me". If you want to find it when it's adjacent to punctuation, you'll have to surround it with * symbols (*me*). But doing that finds every word with "me" in it, which is a humongous number of words.
If I'm searching for "sum", I'll probably search for *sum*. Otherwise I'll miss "(sum" or "sum." or other punctuation-adjoined instances of the word. However, by using * I expand the search to include "sumo" and "assume" and other things I'm not looking for. I googled for a while and still don't know whether it's possible.