Open tleyden opened 7 years ago
First of all, wildcard search is not supported yet. Now to the details:
What your results show are the episodes. The query matched on the title of the show. If we return that, you'll get:
find {name: ~= "geo*"}
return .name
Which returns
"Nat Geo Wild Kids"
"Geo Bee"
What happened with geo* is that it got stemmed to geo and hence matches the titles seen above.
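To illustrate the effect described above, here is a toy model in JavaScript. It is not Noise's actual analysis pipeline: the `analyze` and `matches` functions are hypothetical, and the sketch assumes that analysis simply lowercases the text and splits on non-alphanumeric characters, which is enough to drop the trailing `*`. "George" is added as hypothetical sample data to show what a real wildcard would have matched.

```javascript
// Toy model of term analysis; NOT Noise's real implementation.
// Assumption: terms are lowercased and split on anything that is not a
// letter or digit, so the trailing `*` in "geo*" is discarded.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0);
}

// A name matches when it contains every term of the analyzed query.
function matches(query, name) {
  const terms = new Set(analyze(name));
  return analyze(query).every((term) => terms.has(term));
}

// "George" is hypothetical sample data, not taken from the real index.
const names = ["Nat Geo Wild Kids", "Geo Bee", "George"];
const hits = names.filter((name) => matches("geo*", name));
console.log(hits); // [ 'Nat Geo Wild Kids', 'Geo Bee' ]
```

Under these assumptions the query behaves like a search for the whole word geo, so "George" is not returned.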
First off, great project. I am thinking about using it as a backend for my mainly text-based research, so I am also interested in this issue.
Are wildcards or regex on the roadmap? Perhaps you could also briefly elaborate on the following: Which stemmer is used (and for which language)? What is the best way to proceed when trying to glob or regex?
thx
@OSHistory Wildcards are on the roadmap, but sadly there's a huge lack of time, hence I don't know when this will happen.
The stemmer currently used is just a Rust wrapper around Snowball. We don't do any language-specific things yet, so you get whatever Snowball does.
Adding wildcard/regex is non-trivial. Perhaps @Damienkatz could give a brief overview of what he had in mind with regard to that.
Thanks for the reply. I can imagine that implementing regex is a huge task. I think I would be happy if the Snowball stemmer supported something other than English. And indeed, in stem.rs one can simply change the language.
It compiles fine; however, having no Rust experience, I am a little lost on how to include it in my local npm installation to test it on my sample data, which is in German. I would be grateful for hints on how to do it or where to start.
Perhaps an option to specify a language for the stemmer on index creation would substantially increase flexibility for non-English use cases? Something along the lines of:
let index = noise.open("myindex", true, { "lang": "german" });
Most use cases should operate on a single language.
@OSHistory Could you please open another issue for supporting other languages as an option? This way it won't get lost that easily.
@vmx Sure, I was thinking the same thing while writing...
Disclaimer: I didn't read the documentation :-)
I searched for:
and got results:
Was expecting only results with "Geo*" in the name, like "George".
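As a rough sketch of the expected behaviour, a word-prefix filter could be applied on the client to names that have already been fetched. The helper `filterByWordPrefix` and the sample names are hypothetical and not part of Noise. This only works when the candidate names are small enough to load in full.

```javascript
// Hypothetical client-side helper, not part of Noise: keep only names in
// which at least one word starts with the given prefix (case-insensitive).
function filterByWordPrefix(candidates, prefix) {
  const needle = prefix.toLowerCase();
  return candidates.filter((name) =>
    name
      .toLowerCase()
      .split(/\s+/)
      .some((word) => word.startsWith(needle))
  );
}

// Made-up sample data for illustration only.
const sample = ["Nat Geo Wild Kids", "Geo Bee", "George", "Wild Kids"];
console.log(filterByWordPrefix(sample, "geo"));
// [ 'Nat Geo Wild Kids', 'Geo Bee', 'George' ]
```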