mattico / elasticlunr-rs

A partial port of elasticlunr to Rust. Intended to be used for generating compatible search indices.
Apache License 2.0
52 stars 23 forks

Add support for more languages #7

Closed: mattico closed this issue 6 years ago

mattico commented 6 years ago

Each language needs a few components:

Right now we only have those for English. The stemmer and trimmer used to build the index must exactly match those used at search time, or search results will be missing. elasticlunr.js uses https://github.com/weixsong/lunr-languages to support non-English languages. Since they used a JS version of Snowball to generate their stemmers, we can hopefully use bindings to the C version to make this easier. If not, at least those stemmers are built from simple functions, unlike the English one, which is a mess of regexes.
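To illustrate why the index-time and search-time pipelines have to agree, here is a minimal sketch. Everything in it is hypothetical: `trim`, `toy_stem`, `full_pipeline`, `no_stem_pipeline`, `build_index` and `search` are made-up names, and the "stemmer" is a toy that only drops a trailing "s", not Snowball or the real elasticlunr stemmer.

```rust
use std::collections::{HashMap, HashSet};

/// Toy trimmer: strips leading and trailing non-alphanumeric characters.
fn trim(token: &str) -> String {
    token.trim_matches(|c: char| !c.is_alphanumeric()).to_string()
}

/// Toy stemmer (NOT Snowball): drops a trailing "s" from tokens longer than 3 bytes.
fn toy_stem(token: &str) -> String {
    if token.len() > 3 && token.ends_with('s') {
        token[..token.len() - 1].to_string()
    } else {
        token.to_string()
    }
}

/// Pipeline used at index time: lowercase, trim, stem.
fn full_pipeline(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|t| toy_stem(&trim(&t.to_lowercase())))
        .filter(|t| !t.is_empty())
        .collect()
}

/// A mismatched pipeline that forgets the stemmer.
fn no_stem_pipeline(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|t| trim(&t.to_lowercase()))
        .filter(|t| !t.is_empty())
        .collect()
}

/// Builds a tiny inverted index: term -> ids of the documents containing it.
fn build_index(
    docs: &[(&str, &str)],
    pipeline: fn(&str) -> Vec<String>,
) -> HashMap<String, HashSet<String>> {
    let mut index: HashMap<String, HashSet<String>> = HashMap::new();
    for (id, body) in docs {
        for term in pipeline(body) {
            index.entry(term).or_default().insert(id.to_string());
        }
    }
    index
}

/// Runs the query through `pipeline` and collects ids matching any term.
fn search(
    index: &HashMap<String, HashSet<String>>,
    query: &str,
    pipeline: fn(&str) -> Vec<String>,
) -> HashSet<String> {
    let mut hits = HashSet::new();
    for term in pipeline(query) {
        if let Some(ids) = index.get(&term) {
            hits.extend(ids.iter().cloned());
        }
    }
    hits
}
```

With an index built through `full_pipeline`, searching for "stemmers" through `full_pipeline` finds a document containing "Stemmers", while the same query through `no_stem_pipeline` finds nothing, because the index only holds the stemmed term.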

We'd also want an API for choosing a language. We could have pre-built pipelines for different languages. We should also look at how multi-language support is done.
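One possible shape for "pre-built pipelines per language" is sketched below. This is only an assumption about how it could look, not the crate's actual API: `Language`, `Pipeline`, `PipelineFn`, `for_language` and the step functions are all hypothetical names, the stop word lists are tiny placeholders, and stemming is left out to keep it short.

```rust
/// Hypothetical language selector (not an existing elasticlunr-rs type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Language {
    English,
    French,
}

/// A pipeline step either transforms a token or drops it by returning None.
type PipelineFn = fn(String) -> Option<String>;

/// Hypothetical pre-built pipeline: an ordered list of steps.
struct Pipeline {
    steps: Vec<PipelineFn>,
}

/// Trimmer shared by both languages: strips non-alphanumeric edges.
/// `char::is_alphanumeric` is Unicode-aware, so accented letters are kept.
fn trim_word(token: String) -> Option<String> {
    let trimmed = token.trim_matches(|c: char| !c.is_alphanumeric());
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Placeholder English stop word filter (illustrative list only).
fn drop_english_stop_words(token: String) -> Option<String> {
    const STOP: [&str; 4] = ["a", "the", "is", "this"];
    if STOP.contains(&token.as_str()) { None } else { Some(token) }
}

/// Placeholder French stop word filter (illustrative list only).
fn drop_french_stop_words(token: String) -> Option<String> {
    const STOP: [&str; 4] = ["un", "le", "la", "est"];
    if STOP.contains(&token.as_str()) { None } else { Some(token) }
}

impl Pipeline {
    /// Picks the pre-built pipeline for a language.
    fn for_language(lang: Language) -> Pipeline {
        let stop_words: PipelineFn = match lang {
            Language::English => drop_english_stop_words,
            Language::French => drop_french_stop_words,
        };
        Pipeline {
            steps: vec![trim_word as PipelineFn, stop_words],
        }
    }

    /// Lowercases the token, then runs every step, stopping if one drops it.
    fn run(&self, token: &str) -> Option<String> {
        let mut current = token.to_lowercase();
        for &step in &self.steps {
            current = step(current)?;
        }
        Some(current)
    }
}
```

Under these assumptions, `Pipeline::for_language(Language::French).run("Le")` drops the token, while the English pipeline keeps "le" as an ordinary term. The same `Pipeline` value would have to be used for both indexing and searching.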

Keats commented 6 years ago

In terms of requirements I think having those 2 things is necessary:

The lunr.multilanguage approach sounds fine. I'm just wondering whether, in the Gutenberg case, you would want to see English results when you are on the French version of the site, for example. So being able to select a language in https://docs.rs/elasticlunr-rs/1.0.0/elasticlunr/struct.Index.html#method.new would be necessary IMO, and it would become:

let mut index = Index::new(&["title", "body"], Languages::French);
index.add_doc("1", &["C'est un titre", "C'est le contenu!"]);