Closed GrayJack closed 5 years ago
Hi! This option is not available in the demo, but you can specify a whitelist or a blacklist. Please see the documentation: https://docs.rs/whatlang/0.7.1/whatlang/
Let's say you know ahead of time that the given text can only be Russian or English. You can use a whitelist, like this:
use whatlang::{Detector, Lang};
let whitelist = vec![Lang::Eng, Lang::Rus];
// A detector can also be created with the with_blacklist function
let detector = Detector::with_whitelist(whitelist);
let lang = detector.detect_lang("There is no reason not to learn Esperanto.");
assert_eq!(lang, Some(Lang::Eng));
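To illustrate why narrowing the candidate set helps on short inputs, here is a toy sketch using only the standard library. It is not whatlang's implementation: `ToyLang`, the tiny hand-made trigram profiles, and `detect_with_whitelist` are all hypothetical names invented for this example, and the scoring is a plain count of overlapping trigrams.

```rust
use std::collections::HashSet;

// Hypothetical language tag for this sketch (unrelated to whatlang::Lang).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyLang {
    Eng,
    Deu,
    Spa,
}

// Collect lowercase character trigrams, padding the text with one space on each side.
fn trigrams(text: &str) -> HashSet<String> {
    let chars: Vec<char> = format!(" {} ", text.to_lowercase()).chars().collect();
    chars.windows(3).map(|w| w.iter().collect()).collect()
}

// Tiny hand-picked trigram profiles; real profiles are much larger and ranked by frequency.
fn profile(lang: ToyLang) -> &'static [&'static str] {
    match lang {
        ToyLang::Eng => &[" th", "the", "he ", "ing", "ng ", " an", "and", "nd "],
        ToyLang::Deu => &[" de", "der", "er ", "sch", "ich", "ein", " un", "und", "nd "],
        ToyLang::Spa => &[" de", "de ", " la", "la ", "ión", "os ", " el", "el "],
    }
}

// Score only the whitelisted languages and return the best one with its hit count.
// Returns None when no whitelisted profile shares a trigram with the text.
fn detect_with_whitelist(text: &str, whitelist: &[ToyLang]) -> Option<(ToyLang, usize)> {
    let grams = trigrams(text);
    whitelist
        .iter()
        .map(|&lang| {
            let hits = profile(lang)
                .iter()
                .filter(|g| grams.contains(**g))
                .count();
            (lang, hits)
        })
        .filter(|&(_, hits)| hits > 0)
        .max_by_key(|&(_, hits)| hits)
}
```

With a short text, each language gets only a handful of hits, so the margin between candidates is small. Every language removed from the whitelist is one fewer competitor that could win on such a slim margin.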
If whitelisting/blacklisting does not help with your task, then this library is not the right choice. You may need to try something else that is dictionary-based. Unfortunately, I don't know of any Rust library for this.
Please let me know if this solves the issue (so I can close it).
Hi, I'll do some tests this week and let you know, thanks!!
@GrayJack Hi! Any updates?
Please reopen if you find the issue still relevant.
I'm working on research that requires me to detect the language of articles based solely on the title, and there are cases where the title has only 3 to 7 words.
In the live demo, I noticed that English and German require more words to reach good confidence than Portuguese and Spanish do.
I used random article titles to test it.
Is there a way to optimize it, like a configuration in Options to use user-specified n-grams of some kind? If not, is there another lib you're aware of that might satisfy my needs?