greyblake / whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/
MIT License
969 stars, 109 forks

Corner case with whitelist and Chinese + Japanese cognates #44

Closed: purtato closed this issue 5 years ago

purtato commented 5 years ago

The character '水', like many other characters, is valid in both Chinese and Japanese. However, when such a cognate is detected with a whitelist, the whitelist is ignored:
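To show why '水' is ambiguous, here is a minimal sketch that classifies a character by Unicode block. The `Script` enum and `classify` function are hypothetical helpers for illustration, not part of whatlang's API, and only the basic CJK Unified Ideographs block plus the Hiragana and Katakana blocks are checked. A Han ideograph alone cannot tell Chinese from Japanese, while kana can only be Japanese.

```rust
/// Rough script classes relevant to this issue.
/// Hypothetical helper for illustration, not whatlang's API.
#[derive(Debug, PartialEq)]
enum Script {
    Han,   // CJK ideographs, shared by Chinese and Japanese
    Kana,  // Hiragana/Katakana, used only by Japanese
    Other, // anything else
}

fn classify(c: char) -> Script {
    match c {
        // CJK Unified Ideographs, basic block only (U+4E00..U+9FFF)
        '\u{4E00}'..='\u{9FFF}' => Script::Han,
        // Hiragana (U+3040..U+309F) and Katakana (U+30A0..U+30FF)
        '\u{3040}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}' => Script::Kana,
        _ => Script::Other,
    }
}

fn main() {
    // '水' (water) is a Han ideograph: valid in Chinese and in Japanese.
    println!("{:?}", classify('水')); // Han
    // 'み' is Hiragana, so it can only be Japanese.
    println!("{:?}", classify('み')); // Kana
}
```

A detector that looks only at the script of "水" therefore has two equally plausible answers, and something else, such as the whitelist, has to break the tie.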

use whatlang::{detect_with_options, Lang, Options};

fn main() {
    // Restrict detection to Japanese only.
    let opts = Options::new().set_whitelist(vec![Lang::Jpn]);

    // "水" (water) is written identically in Chinese and Japanese.
    let info = detect_with_options("水", &opts).unwrap();
    println!("Lang: {}", info.lang());
}

Output: Lang: 官话 (Mandarin), even though Japanese is the only language in the whitelist.
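The expected behavior is that, for Han-only text, the whitelist decides between the candidate languages. The sketch below illustrates that idea under stated assumptions; it is not whatlang's code and not necessarily what the fix in PR #45 does. The local `Lang` enum, the `resolve_han` function, the candidate ordering (Mandarin first), and the convention that an empty whitelist means "no restriction" are all hypothetical.

```rust
/// Hypothetical stand-in for whatlang's Lang, limited to the two
/// languages that share Han characters in this issue.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    Cmn, // Mandarin
    Jpn, // Japanese
}

/// Pick a language for Han-only text while honoring a whitelist.
/// `candidates` is ordered by preference; an empty `whitelist`
/// is treated as "allow everything" (an assumption of this sketch).
fn resolve_han(candidates: &[Lang], whitelist: &[Lang]) -> Option<Lang> {
    candidates
        .iter()
        .copied()
        .find(|lang| whitelist.is_empty() || whitelist.contains(lang))
}

fn main() {
    let candidates = [Lang::Cmn, Lang::Jpn];
    // Only Japanese is allowed, so Mandarin must be skipped.
    println!("{:?}", resolve_han(&candidates, &[Lang::Jpn])); // Some(Jpn)
    // No whitelist: the first candidate wins.
    println!("{:?}", resolve_han(&candidates, &[])); // Some(Cmn)
}
```

Returning `Option<Lang>` leaves room for the case where no candidate survives the whitelist (for example, a whitelist containing neither language), which the caller can then report as "no detection" instead of silently ignoring the whitelist.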

greyblake commented 5 years ago

Closed with https://github.com/greyblake/whatlang-rs/pull/45