greyblake / whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/
MIT License

Improve quality of Japanese detection with Mandarin chars #89

Closed by greyblake 3 years ago

greyblake commented 3 years ago

Fixes https://github.com/greyblake/whatlang-rs/issues/88

Before

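| LANG     | AVG    | <= 20  | 21-50  | 51-100 | > 100  |
|----------|--------|--------|--------|--------|--------|
| Japanese | 54.05% | 52.94% | 55.77% | 55.55% | 51.95% |
| Mandarin | 96.43% | 97.77% | 97.19% | 95.96% | 94.80% |
| AVG      | 75.24% | 75.36% | 76.48% | 75.75% | 73.38% |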

Now

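| LANG     | AVG    | <= 20  | 21-50  | 51-100 | > 100  |
|----------|--------|--------|--------|--------|--------|
| Japanese | 94.35% | 93.71% | 96.04% | 94.41% | 93.23% |
| Mandarin | 96.12% | 97.54% | 96.98% | 95.50% | 94.45% |
| AVG      | 95.23% | 95.62% | 96.51% | 94.95% | 93.84% |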
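
For readers unfamiliar with why Japanese and Mandarin get confused: both use Han characters, but Japanese text usually also contains kana. The sketch below is a hypothetical illustration of that idea using only the standard library. It is not the code from this pull request and not the whatlang API; the names `Guess`, `is_kana`, `is_han`, and `guess_cjk` are made up for the example, and the Han check covers only the basic CJK Unified Ideographs block.

```rust
// Hypothetical illustration only: not the implementation in this PR
// and not part of the whatlang API.

#[derive(Debug, PartialEq)]
enum Guess {
    Japanese,
    Mandarin,
    Unknown,
}

/// True for characters in the Hiragana (U+3040..=U+309F)
/// or Katakana (U+30A0..=U+30FF) blocks.
fn is_kana(ch: char) -> bool {
    matches!(ch, '\u{3040}'..='\u{30FF}')
}

/// True for characters in the basic CJK Unified Ideographs block
/// (U+4E00..=U+9FFF). Extension blocks are ignored in this sketch.
fn is_han(ch: char) -> bool {
    matches!(ch, '\u{4E00}'..='\u{9FFF}')
}

/// Assumed heuristic: any kana means Japanese; Han characters
/// without kana mean Mandarin; anything else is left undecided.
fn guess_cjk(text: &str) -> Guess {
    let kana = text.chars().filter(|c| is_kana(*c)).count();
    let han = text.chars().filter(|c| is_han(*c)).count();

    if kana > 0 {
        Guess::Japanese
    } else if han > 0 {
        Guess::Mandarin
    } else {
        Guess::Unknown
    }
}
```

Under these assumptions, `guess_cjk("日本語のテキスト")` gives `Guess::Japanese` because of the kana, while `guess_cjk("中文文本")` gives `Guess::Mandarin`. A Japanese string written entirely in kanji would still be guessed as Mandarin, so a rule this simple cannot be the whole story for short inputs.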