pemistahl / lingua

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Apache License 2.0

Incorrect detection for 'ok, fine' #131

Closed: wudideren closed this issue 2 years ago

wudideren commented 2 years ago

Both 'ok, fine' and 'Ok ok ok I ok' are detected as "TURKISH".

pemistahl commented 2 years ago

If you had read the documentation thoroughly, you would understand that these phrases do not contain enough distinct ngrams to calculate a reliable language estimate. Besides, the word 'ok' is used in many languages, so it is not a clear indicator of English.
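To make the ngram argument concrete, here is a minimal Java sketch that counts the distinct character ngrams in a phrase. It is a hypothetical helper for illustration, not Lingua's internal code: the class name `NgramCount`, the method `distinctNgrams`, the word-splitting rule and the choice of ngram lengths 1 to 5 are all assumptions, and a real detector weighs ngram probabilities rather than merely counting ngrams.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; this is not Lingua's implementation.
class NgramCount {

    // Collects the distinct character n-grams of length n, taken word by word
    // after lowercasing and splitting on any run of non-letter characters.
    static Set<String> distinctNgrams(String text, int n) {
        Set<String> ngrams = new LinkedHashSet<>();
        // Locale.ROOT avoids locale-specific casing (e.g. Turkish dotless i for "I").
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            // A word shorter than n contributes nothing (this also skips empty tokens).
            for (int i = 0; i + n <= word.length(); i++) {
                ngrams.add(word.substring(i, i + n));
            }
        }
        return ngrams;
    }

    public static void main(String[] args) {
        String[] phrases = {"ok, fine", "Ok ok ok I ok"};
        for (String phrase : phrases) {
            StringBuilder line = new StringBuilder("'" + phrase + "':");
            // Assumed ngram lengths 1..5 for this illustration.
            for (int n = 1; n <= 5; n++) {
                line.append(' ').append(n).append("-grams=")
                    .append(distinctNgrams(phrase, n).size());
            }
            System.out.println(line);
        }
    }
}
```

For the two phrases in this issue, the sketch finds 6, 4, 2, 1 and 0 distinct 1- to 5-grams in 'ok, fine', and only 3, 1, 0, 0 and 0 in 'Ok ok ok I ok', because repeating 'ok' adds no new ngrams. A full sentence typically yields many more distinct ngrams, which gives a statistical detector far more evidence to separate languages.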