pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0
1.1k stars 44 forks source link

these strings are recognized in ITALIAN but they are in english #84

Closed andreabisello closed 1 year ago

andreabisello commented 1 year ago

these strings are recognized in italian, even they are in english (and there is any italian word in this)

pemistahl commented 1 year ago

Yes, this is certainly possible. The sum of the ngram probabilities for Italian will be larger than the sum of the ngram probabilities for English. This is not a bug in the library. The statistical approach is never 100% correct.