Closed Jourdelune closed 1 year ago
Pure statistical approaches to language detection are never 100% correct. The letter sequence in the word 'hello' is very common in Spanish, so the algorithm thinks it's Spanish as the probability for Spanish is greater than the probability for English.
Feed longer strings into the detector. Then you will get more reliable results.
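For a chat, one way to feed longer strings is to combine a user's recent messages before calling the detector. The snippet below is only a rough sketch of that idea: `BufferedLanguageDetector`, `min_chars`, and `max_messages` are hypothetical names and values, not part of this library. The `detect` argument stands in for whatever detection call you actually use.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class BufferedLanguageDetector:
    """Collect a user's recent chat messages and detect on the combined text.

    `detect` is a placeholder for the real detection function; it is assumed
    to take a string and return a language label (or None).
    """

    def __init__(
        self,
        detect: Callable[[str], Optional[str]],
        min_chars: int = 40,     # assumed threshold, tune for your data
        max_messages: int = 10,  # assumed window size per user
    ) -> None:
        self._detect = detect
        self._min_chars = min_chars
        # One bounded message history per user; old messages drop off.
        self._history: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_messages)
        )

    def add_message(self, user_id: str, text: str) -> Optional[str]:
        """Store the message; return a language only once enough text exists."""
        history = self._history[user_id]
        history.append(text.strip())
        combined = " ".join(history)
        if len(combined) < self._min_chars:
            return None  # too short to trust, wait for more messages
        return self._detect(combined)
```

With this, a lone "hello" yields no guess at all, and the detector is only consulted once the user's combined messages pass the length threshold.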
Hello, I need to detect the language of user-generated content for a chat. I have tested this library, but it gives strange results on short text, for example the word hello:
It returns Spanish (but the correct language is English).
Do you have any tips for getting better results when detecting the language of user-generated content?