pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0

support for romanized Indian languages? #182

Closed tfriedel closed 8 months ago

tfriedel commented 8 months ago

Hi Peter! Have you tried your library with romanized versions of Indian languages like Hindi? There's a new library that's better at this than, for example, fastText: https://github.com/AI4Bharat/IndicLID

I've noticed that a lot of Indian text, for example in chat apps, is romanized and is usually not supported well by current language identification libraries.
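
To illustrate why romanized text is harder, here is a minimal sketch using only the Python standard library. It is not part of lingua-py or IndicLID; `script_counts` is a hypothetical helper made up for this example. It shows that the script cue available for native-script Hindi disappears once the same word is written in Latin letters:

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per script, using the first word of each Unicode
    character name (e.g. 'DEVANAGARI', 'LATIN') as a rough script label.

    Hypothetical helper for illustration only; real detectors rely on
    far richer signals than this.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():  # skips spaces, digits, and combining vowel signs
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    return counts


native = script_counts("नमस्ते")      # Hindi greeting in Devanagari
romanized = script_counts("namaste")  # the same word, romanized

print(native)
print(romanized)
```

The Devanagari input is labeled by its script alone, while the romanized input looks like any other Latin-script text, so a detector has to fall back on character and word statistics to tell it apart from, say, English.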