goru001 / inltk

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.
https://inltk.readthedocs.io
MIT License
821 stars, 164 forks

identify languages doesn't work with Telugu in v0.9 #57

Open goru001 opened 4 years ago

goru001 commented 4 years ago

The identify languages function, which uses a separate model to identify the language, hasn't been retrained on Telugu in v0.9. It needs to be retrained to support Telugu.

goru001 commented 4 years ago

@Shubhamjain27 Will you be able to take this up?

chaitusvk commented 4 years ago

Please help me; I will train the Telugu model. I can see the language model file in NLP for Telugu. Where is the separate model located? I am a Telugu speaker.

lordzuko commented 3 years ago

@goru001 If no one is working on this, I can take it up. We can use pycld2 or pycld3; they identify all the supported languages except Oriya, Bengali, and Sanskrit.

I have used the same in my own projects, and it's also used by polyglot's language detection: https://github.com/aboSamoor/polyglot/blob/d0d2aa8/polyglot/detect/base.py#L72

What do you think?
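
A rough sketch of what the pycld2 route could look like, assuming pycld2 is installed. The function name `identify_language_cld2` and the injectable `detect` parameter are hypothetical and are not part of iNLTK; only `pycld2.detect` is a real call, and it returns `(is_reliable, bytes_found, details)` with each entry of `details` shaped as `(language_name, language_code, percent, score)`.

```python
from typing import Callable, Optional

# Any callable with the same return shape as pycld2.detect():
# (is_reliable, bytes_found, details), where details is a tuple of
# (language_name, language_code, percent, score) entries.
Detector = Callable[[str], tuple]


def _cld2_detect(text: str) -> tuple:
    # Imported lazily so this module loads even when pycld2 is not installed.
    import pycld2
    return pycld2.detect(text)


def identify_language_cld2(text: str, detect: Optional[Detector] = None) -> Optional[str]:
    """Hypothetical helper: return CLD2's top language code, or None.

    None is returned when CLD2 flags the guess as unreliable or
    reports the language as unknown ("un").
    """
    detect = detect or _cld2_detect
    is_reliable, _bytes_found, details = detect(text)
    if not is_reliable or not details:
        return None
    _name, code, _percent, _score = details[0]
    return None if code == "un" else code
```

Calling `identify_language_cld2(some_text)` with no second argument uses pycld2 directly. Oriya, Bengali, and Sanskrit would still need a fallback, for example the existing model, since they are the exceptions noted above.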

goru001 commented 3 years ago

@lordzuko That'll be great! Feel free to raise a PR for this.

nitkannen commented 3 years ago

@goru001 Can I take this issue up if it is still unresolved?

goru001 commented 3 years ago

@nitkannen Yes sure, this is still unresolved and it'll be great if you can contribute!

nitkannen commented 3 years ago

Sure @goru001

nitkannen commented 3 years ago

@goru001 Can you give me some guidance on where to start retraining the Telugu model? Any notebooks, scripts, or data used for other languages would be really helpful.