goru001 / inltk

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.
https://inltk.readthedocs.io
MIT License
824 stars, 163 forks

Encoding Devanagari #68

Closed (harsh244 closed 3 years ago)

harsh244 commented 3 years ago

Hi,

I'm not sure this is the right place for my question, but I'm posting it here anyway since this repo deals with Indic languages.

I am working on a speech-to-text system for Maithili. Since Maithili is nowadays written in Devanagari (the same script as Hindi), I am wondering whether this library, or any other, supports encoding Maithili sentences. I have gone through the docs and noticed that we can get sentence encodings and vector embeddings for Hindi, but I am not sure whether they can be used for Maithili sentences. The objective is to get vector embeddings and feed them to our speech recognition model.
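Sharing a script means Maithili text is made of the same Unicode code points as Hindi, which is why a Hindi tokenizer can at least segment it. It says nothing about whether the vocabulary or word meanings overlap. As a quick sanity check before borrowing Hindi models, the sketch below measures what fraction of a string falls in the main Devanagari Unicode block (U+0900 to U+097F). The helper names `is_devanagari` and `devanagari_ratio` are hypothetical, not part of iNLTK, and the Devanagari Extended blocks are deliberately ignored.

```python
# Main Devanagari Unicode block; the Extended blocks are not counted here.
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F


def is_devanagari(ch: str) -> bool:
    """Return True if a single character lies in the main Devanagari block."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


def devanagari_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Devanagari (0.0 for empty input)."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_devanagari(ch) for ch in chars) / len(chars)


if __name__ == "__main__":
    print(devanagari_ratio("मैथिली"))  # the word "Maithili" in Devanagari
    print(devanagari_ratio("ab मै"))   # mixed Latin and Devanagari
```

A ratio near 1.0 only confirms the input is in the expected script; whether Hindi embeddings capture Maithili meaning still has to be checked on real data.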

Again, I am pretty new to this, so any help would be greatly appreciated.

goru001 commented 3 years ago

Thanks @harsh244 for reaching out. Maithili isn't supported in iNLTK yet, but as you said, it is similar to other Indic languages, so you can try working with the Hindi embeddings. My best guess at this point is that it might not produce the best results, but it should still work.
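A speech model usually wants one fixed-size vector per sentence, while token-level embeddings give one vector per token. The sketch below shows one common way to bridge that gap, mean pooling, with the embedder passed in as a function so the example runs without iNLTK or its model downloads. The names `mean_pool`, `encode_sentence`, and `TokenEmbedder` are hypothetical, and mean pooling is a swapped-in technique, not iNLTK's own sentence-encoding method.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical type: any function mapping a sentence to one vector per token.
TokenEmbedder = Callable[[str], Sequence[Sequence[float]]]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average per-token vectors into a single fixed-size sentence vector."""
    if len(token_vectors) == 0:
        raise ValueError("no token vectors to pool")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("token vectors have inconsistent dimensions")
    n = len(token_vectors)
    return [float(sum(v[i] for v in token_vectors)) / n for i in range(dim)]


def encode_sentence(text: str, embed_tokens: TokenEmbedder) -> Vector:
    """Embed a (e.g. Maithili) sentence using a borrowed (e.g. Hindi) token embedder."""
    return mean_pool(embed_tokens(text))


if __name__ == "__main__":
    # Toy stand-in embedder for illustration only: a 2-d vector per whitespace token.
    def toy_embedder(text: str) -> List[Vector]:
        return [[float(len(tok)), 1.0] for tok in text.split()]

    print(encode_sentence("ab abcd", toy_embedder))  # [3.0, 1.0]
```

Per the iNLTK documentation, after running `setup('hi')` you could pass `lambda s: get_embedding_vectors(s, 'hi')` as the embedder. iNLTK also documents `get_sentence_encoding(text, 'hi')`, which returns a sentence-level encoding directly and may be the simpler choice. Either way, compare against a baseline on Maithili data before relying on it.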

goru001 commented 3 years ago

Closing this issue for now. Feel free to re-open it if there's anything else.