lowerquality / gentle

gentle forced aligner
https://lowerquality.com/gentle/
MIT License

Using Eesen as a base? #97

Open migueljette opened 8 years ago

migueljette commented 8 years ago

Hi there. Is there a plan to use Eesen as the speech recognizer instead of Kaldi? I would love to get rid of phonetics and train pure DNN models instead of the hybrid ones from Kaldi. By the way, the software you wrote here is fantastic! I can't wait to test it out thoroughly with my own models. Thanks!

strob commented 8 years ago

No plans currently. Eesen is a fascinating project, though I'm not sure how suitable it would be for an aligner, as the lack of phonetic modeling would seem to invalidate Gentle's approach of building a custom language model from the user-supplied transcript. Curious if you have ideas for alignment using Eesen as a backend.
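To make the "custom language model from the transcript" idea concrete, here is a minimal sketch of one way such a model could be constrained: a table of which words may follow which, derived only from the transcript. This is an illustration of the general idea, not Gentle's actual code; the function name `transcript_bigrams` and the `<s>` / `</s>` boundary markers are assumptions made for this example.

```python
from collections import defaultdict

def transcript_bigrams(transcript):
    """Map each word to the set of words that follow it in the transcript.

    Hypothetical helper for illustration: "<s>" and "</s>" mark the start
    and end of the transcript.
    """
    successors = defaultdict(set)
    prev = "<s>"
    for word in transcript.lower().split():
        successors[prev].add(word)
        prev = word
    successors[prev].add("</s>")
    return dict(successors)

lm = transcript_bigrams("the cat sat on the mat")
# "the" occurs twice, so it has two permitted successors.
print(sorted(lm["the"]))
```

A decoder restricted to these transitions can only hypothesize word sequences that occur in the transcript, which is what lets recognition double as alignment.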

migueljette commented 8 years ago

Hi @strob. I hadn't thought about it thoroughly, but there are ways to use Eesen to align utterances, though the timing information wouldn't be as precise as with phonetic-based approaches. I have started a discussion with one of the authors of Eesen here: https://github.com/srvk/eesen/issues/88 It could be an interesting project to work on.
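As a rough illustration of why the timing would be coarser, here is a sketch that assumes the backend is CTC-trained (as Eesen is) and that we already have the best label per frame. The function name `ctc_label_spans`, the input `frame_labels`, and the blank index `0` are all assumptions for this example, not part of Eesen's API.

```python
def ctc_label_spans(frame_labels, blank=0):
    """Collapse per-frame CTC labels into (label, start_frame, end_frame) spans.

    end_frame is exclusive. Repeated non-blank labels on consecutive frames
    are merged; blank frames belong to no span.
    """
    spans = []
    prev = blank
    for i, label in enumerate(frame_labels):
        if label != blank:
            if label == prev:
                # Same label as the previous frame: extend the open span.
                lab, start, _ = spans[-1]
                spans[-1] = (lab, start, i + 1)
            else:
                spans.append((label, i, i + 1))
        prev = label
    return spans

# Hypothetical frame-level output: mostly blanks with short label bursts.
spans = ctc_label_spans([0, 0, 3, 3, 0, 0, 0, 5, 0, 3])
print(spans)
```

CTC outputs tend to be dominated by blank frames with brief label spikes, so spans like these show roughly where a label fired rather than where the corresponding sound starts and ends. Multiplying the frame indices by the feature frame shift would give times in seconds.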