lowerquality / gentle

gentle forced aligner
https://lowerquality.com/gentle/
MIT License

Adding to Gentle's Pronunciation Dictionary #76

Status: Open. natelawrence opened this issue 8 years ago

natelawrence commented 8 years ago

I'm a big fan of Gentle, but I have repeatedly run into words that are OOV (which I take to mean Out Of Vocabulary).

After reading some other issues here on GitHub, I saw that Gentle uses ARPAbet phonemes (as does the CMU Pronouncing Dictionary).

I would greatly appreciate being able to add words and their corresponding ARPAbet phonemes to Gentle's pronunciation dictionary (even if the change only applied to my local instance).
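For anyone attempting this locally, a minimal sketch of writing a dictionary entry in CMUdict style ("word PH1 PH2 ...") while checking each phone against the 39-phoneme ARPAbet set that the CMU Pronouncing Dictionary uses. The file name `lexicon.txt`, the helper names, and the lowercasing of the word are assumptions for illustration, not Gentle's actual layout; the exact phone format Gentle's model expects (case, stress digits) should be checked against its model files.

```python
import re

# The 39 ARPAbet phonemes used by the CMU Pronouncing Dictionary.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M",
              "N", "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"}


def format_entry(word: str, phones: str) -> str:
    """Validate an ARPAbet string and return one lexicon line (hypothetical helper)."""
    out = []
    for ph in phones.upper().split():
        # A phone is a base symbol plus an optional stress digit:
        # 0 = unstressed, 1 = primary stress, 2 = secondary stress.
        m = re.fullmatch(r"([A-Z]+)([012]?)", ph)
        if not m:
            raise ValueError(f"malformed phone: {ph!r}")
        base, stress = m.groups()
        # Only vowels may carry a stress digit.
        if base in VOWELS or (base in CONSONANTS and not stress):
            out.append(ph)
        else:
            raise ValueError(f"not an ARPAbet phone: {ph!r}")
    if not out:
        raise ValueError("empty pronunciation")
    # Assumption: the aligner's vocabulary is lowercase.
    return f"{word.lower()} {' '.join(out)}"


def append_entry(lexicon_path: str, word: str, phones: str) -> None:
    """Append one validated entry to a (hypothetical) local lexicon file."""
    with open(lexicon_path, "a", encoding="utf-8") as f:
        f.write(format_entry(word, phones) + "\n")
```

For example, `format_entry("Gentle", "jh eh1 n t ah0 l")` yields `gentle JH EH1 N T AH0 L`. Appending the line is probably not sufficient on its own: in a Kaldi-based system the lexicon is compiled into a lexicon FST and word list (Kaldi's `utils/prepare_lang.sh` does this for standard recipes), so those artifacts would likely need to be rebuilt before Gentle sees the new word.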

Looking through the source here on GitHub, I have not been able to locate where this dictionary is stored. If you could point me to the file, I would appreciate it.

It would also be useful to build an interface into Gentle that lists the OOV words and presents the user with a form to enter the corresponding ARPAbet phonemes for each one before rerunning the alignment.
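The first half of such an interface, listing the OOV words, could look roughly like the sketch below. It assumes the vocabulary is available as Kaldi-style `words.txt` lines (`word integer-id`) and uses a simplified tokenizer (lowercase, letters and digits with internal apostrophes) that may not match Gentle's own tokenization. The function names are hypothetical.

```python
import re


def load_vocab(words_txt_lines):
    """Parse Kaldi-style words.txt lines ("word integer-id") into a set of words."""
    vocab = set()
    for line in words_txt_lines:
        parts = line.split()
        if parts:
            vocab.add(parts[0])
    return vocab


def find_oov(transcript: str, vocab: set) -> list:
    """Return the transcript's OOV words in first-seen order, without duplicates."""
    seen, oov = set(), []
    # Simplified tokenizer: lowercase runs of letters/digits, allowing internal apostrophes.
    for tok in re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", transcript.lower()):
        if tok not in vocab and tok not in seen:
            seen.add(tok)
            oov.append(tok)
    return oov


def prompt_for_pronunciations(oov):
    """Console stand-in for the proposed form: ask for ARPAbet phones for each OOV word."""
    return {w: input(f"ARPAbet for {w!r}: ").strip() for w in oov}
```

A web form would replace `prompt_for_pronunciations`; its output could then be validated and written to the lexicon before the alignment is rerun.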

Any help you can provide me to this end would be welcome.

P.S. In the meantime, I have been substituting homophone phrases for OOV words so that the aligner still produces timings with essentially correct phonemes. However, coming up with good homophones is time-consuming, and the substitutions introduce many unwanted complications into maintaining my master transcript.
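One way to keep the homophone workaround from touching the master transcript is to generate a separate alignment transcript and remember which master word each substituted token came from, then merge the timings back. The sketch below assumes pre-tokenized words, a hypothetical `stand_ins` table keyed by lowercase word, and that per-token `(start, end)` timings (or `None` for unaligned tokens) have already been extracted from the aligner's output in order.

```python
def substitute(master_words, stand_ins):
    """Return (alignment_words, owner); owner[i] is the master index of alignment word i."""
    align_words, owner = [], []
    for i, w in enumerate(master_words):
        # A stand-in may expand one master word into several homophone tokens.
        for token in stand_ins.get(w.lower(), w).split():
            align_words.append(token)
            owner.append(i)
    return align_words, owner


def map_back(master_words, owner, timings):
    """Merge per-token (start, end) timings into one span per master word."""
    spans = [None] * len(master_words)
    for i, t in zip(owner, timings):
        if t is None:  # token was not aligned
            continue
        s, e = t
        if spans[i] is None:
            spans[i] = (s, e)
        else:
            spans[i] = (min(spans[i][0], s), max(spans[i][1], e))
    return list(zip(master_words, spans))
```

For instance, with `stand_ins = {"zorblax": "zor blocks"}`, the master words `["Meet", "Zorblax", "today"]` become the alignment words `["Meet", "zor", "blocks", "today"]`. The timings of "zor" and "blocks" are then merged back into a single span for "Zorblax".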

strob commented 8 years ago

Thanks for your comment! Kaldi's documentation has some tips on adding pronunciations to the vocabulary. I would love to integrate Phonetisaurus with Gentle to automate this process from the transcripts: any help towards this would be much appreciated.

natelawrence commented 4 years ago

Note to self: also see https://github.com/lowerquality/gentle/issues/158

natelawrence commented 4 years ago

Video tutorial from 2013 for adding words to the pronunciation dictionary for a different forced aligner: https://www.youtube.com/watch?v=P74skJMpY-0