lowerquality / gentle

gentle forced aligner
https://lowerquality.com/gentle/
MIT License

Does Gentle use alternate pronunciation phoneme sets where available? #79

Open natelawrence opened 8 years ago

natelawrence commented 8 years ago

I have noticed that whether a speaker says "probably" or "prob'ly", Gentle always seems to use the phoneme set for "prob'ly" (P R AA B L IY).

At first, I suspected that this was due to a defect/eccentricity in CMU's pronunciation dictionary or that perhaps it only listed one pronunciation of each word.

After looking at a couple of copies of the dictionary, however, I see that CMUSphinxDict lists both:

PROBABLY P R AA B AH B L IY
PROBABLY(2) P R AA B L IY

and CMUDict-0.7b lists only:

PROBABLY P R AA1 B AH0 B L IY2
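For reference, here is a minimal Python sketch of how entries in this format could be grouped so that every variant of a word is kept. It assumes the plain-text layout shown above (headword, optional `(N)` variant marker, then phonemes, with optional stress digits). The function name `parse_pronunciations` is made up for illustration; this is not how Gentle or Kaldi actually load their lexicon.

```python
import re
from collections import defaultdict

# Matches a trailing variant marker such as "(2)" in "PROBABLY(2)".
_VARIANT = re.compile(r"\(\d+\)$")

def parse_pronunciations(lines, strip_stress=True):
    """Group CMUdict-style entries by headword, keeping every variant.

    Illustrative helper only; not part of Gentle or Kaldi.
    """
    prons = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and comment lines
            continue
        head, *phones = line.split()
        word = _VARIANT.sub("", head)  # "PROBABLY(2)" -> "PROBABLY"
        if strip_stress:
            # "AA1" -> "AA"; phonemes without a stress digit are unchanged
            phones = [p.rstrip("012") for p in phones]
        if phones not in prons[word]:  # drop exact duplicates
            prons[word].append(phones)
    return dict(prons)

entries = [
    "PROBABLY P R AA B AH B L IY",
    "PROBABLY(2) P R AA B L IY",
]
variants = parse_pronunciations(entries)
stressed = parse_pronunciations(["PROBABLY P R AA1 B AH0 B L IY2"])
```

With the two CMUSphinxDict lines, `variants["PROBABLY"]` holds both phoneme sequences, while the single CMUDict-0.7b line yields only the long form once stress digits are stripped.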

Could you shed some light on what is happening in cases such as these?

In future, it would be desirable (given that Gentle has correctly marked the beginning and the end of the word) to tell Gentle that specific instances in the transcript are actually using the other pronunciation and have the aligner re-examine them for new phoneme timing on the basis of that alternate phoneme set.

strob commented 8 years ago

Hi Nate! We're using a pre-compiled L.fst from Kaldi, and it has several problems (including a bizarre pronunciation of the letter "a"). Would love to move away from it, eventually. Forcing alternative phoneticizations sounds interesting: maybe there should be a way for Gentle to align phonemes without words.

natelawrence commented 8 years ago

Indeed,

I have, in weeks past, considered requesting the ability to type phonemes directly into the transcript (primarily for OOV words, but this case is also a good application).

I expect that we would need to enclose a string of phonemes in some form of markup/syntax/punctuation so that Gentle can tell when we're shifting between English and direct phoneme input (especially in the case of single-letter phonemes versus expecting Kaldi/Gentle to catch someone actually pronouncing the name of a single letter).
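To illustrate what such markup could look like, here is a rough Python sketch that splits a transcript into ordinary words and explicitly marked phoneme strings. The curly-brace syntax and the `tokenize` function are purely hypothetical; nothing like this exists in Gentle today.

```python
import re

# Hypothetical markup: a phoneme string is wrapped in curly braces,
# e.g. "{P R AA B L IY}". Everything else is treated as a normal word.
_TOKEN = re.compile(r"\{([^{}]*)\}|([^\s{}]+)")

def tokenize(transcript):
    """Return a list of ("word", str) and ("phones", [str, ...]) tokens."""
    tokens = []
    for match in _TOKEN.finditer(transcript):
        if match.group(1) is not None:
            # Inside braces: split on whitespace into individual phonemes.
            tokens.append(("phones", match.group(1).split()))
        else:
            tokens.append(("word", match.group(2)))
    return tokens

tokens = tokenize("I will {P R AA B L IY} get an A")
```

Here the braces make it unambiguous that `{P R AA B L IY}` is direct phoneme input, while the bare `A` at the end is still an ordinary word (the spoken letter name).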

strob commented 8 years ago

Yes, we would need some sort of explicit syntax for this. I would start by trying to force-align a transcript consisting exclusively of phonemes; it seems related to #76 (i.e., we could make "fake" words based on user-supplied phonemes).
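As a rough sketch of the "fake" word idea, the Python below turns user-supplied phoneme sequences into placeholder lexicon entries, one `word phone phone ...` line each. The `__PH_` naming scheme and both function names are assumptions made up for illustration, and how such entries would actually be wired into Gentle's decoding graph is left open.

```python
def fake_word_lexicon(phone_sequences, prefix="__PH_"):
    """Map each distinct phoneme sequence to a synthetic word name.

    The "__PH_" prefix is a hypothetical convention chosen so that
    fake words are unlikely to collide with real vocabulary.
    """
    lexicon = {}
    for phones in phone_sequences:
        name = prefix + "_".join(phones)
        lexicon[name] = list(phones)  # identical sequences share one entry
    return lexicon

def lexicon_lines(lexicon):
    """Render entries as 'word phone phone ...' lines, sorted by word."""
    return [f"{word} {' '.join(phones)}" for word, phones in sorted(lexicon.items())]

lex = fake_word_lexicon([
    ["P", "R", "AA", "B", "L", "IY"],
    ["EY"],
    ["P", "R", "AA", "B", "L", "IY"],  # duplicate, collapses into one entry
])
lines = lexicon_lines(lex)
```

A phoneme-only transcript would then just be a sequence of these synthetic names, which an aligner could treat like any other word.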

natelawrence commented 3 years ago

279