lowerquality / gentle

gentle forced aligner
https://lowerquality.com/gentle/
MIT License
1.45k stars, 295 forks

Bug: Cannot recognize the pronunciation of numbers #268

Open zxl777 opened 4 years ago

zxl777 commented 4 years ago

[image]

I was surprised that these numbers were not recognized. What can I do to avoid this problem?

Thanks.

natelawrence commented 4 years ago

I'm only a fellow user, but this is a long-standing issue.

You do have to spell numbers out with letters in order for Gentle to match what is heard. That is certainly an issue when you want numerals in your final output, instead of spelled-out numbers.

Part of the problem is that many numbers can be pronounced in several ways. 2005 might be read as "two-thousand-five", "two-thousand-and-five", "twenty-oh-five", "twenty-hundred-and-five", etc. If you multiply the number of possible numbers by the number of possible pronunciations, this becomes kind of a horrible problem.
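To make the ambiguity concrete, here is a minimal Python sketch that lists a few plausible spoken forms of a four-digit number. It is purely illustrative and not part of Gentle; the names `two_digits` and `candidate_readings` are hypothetical, and the four readings it produces are just the patterns mentioned above, not an exhaustive set.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def candidate_readings(number):
    """Return a few plausible spoken forms of a four-digit number."""
    if not 1000 <= number <= 9999:
        raise ValueError("this sketch only handles four-digit numbers")
    high, low = divmod(number, 100)          # 2005 -> (20, 5)
    thousands, hundreds = divmod(high, 10)   # 20   -> (2, 0)

    # Cardinal reading: "two thousand", "one thousand nine hundred", ...
    cardinal = f"{ONES[thousands]} thousand"
    if hundreds:
        cardinal += f" {ONES[hundreds]} hundred"

    if low == 0:
        # dict.fromkeys drops duplicates while keeping order.
        return list(dict.fromkeys([cardinal, f"{two_digits(high)} hundred"]))

    tail = two_digits(low)
    pair_tail = f"oh {tail}" if low < 10 else tail
    return [
        f"{cardinal} {tail}",                      # two thousand five
        f"{cardinal} and {tail}",                  # two thousand and five
        f"{two_digits(high)} {pair_tail}",         # twenty oh five
        f"{two_digits(high)} hundred and {tail}",  # twenty hundred and five
    ]


for reading in candidate_readings(2005):
    print(reading)
```

Even this toy version yields four candidates for one number, and only you know which one was actually spoken in your audio.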

One of the two original authors of Gentle expressed interest four years ago in integrating Phonetisaurus into Gentle (to try to sound out unknown words), but it has never been done.

For this reason (as well as out-of-vocabulary words) I asked my friend to code up a little batch search-and-replace webpage.

It requires as input:

I author my list of words to replace and their replacements in two columns in Excel, so when you copy them out of Excel and paste them into the webpage, the two columns are tab-separated.

If you have a transcript which contains "2", "20", "200", and "2005", please note that you would want to list those in the opposite order (longest first), so that the search and replace function does not change the first two characters of "2005" to "twenty" when "two-thousand-five" is what is actually heard in your audio.
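For anyone who would rather script this than use a webpage, here is a rough Python sketch of the same idea. It is not the code behind my friend's page; `parse_pairs` and `batch_replace` are hypothetical names. Instead of relying on the order in which you list the pairs, it sorts the search terms longest-first and replaces them in a single regex pass, so "2005" is matched before "20" or "2" can touch it.

```python
import re


def parse_pairs(tsv_text):
    """Parse 'search<TAB>replacement' lines, e.g. two columns pasted from a spreadsheet."""
    pairs = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        search, replacement = line.split("\t", 1)
        pairs.append((search, replacement))
    return pairs


def batch_replace(transcript, pairs):
    """Replace every search term in one pass, trying longer terms first."""
    table = dict(pairs)
    if not table:
        return transcript
    # Longest terms first: at each position the regex tries alternatives
    # left to right, so "2005" wins over "200", "20", and "2".
    ordered = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered))
    return pattern.sub(lambda match: table[match.group(0)], transcript)


pasted = "2\ttwo\n20\ttwenty\n200\ttwo hundred\n2005\ttwo thousand five"
text = "It was way back in 2005. I had 20 dollars and 2 friends."
print(batch_replace(text, parse_pairs(pasted)))
```

Because it is a single pass, replacement text is never searched again, which a chain of plain search-and-replace steps does not guarantee. It still matches raw substrings, so "2" inside some other number that is not in your list would also be replaced.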

I realize that my use-case might be a little out of the ordinary: I feed a transcript through Gentle multiple times, with more corrections each time, in order to detect transcription errors more accurately. Even so, perhaps you'll find my batch search-and-replace method useful for maintaining a list of things to search and replace in future Gentle alignments.

zxl777 commented 4 years ago

Thank you for your answers; at least I have a temporary solution. The following is the result of recognition with https://otter.ai/, and the numbers are correct. I assume the pronunciation of the numbers is inferred from the context.

Yes. 25 years ago, before I was Dad, I had this whole other life. It was way back in 2005. I was 27 just starting to make it as an architect and living in New York with Marshall, my best friend from college. My life was good. And then uncle Marshall went and screwed the whole thing up.

It may even be just a guess. After all, no matter how it is read aloud, you can always recognize that it is a number, and the corresponding position in the script is just a number, so the timeline can still be aligned.

natelawrence commented 4 years ago

Thanks for reminding me of otter.ai.

For what it's worth, uploading recordings to YouTube (as Unlisted, if you don't care to get comments on the recordings) also yields decent results via their automatic captions.