jhkonan opened this issue 2 years ago (status: Open).
Hi,
Thank you! This is indeed a bug in the phoneme-to-word conversion. The model performs silence detection and alignment at the same time: the phones and words are first aligned to the non-silent audio frames and then merged back with the silent frames. The error occurs when silence is detected inside a word, which breaks the merging of non-silent and silent frames. It did not seem to occur very often in my earlier tests, but I will try to fix it when I have more time. Sorry for the bug.
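To make the failure mode described above concrete, here is a minimal Python sketch of merging per-frame word labels, aligned on the non-silent frames only, back into the full timeline. It assumes a per-frame boolean silence mask and one word index per non-silent frame. `merge_frames`, `frames_to_segments`, and the `absorb_inner_silence` flag are hypothetical names for illustration, not the aligner's actual code. Word indices are used instead of word strings so that repeated words stay distinct.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch: frame labels are word indices; None marks a silent (SIL) frame.
Frame = Optional[int]
Segment = Tuple[int, int, int]  # (word_id, start_frame, end_frame), end exclusive


def merge_frames(is_silent: List[bool], word_ids: List[int]) -> List[Frame]:
    """Re-insert silent frames into labels that were aligned on non-silent frames only."""
    if sum(not s for s in is_silent) != len(word_ids):
        raise ValueError("need exactly one word id per non-silent frame")
    labels = iter(word_ids)
    # Silent frames get None; each non-silent frame takes the next aligned label in order.
    return [None if s else next(labels) for s in is_silent]


def frames_to_segments(frames: List[Frame], absorb_inner_silence: bool = True) -> List[Segment]:
    """Group per-frame labels into word segments, skipping silent frames."""
    segments: List[Segment] = []
    for t, w in enumerate(frames):
        if w is None:
            continue  # silent frame: belongs to no word
        if segments and segments[-1][0] == w and (absorb_inner_silence or segments[-1][2] == t):
            # Same word as the previous segment: extend it (across a pause if allowed).
            word_id, start, _ = segments[-1]
            segments[-1] = (word_id, start, t + 1)
        else:
            # A new word starts, or the same word resumes after a pause we refuse to absorb.
            segments.append((w, t, t + 1))
    return segments


frames = merge_frames([True, False, False, True, False, True, False, False], [0, 0, 0, 1, 1])
print(frames_to_segments(frames, absorb_inner_silence=False))  # [(0, 1, 3), (0, 4, 5), (1, 6, 8)]
print(frames_to_segments(frames))                              # [(0, 1, 5), (1, 6, 8)]
```

With `absorb_inner_silence=False`, word 0 comes out as two segments, `(0, 1, 3)` and `(0, 4, 5)`, because of the silent frame at t = 3. With the default it becomes one segment, `(0, 1, 5)`, spanning the pause. Whether an inner pause should be absorbed into the word or reported as SIL is a design choice; this sketch only shows one way to keep each word in one piece.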
Any updates?
Something does not seem right with how SIL is used in the word transcriptions.
This is the first example in the LibriSpeech test set.
Here is the true transcript:
Here is the forced aligned word transcript:
Here is the forced aligned phonetic transcript:
I suspect this may indicate a general problem with the phoneme-to-word conversion.
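A quick way to check a forced-aligned word transcript against the true transcript is to drop the SIL tokens and diff the two word sequences. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function name `compare_to_reference`, the `"SIL"` token spelling, and the toy sentences are assumptions for illustration, not data from the LibriSpeech example above.

```python
import difflib
from typing import List, Tuple

# Hypothetical diagnostic: compare an aligned word sequence with the reference text.
Diff = Tuple[str, List[str], List[str]]  # (opcode tag, reference words, aligned words)


def compare_to_reference(reference: str, aligned: List[str], sil: str = "SIL") -> List[Diff]:
    """Return the non-matching regions between reference words and aligned words (SIL dropped)."""
    ref = reference.upper().split()
    hyp = [w.upper() for w in aligned if w.upper() != sil.upper()]
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    # Keep only the replace/delete/insert regions; "equal" regions are fine.
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(compare_to_reference("the cat sat", ["SIL", "THE", "SIL", "CAT", "SAT", "SIL"]))  # []
print(compare_to_reference("the cat sat", ["THE", "CAT", "SIL", "CAT", "SAT"]))  # [('insert', [], ['CAT'])]
print(compare_to_reference("the cat sat", ["THE", "SIL", "SAT"]))  # [('delete', ['CAT'], [])]
```

An empty list means the word sequence matches the reference apart from silences. An `insert` of a repeated word is what a word split around a detected silence would look like, and a `delete` means a word was lost, for example replaced by SIL.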