Closed liusongxiang closed 4 years ago
I use the default model provided by MFA.
The count in missing_files.txt is the number of alignment failures, not OOV utterances, so the number of OOV utterances is much larger (maybe the same as yours).
To extend the vocabulary, it is better to ask the MFA team rather than me.
Thank you very much!
Hi, thank you for providing the MFA labels for the LibriTTS dataset. I have a somewhat unrelated question: when I use MFA to force-align the LibriTTS dataset on my own, I find many more OOV utterances than are listed in the missing_files.txt in this repo. I use the standard librispeech-lexicon.txt as the pronunciation lexicon. I tried adding pronunciations for the OOV words using the G2p tool, but unfortunately the number of OOV utterances did not decrease. Could you please tell me how to tackle this problem? Thanks in advance.
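For anyone comparing OOV counts, here is a minimal sketch of how OOV words per utterance could be listed against a lexicon. It assumes each lexicon line starts with the word followed by its phones (as in librispeech-lexicon.txt) and uses a simple uppercase, letters-and-apostrophes tokenization. MFA applies its own text normalization, so the counts may not match what MFA reports. The function names `load_lexicon_words` and `find_oov` are hypothetical and are not part of MFA.

```python
import re


def load_lexicon_words(lines):
    """Collect the words from lexicon lines shaped like 'WORD PH1 PH2 ...'."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            # Assumption: the first whitespace-separated field is the word.
            words.add(parts[0].upper())
    return words


def find_oov(transcripts, lexicon_words):
    """Map each utterance id to the sorted words missing from the lexicon."""
    oov = {}
    for utt_id, text in transcripts.items():
        # Assumption: simple tokenization, which is not MFA's own normalization.
        tokens = re.findall(r"[A-Za-z']+", text.upper())
        missing = sorted({t for t in tokens if t not in lexicon_words})
        if missing:
            oov[utt_id] = missing
    return oov


# Toy data for illustration only.
lexicon_lines = ["HELLO HH AH0 L OW1", "WORLD W ER1 L D"]
transcripts = {"utt1": "Hello world", "utt2": "Hello Zyx"}

lexicon_words = load_lexicon_words(lexicon_lines)
oov_by_utt = find_oov(transcripts, lexicon_words)
print(oov_by_utt)
```

Running the same check after appending G2P-generated entries to the lexicon shows whether the new pronunciations are actually being picked up for the words in question.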