How to solve OOV issue when using MFA?

liusongxiang commented 4 years ago

Hi, thank you for providing the MFA labels for the LibriTTS dataset. I have a somewhat unrelated question: when I force-align LibriTTS with MFA myself, I get many more OOV utterances than are listed in missing_files.txt in this repo. I use the standard librispeech-lexicon.txt as the pronunciation lexicon. I tried adding pronunciations for the OOV words with the G2P tool, but unfortunately the number of OOV utterances did not go down. Could you please tell me how to tackle this problem? Thanks in advance.
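To see how many utterances are affected before and after editing the lexicon, one option is to count OOV words and OOV utterances directly against the lexicon. The sketch below is a minimal illustration, not part of MFA or this repo. It assumes each lexicon line looks like `WORD  PH1 PH2 ...` (as in librispeech-lexicon.txt), that transcripts are available as a dict of utterance ID to text, and it applies a hypothetical normalization (uppercase, keep letters, digits and apostrophes) that may not match what MFA itself does. The function names are illustrative only.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def load_lexicon_words(lines: Iterable[str]) -> Set[str]:
    """Collect headwords from lexicon lines shaped like 'WORD  PH1 PH2 ...'."""
    words: Set[str] = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(parts[0].upper())
    return words


def normalize(text: str) -> List[str]:
    """Assumed normalization (not MFA's): uppercase, keep letters, digits, apostrophes."""
    return re.findall(r"[A-Z0-9']+", text.upper())


def find_oov(lexicon: Set[str], transcripts: Dict[str, str]) -> Tuple[Set[str], List[str]]:
    """Return (OOV words, IDs of utterances containing at least one OOV word)."""
    oov_words: Set[str] = set()
    oov_utts: List[str] = []
    for utt_id, text in sorted(transcripts.items()):
        missing = {w for w in normalize(text) if w not in lexicon}
        if missing:
            oov_words |= missing
            oov_utts.append(utt_id)
    return oov_words, oov_utts


if __name__ == "__main__":
    lexicon = load_lexicon_words(["HELLO  HH AH0 L OW1", "WORLD  W ER1 L D"])
    words, utts = find_oov(lexicon, {"u1": "Hello, world!", "u2": "Hello Zarathustra."})
    print(words, utts)  # {'ZARATHUSTRA'} ['u2']
```

If the count stays the same after adding G2P entries, comparing the words this reports against the added entries can show whether they are spelled or normalized differently from the transcripts.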

kan-bayashi commented 4 years ago

I use the default model provided by MFA. missing_files.txt lists only the utterances where alignment failed, so the number of OOV utterances is much larger (probably about the same as what you see). To extend the vocabulary, it would be better to ask the MFA team rather than me.
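Extending the vocabulary usually means merging G2P-generated pronunciations into the base lexicon and passing the merged file as the dictionary when re-running alignment. The sketch below shows one way to do the merge; it is an illustration under stated assumptions, not an MFA feature. It assumes the G2P output uses the same `WORD PHONES...` line format as librispeech-lexicon.txt and that its phone set matches the acoustic model's. The helper names are hypothetical, and the design choice is to only fill gaps, never overriding existing entries.

```python
from typing import Dict, Iterable, List


def parse_lexicon(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each uppercased WORD to its list of distinct pronunciations."""
    entries: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and words with no phones
        word, phones = parts[0].upper(), " ".join(parts[1:])
        prons = entries.setdefault(word, [])
        if phones not in prons:
            prons.append(phones)
    return entries


def merge_lexicons(base_lines: Iterable[str], g2p_lines: Iterable[str]) -> List[str]:
    """Add G2P pronunciations only for words missing from the base lexicon."""
    base = parse_lexicon(base_lines)
    for word, prons in parse_lexicon(g2p_lines).items():
        if word not in base:  # keep curated base entries untouched
            base[word] = prons
    # One 'WORD<TAB>PHONES' line per pronunciation, sorted by word.
    return [f"{w}\t{p}" for w in sorted(base) for p in base[w]]


if __name__ == "__main__":
    merged = merge_lexicons(
        ["HELLO  HH AH0 L OW1"],
        ["zarathustra Z AE2 R AH0 TH UW1 S T R AH0", "HELLO HH EH0 L OW1"],
    )
    print("\n".join(merged))
```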

liusongxiang commented 4 years ago

Thank you very much!

kan-bayashi / LibriTTSLabel, issue #1