mush42 / optispeech

A lightweight end-to-end text-to-speech model
MIT License
87 stars 10 forks

Train on multiple languages #9

Closed thewh1teagle closed 1 month ago

thewh1teagle commented 1 month ago

I just reached 1 million steps in training my Hebrew TTS model, and it sounds pretty good. However, I noticed it struggles to pronounce English. After investigating, I found that Hebrew in espeak-ng uses only 30-40 IPA phonemes, whereas English relies on IPA phonemes that are not in the dataset. How can I add basic English support, since English words sometimes appear within Hebrew text (as they do in many languages)?

mush42 commented 1 month ago

@thewh1teagle you can always replace the out-of-distribution (OOD) phonemes with their closest equivalents before you feed them to the model. Native support for English is a bit more complicated, as it requires your dataset to contain some English utterances. If it has none, you can use a voice conversion model to create synthetic English utterances and fine-tune the model on them. OptiSpeech supports multiple languages through language embeddings, but I haven't tested that yet.
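
For illustration, the phoneme replacement idea could be sketched roughly as below. This is not OptiSpeech code: the function name `replace_ood_phonemes`, the `FALLBACK` table, and the `KNOWN` inventory are all hypothetical, and the specific substitutions are guesses that should be checked by ear against your own model. The sketch works one character at a time, so multi-character phonemes (tie bars, diacritics) would need a real tokenizer.

```python
# Hypothetical mapping from English IPA symbols to rough substitutes that are
# ASSUMED to exist in the Hebrew training inventory. Tune this for your data.
FALLBACK = {
    "θ": "t",  # "think" -> t
    "ð": "d",  # "this"  -> d
    "ɹ": "ʁ",  # English r -> Hebrew r
    "æ": "e",
    "ɪ": "i",
    "ʊ": "u",
    "ŋ": "n",
    "ə": "e",
}

# Hypothetical set of symbols the model saw during training. In practice,
# collect this from the phonemized training transcripts.
KNOWN = set("tdʁeiunvkslmaobpgzʃhjfχ ")


def replace_ood_phonemes(phonemes, known=KNOWN, fallback=FALLBACK):
    """Map symbols outside `known` to a fallback; drop those with no fallback."""
    out = []
    for symbol in phonemes:
        if symbol in known:
            out.append(symbol)
        elif fallback.get(symbol) in known:
            # The substitute must itself be in the training inventory.
            out.append(fallback[symbol])
        # Otherwise the symbol is silently dropped.
    return "".join(out)


print(replace_ood_phonemes("ðɪs"))   # dis
print(replace_ood_phonemes("θɪŋk"))  # tink
```

Running this on the espeak-ng output, just before the phoneme IDs are looked up, keeps every input inside the set of symbols the model was trained on.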