Inference with multilingual PL-BERT Model

deguodedongxi commented 4 months ago

Hello everyone,

I tried to replace the pretrained English BERT model with the multilingual PL-BERT model to generate speech with the LibriTTS notebook. The results were not what I expected.

What I did:

In the Utils.PLBERT folder, I exchanged all the files (config.yml, step_1100000.t7, util.py)
In the LibriTTS inference notebook I changed the phonemizer language:

global_phonemizer = phonemizer.backend.EspeakBackend(language='de', preserve_punctuation=True, with_stress=True)

or

global_phonemizer = phonemizer.backend.EspeakBackend(language='fr-fr', preserve_punctuation=True, with_stress=True)

I added a reference voice audio clip in the target language (although I am not sure whether it is needed).
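For anyone reproducing the phonemizer step, here is a minimal sketch of the language switch described above. The EspeakBackend arguments are the ones from this post; the helper names espeak_backend_kwargs and make_phonemizer are made up for illustration and are not part of the notebook. It assumes the phonemizer package and espeak-ng are installed.

```python
def espeak_backend_kwargs(language):
    """Keyword arguments used in this post for phonemizer's EspeakBackend."""
    return {
        "language": language,          # e.g. 'de' or 'fr-fr', as in the post
        "preserve_punctuation": True,
        "with_stress": True,
    }


def make_phonemizer(language):
    """Build an espeak backend for one language (hypothetical helper)."""
    # Imported lazily so the kwargs helper works without phonemizer installed.
    from phonemizer.backend import EspeakBackend
    return EspeakBackend(**espeak_backend_kwargs(language))


# Usage (requires phonemizer and espeak-ng):
#   global_phonemizer = make_phonemizer("de")
#   print(global_phonemizer.phonemize(["Guten Morgen, wie geht es dir?"]))
```

This only changes how the text is converted to phonemes. It does not change what the rest of the model was trained on.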

The result still sounds as if the model is trying to pronounce the German or French sentence with an English pronunciation.

Did I forget a step to use the BERT Model correctly? Thanks in advance!

Karesto commented 4 months ago

Yes, you need a multilingual version of StyleTTS2 to do that. The issue is that PL-BERT is only an embedding model for the phonemes. The rest of the model is trained only on English speech, so you will need to retrain it on the language you want.
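One rough way to see the mismatch is to check which phoneme symbols in the new language never occurred in the English training text. The sketch below is a hypothetical helper, not part of StyleTTS2 or PL-BERT, and the symbol sets are toy values chosen for illustration, not the real training inventory.

```python
def unseen_symbols(phonemes, seen_in_training):
    """Return the non-space symbols in `phonemes` that are absent from `seen_in_training`."""
    present = {ch for ch in phonemes if not ch.isspace()}
    return present - set(seen_in_training)


# Toy example: an assumed (made-up) set of symbols seen during English training.
toy_seen = set("aɪmt")
# Toy phoneme string containing a symbol that is not in the set above.
missing = unseen_symbols("ɪç mɪt", toy_seen)
print(sorted(missing))
```

Even when every symbol is covered, the acoustic parts of the model have still only learned English pronunciation and prosody, so a check like this can show a problem but cannot rule one out.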

deguodedongxi commented 4 months ago

Do you know if there are any pretrained StyleTTS2 models for that? Pretraining one on a multilingual corpus from scratch will require a lot of resources, I guess.

Karesto commented 4 months ago

As far as I know, there are no multilingual pretrained models, but there are a few models here and there. ShoukanLabs has an English model trained for expressivity (https://dagshub.com/ShoukanLabs/Vokan) and is currently working on a multilingual one.

deguodedongxi commented 4 months ago

Thank you so much! ShoukanLabs' work looks really impressive! I will definitely follow their progress!

yl4579 / StyleTTS2

Inference with multilingual PL-BERT Model #240