yl4579 / PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
MIT License

Requesting quick help with a few queries at hand #15

Closed ghost closed 1 year ago

ghost commented 1 year ago

Went through a few of your answers in the issues and would like to know:
  1. Do your suggested modifications for the VITS and FastSpeech 2 models apply only to inference, or to fine-tuning as well?
  2. In the README you mention changes to be made in train_second.py of StyleTTS if one wishes to include PL-BERT. Does that mean train_first.py can be skipped altogether, or does stage one have to be done without PL-BERT? I am only interested in fine-tuning the pre-trained model.
  3. The first modification, https://github.com/yl4579/StyleTTS/blob/main/models.py#L683, points to the line where the discriminator is instantiated, which is then passed as an argument to Munch(), but the replacement code does not instantiate the discriminator anywhere.

yl4579 commented 1 year ago
  1. You have to train VITS and FastSpeech 2 (or any TTS model) from scratch if you want to use a text encoder different from the original one (including PL-BERT). So the answer to your question is neither: you need to train the TTS model from scratch.
  2. The first stage is independent of PL-BERT, as it only trains an acoustic model. PL-BERT is only used for prosody and duration prediction. This does not work for VITS and FastSpeech 2, though, as both models are end-to-end and do not train an acoustic module first and a predictor module afterwards the way StyleTTS does.
  3. The original intention was to copy-paste the provided code snippet after the discriminator line (a rough sketch of such a modification is given after this list). However, if you do not understand how to modify the code, you can refer to the zipped file with all the modified code: https://drive.google.com/file/d/18DU4JrW1rhySrIk-XSxZkXt2MuznxoM-/view
  4. If you are referring to https://nonint.com/static/tortoise_v2_examples.html, I believe StyleTTS is better, but it also depends on the dataset. If you are interested, you can also refer to our latest work, StyleTTS 2, here: https://styletts2.github.io/. The code will be made publicly available by the end of this month.
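
For the third point, a minimal sketch of what that models.py edit might look like is below. It assumes the PL-BERT checkpoint is an ALBERT encoder loadable with Hugging Face transformers, that the checkpoint directory contains a config.yml and step_*.t7 files, and that StyleTTS collects its modules in a Munch; the directory path, config keys, and state-dict prefixes are placeholders rather than verbatim code from the repo.

```python
# Minimal sketch, not the repo's exact snippet: build the PL-BERT (ALBERT)
# encoder, load its pre-trained weights, and register it next to the other
# StyleTTS modules right after the discriminator is instantiated.
import os
import yaml
import torch
from munch import Munch
from transformers import AlbertConfig, AlbertModel

plbert_dir = "Checkpoint/"  # placeholder: folder with config.yml and step_*.t7 files

# Recreate the ALBERT configuration used for PL-BERT pre-training
plbert_config = yaml.safe_load(open(os.path.join(plbert_dir, "config.yml")))
bert = AlbertModel(AlbertConfig(**plbert_config['model_params']))

# Pick the latest checkpoint (assumed naming scheme: step_<iteration>.t7)
latest = max(
    (f for f in os.listdir(plbert_dir) if f.startswith("step_")),
    key=lambda f: int(f.split('_')[-1].split('.')[0]),
)
state = torch.load(os.path.join(plbert_dir, latest), map_location='cpu')['net']

# Keep only the encoder weights; the 'module.encoder.' prefix depends on how
# the checkpoint was saved and may need adjusting.
encoder_state = {k.replace('module.encoder.', ''): v
                 for k, v in state.items() if k.startswith('module.encoder.')}
bert.load_state_dict(encoder_state, strict=False)

# ... then, after the existing discriminator instantiation in models.py ...
# nets = Munch(text_encoder=text_encoder,
#              ...,
#              discriminator=discriminator,
#              bert=bert)  # expose PL-BERT to the second-stage training code
```

The resulting `bert` entry travels with the other networks, so train_second.py can consume its phoneme-level hidden states for the prosody and duration predictors.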