godspirit00 closed this issue 1 year ago.
Hi @yl4579! Thanks for the great work! I'm working on a TTS application based on VITS and would like to improve the naturalness of its speech, which is how I ended up here. You mentioned here that you had tested PL-BERT with VITS. Could you please explain a bit more about how to use it with VITS (and maybe FastSpeech 2 as well)? Thank you.
For VITS, you can just replace the embedding in the text encoder with the output of PL-BERT; nothing else needs to be changed. For FastSpeech 2, you can replace the encoder with PL-BERT using the same hidden channels (768), or you can append a linear projection, the same as the one used in StyleTTS.
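A minimal PyTorch sketch of the VITS suggestion, under stated assumptions: `DummyPLBert` is a hypothetical stand-in for a pretrained PL-BERT that returns 768-dim features per phoneme token, and the single `nn.Conv1d` called `encoder` is only a placeholder for VITS's attention encoder, not its real implementation. The structural change shown is that the text encoder's `nn.Embedding` lookup is replaced by PL-BERT features. The `hidden_channels=192` default is our assumption; because it differs from 768, the sketch adds a linear projection (the option given above for FastSpeech 2) and falls back to `nn.Identity` when the widths already match. `PLBertTextEncoder` and `sequence_mask` are hypothetical names.

```python
import torch
import torch.nn as nn


class DummyPLBert(nn.Module):
    """Hypothetical stand-in for a pretrained PL-BERT (768-dim outputs)."""

    def __init__(self, n_vocab: int, hidden: int = 768):
        super().__init__()
        self.emb = nn.Embedding(n_vocab, hidden)
        self.ff = nn.Linear(hidden, hidden)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # [B, T] phoneme ids -> [B, T, 768] features
        return torch.tanh(self.ff(self.emb(ids)))


def sequence_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    # [B] lengths -> [B, max_len] boolean mask (True = real token)
    return torch.arange(max_len, device=lengths.device)[None, :] < lengths[:, None]


class PLBertTextEncoder(nn.Module):
    """VITS-style text encoder with nn.Embedding replaced by PL-BERT (sketch)."""

    def __init__(self, bert: nn.Module, bert_dim: int = 768,
                 hidden_channels: int = 192, out_channels: int = 192):
        super().__init__()
        self.bert = bert
        self.out_channels = out_channels
        # Project only if widths differ (assumption, cf. the StyleTTS-style projection).
        self.in_proj = (nn.Identity() if bert_dim == hidden_channels
                        else nn.Linear(bert_dim, hidden_channels))
        # Placeholder for VITS's attention encoder (hypothetical, not the real one).
        self.encoder = nn.Conv1d(hidden_channels, hidden_channels, 3, padding=1)
        # Output head as in VITS: prior mean m and log-std logs, stacked on channels.
        self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)

    def forward(self, ids: torch.Tensor, lengths: torch.Tensor):
        # PL-BERT features replace self.emb(ids). VITS also scales its embedding
        # by sqrt(hidden_channels); this sketch omits that (assumption).
        x = self.in_proj(self.bert(ids))   # [B, T, H]
        x = x.transpose(1, 2)              # [B, H, T]
        x_mask = sequence_mask(lengths, x.size(2)).unsqueeze(1).to(x.dtype)  # [B, 1, T]
        x = self.encoder(x * x_mask) * x_mask
        stats = self.proj(x) * x_mask
        m, logs = torch.split(stats, self.out_channels, dim=1)
        return x, m, logs, x_mask


if __name__ == "__main__":
    torch.manual_seed(0)
    enc = PLBertTextEncoder(DummyPLBert(n_vocab=100))  # vocab size is arbitrary
    ids = torch.randint(0, 100, (2, 11))
    x, m, logs, x_mask = enc(ids, torch.tensor([11, 7]))
    print(x.shape, m.shape, logs.shape, x_mask.shape)
```

Because the rest of the encoder only sees a `[B, H, T]` tensor, swapping the embedding for PL-BERT features leaves the downstream interface unchanged. One practical point the sketch glosses over: the ids fed to PL-BERT have to come from the phoneme vocabulary it was pretrained with.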