yl4579 / PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
MIT License

Use with other TTS? #8

Closed godspirit00 closed 1 year ago

godspirit00 commented 1 year ago

Hi @yl4579! Thanks for the great work! I'm working on a TTS application based on VITS and would like to improve the naturalness of the speech, which is how I came across this project. You mentioned that you had tested PL-BERT with VITS. Could you please explain a bit more about how to use it with VITS (and maybe FastSpeech 2 as well)? Thank you.

yl4579 commented 1 year ago

For VITS, you can just replace the embedding in the text encoder with the output of PL-BERT; nothing else needs to be changed. For FastSpeech 2, you can either replace the encoder with PL-BERT, using the same hidden channels (768), or append a linear projection like the one used in StyleTTS.
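
A minimal PyTorch sketch of the idea described above, not the author's code. It assumes the phoneme encoder returns a tensor of shape (batch, time, 768) and that the downstream text encoder expects a smaller hidden size (192 here, chosen for illustration). `StandInPhonemeEncoder` is a hypothetical placeholder for a pretrained PL-BERT, and `PhonemeFrontend` is an illustrative name; neither exists in PL-BERT, VITS, or FastSpeech 2. In a real setup the input token IDs would presumably also have to follow the vocabulary PL-BERT was trained on.

```python
import torch
import torch.nn as nn


class StandInPhonemeEncoder(nn.Module):
    """Hypothetical placeholder for a pretrained PL-BERT.

    Maps token IDs (batch, time) to features (batch, time, 768).
    The vocabulary size of 100 is arbitrary.
    """

    def __init__(self, n_tokens: int = 100, hidden_size: int = 768):
        super().__init__()
        self.emb = nn.Embedding(n_tokens, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, tokens: torch.Tensor, valid_mask: torch.Tensor) -> torch.Tensor:
        # A real encoder would use valid_mask for attention; the stand-in ignores it.
        return self.norm(self.emb(tokens))


class PhonemeFrontend(nn.Module):
    """Takes the place of the text encoder's nn.Embedding layer.

    Runs the phoneme encoder, then projects its output to the hidden size
    the rest of the TTS model expects (skipped when the sizes already match).
    """

    def __init__(self, phoneme_encoder: nn.Module, encoder_dim: int = 768,
                 hidden_channels: int = 192):
        super().__init__()
        self.phoneme_encoder = phoneme_encoder
        if encoder_dim == hidden_channels:
            self.proj = nn.Identity()
        else:
            self.proj = nn.Linear(encoder_dim, hidden_channels)

    def forward(self, tokens: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # valid_mask is True at real phoneme positions and False at padding.
        positions = torch.arange(tokens.size(1), device=tokens.device)
        valid_mask = positions.unsqueeze(0) < lengths.unsqueeze(1)
        features = self.proj(self.phoneme_encoder(tokens, valid_mask))
        # Zero out padded positions so they do not leak into later layers.
        return features * valid_mask.unsqueeze(-1).to(features.dtype)


torch.manual_seed(0)
frontend = PhonemeFrontend(StandInPhonemeEncoder(), hidden_channels=192)
tokens = torch.randint(0, 100, (2, 7))   # two padded phoneme sequences
lengths = torch.tensor([7, 4])           # true lengths before padding
out = frontend(tokens, lengths)
print(out.shape)  # torch.Size([2, 7, 192])
```

With `hidden_channels=768` the projection becomes an identity, which corresponds to the first FastSpeech 2 option (matching the hidden channels instead of projecting).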