rumbleFTW closed this issue 11 months ago
See #28. The quality will be much worse because XPhoneBERT uses CharsiuG2P and is trained solely on phonemes, unlike PL-BERT, so you should keep that in mind. I would suggest you wait for #41 instead.
Alright, thanks for the clarification @yl4579!
Actually, I wanted StyleTTS2 to work in my local language. From the README I thought it was a drop-in replacement :sweat_smile: What do you think would be my best bet? Should I train PL-BERT from scratch on my own data? If so, how much data and training time would be sufficient to yield good results?
Thanks!
I think you could either skip the pre-trained PL-BERT and initialize a BERT model from scratch (like https://github.com/yl4579/StyleTTS2/issues/139#issuecomment-1849280509), or, to maximize quality, train your own PL-BERT. You can just use Wikipedia as the training corpus for your language. I am currently collecting data and training a multilingual PL-BERT, see #41.
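In case it helps, here is a minimal sketch of the "initialize from scratch" route, assuming you use Hugging Face `transformers` and an ALBERT encoder like PL-BERT does; the config values below (vocab size, hidden size, etc.) are placeholders and need to match whatever your StyleTTS2 / PL-BERT config actually uses:

```python
# Sketch: stand up an untrained ALBERT encoder in place of the pre-trained PL-BERT.
# All numbers below are assumptions -- match them to your own config.
from transformers import AlbertConfig, AlbertModel

config = AlbertConfig(
    vocab_size=178,               # assumed: size of your phoneme/token symbol set
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=2048,
    max_position_embeddings=512,
)
plbert = AlbertModel(config)      # randomly initialized, no pre-trained weights
```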
Since it is mentioned that we can use XPhoneBERT instead of the provided PL-BERT checkpoints for better multilingual inference, could you shed some light on how to load the XPhoneBERT checkpoints and run inference with them? Thanks!
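For reference, XPhoneBERT is published on the Hugging Face Hub, so loading the checkpoint by itself looks roughly like the sketch below (the model id and the Text2PhonemeSequence front-end follow the vinai/xphonebert-base model card; how to wire the resulting features into StyleTTS2 in place of PL-BERT is the part I'm unsure about):

```python
import torch
from transformers import AutoModel, AutoTokenizer
from text2phonemesequence import Text2PhonemeSequence  # XPhoneBERT's G2P front-end (CharsiuG2P-based)

# Load the pre-trained XPhoneBERT encoder and its tokenizer from the Hub
xphonebert = AutoModel.from_pretrained("vinai/xphonebert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/xphonebert-base")

# Convert raw text into the phoneme sequence XPhoneBERT expects
text2phone = Text2PhonemeSequence(language="eng-us", is_cuda=False)
phonemes = text2phone.infer_sentence("Hello world .")

inputs = tokenizer(phonemes, return_tensors="pt")
with torch.no_grad():
    out = xphonebert(**inputs)    # out.last_hidden_state would stand in for the PL-BERT features
print(out.last_hidden_state.shape)
```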