yl4579 / PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
MIT License

Proper format for dataset #38

rajdeep1337 closed this issue 10 months ago

rajdeep1337 commented 10 months ago

I want to train my own PL-BERT model, but I am unsure what format the dataset needs to be in. Could you please shed some light on this? Thanks!

yl4579 commented 10 months ago

Sorry for the late reply; I have been quite busy recently. The format is basically the pair (phonemes, grapheme token). The goal is to predict the masked phonemes and, for each phoneme, the corresponding grapheme token. You can refer to this multilingual PL-BERT dataset as an example: https://huggingface.co/datasets/styletts2-community/multilingual-pl-bert
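
To make the pairing concrete, here is a minimal sketch in Python. It is not code from the repository. It assumes each word carries one grapheme token ID and one phoneme string. The field names `input_ids` and `phonemes`, the token IDs, the `MASK` and `SEP` symbols, and the helpers `align` and `mask_words` are all illustrative assumptions; check the linked dataset for the actual schema.

```python
import random

# Hypothetical record: one grapheme token ID and one phoneme string per word.
# The IDs are made up and do not come from any real tokenizer.
record = {
    "input_ids": [101, 202],          # grapheme tokens for "hello", "world"
    "phonemes": ["həloʊ", "wɜːld"],   # phoneme string for each word
}

MASK = "M"    # assumed mask symbol for hidden phonemes
SEP = " "     # assumed word separator in the phoneme sequence
SEP_ID = 0    # assumed grapheme label paired with the separator


def align(word_phonemes, token_ids):
    """Repeat each word's grapheme token ID once per phoneme.

    Returns two lists of equal length, so every phoneme position has a
    grapheme token to predict.
    """
    phonemes, graphemes = [], []
    for i, (word, token_id) in enumerate(zip(word_phonemes, token_ids)):
        if i > 0:
            phonemes.append(SEP)
            graphemes.append(SEP_ID)
        phonemes.extend(word)                      # one entry per phoneme symbol
        graphemes.extend([token_id] * len(word))   # same ID for the whole word
    return phonemes, graphemes


def mask_words(word_phonemes, mask_prob=0.15, seed=0):
    """Replace all phonemes of randomly chosen words with MASK.

    Whole-word masking with a fixed probability is a simplification chosen
    for this sketch, not necessarily the repository's masking scheme.
    """
    rng = random.Random(seed)
    return [
        MASK * len(word) if rng.random() < mask_prob else word
        for word in word_phonemes
    ]


# Targets: the original phonemes and the per-phoneme grapheme tokens.
target_phonemes, target_graphemes = align(record["phonemes"], record["input_ids"])

# Model input: the same sequence with some words masked.
input_phonemes, _ = align(mask_words(record["phonemes"]), record["input_ids"])
```

In this sketch the model would see `input_phonemes` and be trained to recover `target_phonemes` at the masked positions and `target_graphemes` at every position.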