yl4579 / PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
MIT License

Implementation details of the masking strategy #1

Closed: TinaChen95 closed this issue 1 year ago

TinaChen95 commented 1 year ago

Hi, I'm curious about the masking strategy for the MLM task in your paper (Section 2.2.2):

"When a grapheme is selected, its phoneme tokens are replaced with a MASK token 80% of the time, are replaced with random phoneme tokens 10% of the time, and stay unchanged 10% of the time."

When you replace with random phonemes, does the replacement correspond to a real word? For example, is a nonsensical phoneme sequence such as 'abcde' possible?

yl4579 commented 1 year ago

Yes, the replacements are completely random phonemes and need not form a real word. The goal is to predict the correct phonemes from the random ones, and the number of replacement phonemes must match the number of phonemes in the target word.
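
For readers who want to see the rule in code, here is a minimal sketch of the 80/10/10 corruption described above. It is not taken from the PL-BERT repository: the function name `corrupt_selected_word`, the `MASK` symbol, and the toy vocabulary are hypothetical, and drawing each random phoneme independently and uniformly is an assumption about what "completely random" means. It only covers what happens once a word has already been selected for masking.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol; the real token name may differ


def corrupt_selected_word(phonemes, vocab, rng=random):
    """Apply the 80/10/10 rule to the phoneme tokens of one selected word.

    One decision is made for the whole word, and the output always has
    the same number of tokens as the input.
    """
    r = rng.random()
    if r < 0.8:
        # 80%: every phoneme of the word becomes the mask token
        return [MASK] * len(phonemes)
    if r < 0.9:
        # 10%: each phoneme is drawn at random (assumed independent and
        # uniform), so the result need not be a real word
        return [rng.choice(vocab) for _ in phonemes]
    # 10%: the word is left unchanged
    return list(phonemes)


# Toy usage: vocabulary and word are made up for illustration
toy_vocab = ["a", "b", "c", "d", "e"]
print(corrupt_selected_word(["k", "a", "t"], toy_vocab, random.Random(0)))
```

Because the random branch samples one phoneme per position, a sequence like 'abcde' can appear, while the length constraint mentioned in the reply holds by construction.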