yl4579 / StyleTTS

Official Implementation of StyleTTS
MIT License

Problem about data processing #32

Closed (Zhongxu-Wang closed 1 year ago)

Zhongxu-Wang commented 1 year ago

I found this line of code in the meldataset.py file and I am curious about what it does. Why does the waveform need to be padded with zeros on both sides? The line is: wave = torch.cat([torch.zeros([5000]), wave, torch.zeros([5000])], axis=0)

yl4579 commented 1 year ago

This compensates for the silence tokens (token index 0) at the start and end of the text, which the text aligner has to match against audio. We add a silence token to the beginning and end of the text so that it aligns with leading and trailing silence, but some datasets, such as LibriTTS and LJSpeech, have no silence at the beginning or end of their recordings. Padding the waveform with zeros on both sides gives those silence tokens something to align to, which makes the aligner more robust.
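To make the pairing concrete, here is a minimal sketch of the idea described above, not the repository's actual code. The helper names `pad_wave` and `pad_tokens`, the example token ids, and the 24 kHz sample rate are assumptions made for illustration. Only the 5000-sample padding and the use of token index 0 for silence come from this thread.

```python
import torch

PAD_SAMPLES = 5000            # from the line quoted in the question
SILENCE_TOKEN = 0             # token index 0, as stated in the reply
ASSUMED_SAMPLE_RATE = 24000   # assumption: not stated anywhere in this thread


def pad_wave(wave: torch.Tensor, pad: int = PAD_SAMPLES) -> torch.Tensor:
    """Surround a 1-D waveform with `pad` zero-valued samples on each side."""
    silence = torch.zeros(pad, dtype=wave.dtype)
    return torch.cat([silence, wave, silence], dim=0)


def pad_tokens(tokens: list[int], silence: int = SILENCE_TOKEN) -> list[int]:
    """Add one silence token before and after the text token sequence."""
    return [silence] + tokens + [silence]


# Hypothetical utterance: 48000 audio samples and three made-up token ids.
wave = torch.randn(48000)
padded = pad_wave(wave)
tokens = pad_tokens([12, 7, 31])

print(padded.shape[0])  # 58000 (48000 + 2 * 5000)
print(tokens)           # [0, 12, 7, 31, 0]

# Duration of each padded silence under the assumed sample rate.
print(PAD_SAMPLES / ASSUMED_SAMPLE_RATE)
```

With both sides padded, each silence token in the text has a matching stretch of silent audio, even when the original recording starts and ends on speech.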