lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
MIT License

Generate mask spans of 70-100% by default during training #16

Closed (lucasnewman closed this 1 month ago)

lucasnewman commented 1 month ago

I noticed the masks were pretty narrow when visualizing the inputs during training. The paper mentions that they use 70-100% masks for each sequence in the batch (like Voicebox), so this updates mask generation to handle that.
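
For readers following along, here is a rough, framework-free sketch of the idea described above: for each sequence, draw a fraction between 0.7 and 1.0 and mask one contiguous span of that relative length at a random offset. This is not the code from this PR. The function names (`random_span_mask`, `batch_span_masks`) and the `frac_lengths` parameter are hypothetical, rounding the span up with `ceil` is an assumption, and it uses plain Python lists rather than tensors.

```python
import math
import random


def random_span_mask(seq_len, frac_lengths=(0.7, 1.0), rng=random):
    """Return a list of booleans, True where a frame is masked.

    Hypothetical helper: masks one contiguous span covering a random
    fraction of the sequence, drawn uniformly from `frac_lengths`.
    """
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    lo, hi = frac_lengths
    frac = rng.uniform(lo, hi)
    # Assumption: round the span up, so coverage never drops below `lo`.
    span = min(seq_len, max(1, math.ceil(frac * seq_len)))
    # Pick a start so the whole span fits inside the sequence.
    start = rng.randint(0, seq_len - span)
    return [start <= i < start + span for i in range(seq_len)]


def batch_span_masks(lengths, frac_lengths=(0.7, 1.0), seed=None):
    """Build one mask per sequence, padded with False to the longest length."""
    rng = random.Random(seed)
    max_len = max(lengths)
    masks = []
    for n in lengths:
        mask = random_span_mask(n, frac_lengths, rng)
        masks.append(mask + [False] * (max_len - n))  # padding is never masked
    return masks


if __name__ == "__main__":
    for n, mask in zip([10, 37, 100], batch_span_masks([10, 37, 100], seed=0)):
        print(n, f"{sum(mask) / n:.2f}")  # masked fraction per sequence
```

Each sequence gets its own fraction and offset, so sequences of different lengths in the same batch are all masked in the 70-100% range rather than sharing one narrow span.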

[Image: Screenshot 2024-07-24 at 10:54:07 AM]

(I'm not super solid on the tensor typing so feel free to fix/edit if you want to take this!)

lucidrains commented 1 month ago

@lucasnewman looks great! thank you!