lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
MIT License
228 stars, 21 forks

Tweaks to conditioning to line up with the paper #18

Closed by lucasnewman 1 month ago

lucasnewman commented 1 month ago

1) Don't mask the noised speech (aka flow step) on input
2) Provide the masked speech conditioning separately
3) Embed the noised speech, masked speech conditioning, and text separately and combine them


To be honest, this doesn't change the training dynamics of the network very much, but it seems like it might be useful to align the implementation with the paper.
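
For readers following along, the three changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SeparateInputEmbed` and `build_inputs` are hypothetical names, summing the three embeddings is only one assumed way to "combine" them, and the flow step is assumed to be a plain linear interpolation between noise and data.

```python
import torch
from torch import nn


def build_inputs(mel, infill_mask, t):
    """Build the two speech inputs (hypothetical helper).

    mel:         (batch, seq, channels) target mel spectrogram
    infill_mask: (batch, seq) bool, True where the model must generate audio
    t:           (batch,) flow time in [0, 1]
    """
    noise = torch.randn_like(mel)
    t = t[:, None, None]
    # 1) The noised speech (flow step) is left unmasked.
    #    Assumption: linear interpolation between noise and data.
    noised = (1 - t) * noise + t * mel
    # 2) The masked speech conditioning is a separate tensor:
    #    the clean mel with the region to be infilled zeroed out.
    cond = mel.masked_fill(infill_mask[..., None], 0.0)
    return noised, cond


class SeparateInputEmbed(nn.Module):
    """Hypothetical module: one embedding per input, then combine."""

    def __init__(self, num_channels, num_text_tokens, dim):
        super().__init__()
        self.noised_proj = nn.Linear(num_channels, dim)
        self.cond_proj = nn.Linear(num_channels, dim)
        # Assumption: index 0 is a filler token that pads the text
        # to the mel sequence length.
        self.text_embed = nn.Embedding(num_text_tokens + 1, dim)

    def forward(self, noised, cond, text_ids):
        # 3) Embed each input separately, then combine.
        #    Assumption: combination by summation.
        return (
            self.noised_proj(noised)
            + self.cond_proj(cond)
            + self.text_embed(text_ids)
        )
```

Here `text_ids` is assumed to be a `(batch, seq)` integer tensor already padded with the filler index to the mel length, and the module returns a `(batch, seq, dim)` tensor ready for the transformer. Concatenating the three inputs along the feature axis and applying a single projection would be an equally plausible way to combine them.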