Tweaks to conditioning to line up with the paper

lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch

MIT License


Tweaks to conditioning to line up with the paper #18

Closed lucasnewman closed 1 month ago

lucasnewman commented 1 month ago

1) Don't mask the noised speech (aka flow step) on input
2) Provide the masked speech conditioning separately
3) Embed the noised speech, masked speech conditioning, and text separately and combine them

To be honest, this doesn't change the training dynamics of the network very much, but it seems like it might be useful to align the implementation with the paper.
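
For readers skimming the thread, the three changes listed above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the module name `SeparateConditioningEmbed`, its parameters, the tensor shapes, and the choice to combine the three embeddings by concatenation followed by a linear projection are all assumptions made for the example. It also assumes the text has already been padded to the same length as the mel frames.

```python
import torch
from torch import nn


class SeparateConditioningEmbed(nn.Module):
    """Hypothetical input embedding; names and the combine step are assumptions."""

    def __init__(self, num_text_tokens: int, dim_mel: int, dim: int):
        super().__init__()
        self.noised_proj = nn.Linear(dim_mel, dim)   # embeds the noised speech (flow step)
        self.cond_proj = nn.Linear(dim_mel, dim)     # embeds the masked speech conditioning
        self.text_embed = nn.Embedding(num_text_tokens, dim)
        self.combine = nn.Linear(dim * 3, dim)       # assumed: concat, then project

    def forward(self, noised, mel, infill_mask, text_ids):
        # noised:      (batch, frames, dim_mel) noised speech, passed in unmasked (change 1)
        # mel:         (batch, frames, dim_mel) clean speech
        # infill_mask: (batch, frames) bool, True where the model has to generate audio
        # text_ids:    (batch, frames) long, text tokens padded to the frame length
        # Change 2: the conditioning is the clean speech with the infill span zeroed out.
        cond = mel.masked_fill(infill_mask.unsqueeze(-1), 0.0)
        # Change 3: embed each input separately, then combine them.
        embedded = torch.cat(
            [self.noised_proj(noised), self.cond_proj(cond), self.text_embed(text_ids)],
            dim=-1,
        )
        return self.combine(embedded)


torch.manual_seed(0)
batch, frames, dim_mel, dim = 2, 16, 100, 64
model = SeparateConditioningEmbed(num_text_tokens=256, dim_mel=dim_mel, dim=dim)

mel = torch.randn(batch, frames, dim_mel)
t = torch.rand(batch, 1, 1)                      # one flow time per sample
noised = (1 - t) * torch.randn_like(mel) + t * mel

infill_mask = torch.zeros(batch, frames, dtype=torch.bool)
infill_mask[:, 4:12] = True                      # frames the model must fill in
text_ids = torch.randint(0, 256, (batch, frames))

out = model(noised, mel, infill_mask, text_ids)  # shape: (batch, frames, dim)
```

Because the mask is applied only to the conditioning branch, the clean speech inside the infill span never reaches the network, while the noised speech is seen at every frame.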