lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
MIT License

Need for duration predictor? #6

Closed: Akki737 closed this issue 2 months ago

Akki737 commented 2 months ago

This is going to be a very silly question, but: I was just reading the E2 TTS paper, and I thought a key highlight was doing away with the need for a Duration Predictor model. Why then do we still have that in this implementation?

lucidrains commented 2 months ago

@Akki737 i'm not sure how true this is given this line in the paper: "we also require the target duration of the speech that we want to generate, which may be determined arbitrarily"

lucidrains commented 2 months ago

@Akki737 but yea, i could remove the duration module and just have people pass in a target_duration

lucidrains commented 2 months ago

@Akki737 oh i already do on line 548 duration

Akki737 commented 2 months ago

> @Akki737 oh i already do on line 548 duration

Aah yes you do! Thanks.

And it's line 558* for those folks lazier than me ;)
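
For anyone skimming this thread later, here is a minimal sketch of the pattern discussed above: use a caller-supplied target duration when one is given, and fall back to a duration predictor only when it is not. This is not the repository's actual code. The names `resolve_duration`, `prompt_frames`, and `duration_predictor` are hypothetical, and clamping the result to be longer than the prompt is an assumption made for this sketch.

```python
from typing import Callable, Optional


def resolve_duration(
    text: str,
    prompt_frames: int,
    target_duration: Optional[int] = None,
    duration_predictor: Optional[Callable[[str], int]] = None,
) -> int:
    """Return the total number of frames to generate.

    Hypothetical helper, not part of e2-tts-pytorch.
    """
    if target_duration is not None:
        # The caller chose the length directly ("determined arbitrarily").
        total = target_duration
    elif duration_predictor is not None:
        # Fallback: estimate the frames needed for the new text
        # and add them to the length of the audio prompt.
        total = prompt_frames + duration_predictor(text)
    else:
        raise ValueError("pass target_duration or supply a duration_predictor")

    # Assumption of this sketch: always generate at least one frame
    # beyond the prompt, so there is something to fill in.
    return max(total, prompt_frames + 1)
```

With this shape the predictor becomes optional: passing `target_duration=250` bypasses it entirely, while a predictor is only consulted when no explicit duration is given.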