Closed by Akki737 2 months ago
@Akki737 i'm not sure how true this is, given this line in the paper: "we also require the target duration of the speech that we want to generate, which may be determined arbitrarily"
@Akki737 but yea, i could remove the duration module and just have people pass in a target_duration
@Akki737 oh i already do on line 548 (duration)
Aah yes you do! Thanks.
And it's line 558* for those folks lazier than me ;)
This is going to be a very silly question, but: I was just reading the E2 TTS paper, and I thought a key highlight was doing away with the need for a Duration Predictor model. Why then do we still have that in this implementation?
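For anyone skimming the thread, here is a minimal sketch of the idea discussed above: let the caller pass a target duration directly, and only fall back to a duration predictor when none is given. This is not the repository's actual code. `resolve_duration`, `naive_predictor`, and the `target_duration` and `predictor` parameters are hypothetical names made up for illustration, and the 8-frames-per-character figure is an arbitrary assumption.

```python
from typing import Callable, Optional


def resolve_duration(
    text: str,
    target_duration: Optional[int] = None,
    predictor: Optional[Callable[[str], int]] = None,
) -> int:
    """Return how many frames to generate (hypothetical helper, not the repo's API)."""
    # A caller-supplied duration always wins, matching the paper's remark
    # that the target duration "may be determined arbitrarily".
    if target_duration is not None:
        if target_duration <= 0:
            raise ValueError("target_duration must be positive")
        return target_duration
    # Without an explicit duration, some predictor has to supply one.
    if predictor is None:
        raise ValueError("pass target_duration or supply a predictor")
    return predictor(text)


def naive_predictor(text: str) -> int:
    # Stand-in for a learned duration module: an assumed fixed number of
    # frames per character, chosen only for this example.
    return 8 * len(text)
```

With this shape the predictor becomes optional: `resolve_duration("hello", target_duration=120)` never touches it, while `resolve_duration("hello", predictor=naive_predictor)` uses the fallback.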