yzmyyff closed this issue 1 year ago.
I'm currently going the EnCodec / Vocos route for starters.
However, if you want to PR in the log mel encoder / HiFi-GAN decoder, as in the paper, I can look into disentangling the dimensions earlier.
Okay, I'll look into it
@yzmyyff oh, I meant the encoding and decoding logic, like this. Are you doing log mel <-> HiFi-GAN?
@yzmyyff actually your PR looks good! Thank you!
@yzmyyff I'll just take care of the mel <-> HiFi-GAN encoder / decoder this week.
In the paper, the input to the audio model is an 80-dim log mel spectrogram. The model dimension is a separate hyperparameter that takes different values in different experiments. But in our implementation, these two values are merged into one.
Can they be separated?
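To make the question concrete: one common way to keep the two values apart is a learned input projection from the mel bins to the model width, plus an output projection back to mel bins for the vocoder side. The sketch below assumes that approach. `Linear`, `MelAdapter`, `n_mels`, and `d_model` are hypothetical names, not this repo's code, and it uses plain Python lists instead of a tensor library so it runs anywhere.

```python
import random
from typing import List

Matrix = List[List[float]]  # a sequence of frames, each a list of features


class Linear:
    """Minimal dense layer: y = W x + b, applied to each frame independently."""

    def __init__(self, in_features: int, out_features: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / in_features ** 0.5
        self.in_features = in_features
        self.out_features = out_features
        # W has shape (out_features, in_features); small random init for illustration
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_features)]
                       for _ in range(out_features)]
        self.bias = [0.0] * out_features

    def __call__(self, frames: Matrix) -> Matrix:
        out = []
        for frame in frames:
            if len(frame) != self.in_features:
                raise ValueError(f"expected {self.in_features} features, got {len(frame)}")
            out.append([sum(w * x for w, x in zip(row, frame)) + b
                        for row, b in zip(self.weight, self.bias)])
        return out


class MelAdapter:
    """Hypothetical wrapper keeping n_mels and d_model as separate hyperparameters.

    n_mels is fixed by the audio front end (80-dim log mel in the paper),
    while d_model is free to vary between experiments.
    """

    def __init__(self, n_mels: int = 80, d_model: int = 512) -> None:
        self.n_mels = n_mels
        self.d_model = d_model
        self.in_proj = Linear(n_mels, d_model, seed=0)   # log mel -> model space
        self.out_proj = Linear(d_model, n_mels, seed=1)  # model space -> log mel (e.g. for a vocoder)

    def encode(self, log_mel: Matrix) -> Matrix:
        return self.in_proj(log_mel)

    def decode(self, hidden: Matrix) -> Matrix:
        return self.out_proj(hidden)


if __name__ == "__main__":
    adapter = MelAdapter(n_mels=80, d_model=256)
    log_mel = [[0.0] * 80 for _ in range(10)]  # 10 dummy frames
    hidden = adapter.encode(log_mel)
    recon = adapter.decode(hidden)
    print(len(hidden), len(hidden[0]))  # 10 256
    print(len(recon), len(recon[0]))    # 10 80
```

With this split, changing `d_model` between experiments only resizes the projections; the 80 mel bins the vocoder expects stay fixed.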