Tera2Space closed this issue 6 months ago
Not completely ready yet. I feel like training will not work in a single stage. It has to be done in 2 stages, like NaturalSpeech 2: train a model to get latents, then use those latents as the target for pflow. That's why I made an encodec-based branch, which has pretrained latents.
Thanks for the response, got it :)
I'm trying to train the e2e branch, but the result is only noise in the audio. Am I doing something wrong, or is this version not ready yet? Basically, the HiFi-GAN output is just a tensor of -1 values, so maybe I made a mistake somewhere.
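
For anyone hitting the same symptom, a quick check can tell a constant or saturated vocoder output apart from ordinary noisy audio. The following is only a minimal sketch, not code from this repository: `describe_waveform` and its `tol` parameter are hypothetical names, and it assumes the waveform has already been converted to a plain list of floats (for a PyTorch tensor, something like `wav.flatten().tolist()`).

```python
import math


def describe_waveform(samples, tol=1e-6):
    """Summarize a waveform and flag common failure modes.

    Hypothetical helper for debugging; `samples` is a flat list of floats.
    """
    if not samples:
        raise ValueError("empty waveform")

    # Separate finite samples from NaN/inf, which point to numerical blow-up.
    finite = [s for s in samples if math.isfinite(s)]
    has_nan_or_inf = len(finite) != len(samples)
    if not finite:
        return {"min": None, "max": None, "constant": False,
                "saturated": False, "has_nan_or_inf": True}

    lo, hi = min(finite), max(finite)
    return {
        "min": lo,
        "max": hi,
        # Constant output: every sample has (almost) the same value.
        "constant": (hi - lo) <= tol,
        # Saturated output: every sample sits at -1 or +1, which is what
        # a tanh-style final activation produces when its input is huge.
        "saturated": all(abs(abs(s) - 1.0) <= tol for s in finite),
        "has_nan_or_inf": has_nan_or_inf,
    }


if __name__ == "__main__":
    # The symptom described above: every sample equals -1.
    print(describe_waveform([-1.0] * 8))
```

If `constant` and `saturated` are both true, the problem is probably upstream of the vocoder (for example, the values fed into it are far outside the expected range) rather than in the audio-saving step. That is a guess about where to look, not a diagnosis.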