After reading the code for the vocoder part, I found that there is only a pre-trained model and no training steps. Why is this part not implemented? Also, under what circumstances was the pre-trained model obtained, and how does it perform?
The vocoder in the original TFGAN paper does not include the subband discriminator (and there is no implementation of this part either). Since I could not find a relevant explanation in the paper, what benefit or impact does the subband discriminator have on the model?
If I can get an answer, it will help me a lot.
Thank you.
Hi @LqNoob, I'm not sure if you still need the answer or not. Many apologies for the late reply. These are good questions.
The implementation of TFGAN is confidential as part of the ByteDance codebase, so I cannot open-source it. If you are interested, you can refer to this repo, which has an implementation similar to ours. To achieve speaker independence, you need at least 1000+ speakers in the training dataset.
We use a subband discriminator to enhance the discriminative power of the GAN. We believe this helps TFGAN achieve a better vocoding result.
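Since the actual TFGAN code is confidential, here is only a rough conceptual sketch of the subband idea: the waveform is split into frequency bands, and each band is scored separately so the discriminator can pick up band-specific artifacts. The band split below uses a simple FFT mask, and the per-band "discriminator" is just a toy energy statistic; real multi-band vocoders typically use a learned PQMF filterbank and convolutional discriminators instead, so treat every function here as a hypothetical stand-in.

```python
import numpy as np

def split_subbands(x, n_bands=4):
    """Split signal x into n_bands equal-width frequency bands.

    Uses FFT masking as a stand-in for the PQMF analysis filterbank
    commonly used in multi-band GAN vocoders.
    """
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1, dtype=int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = np.zeros_like(X)          # complex spectrum, zeroed
        mask[lo:hi] = X[lo:hi]           # keep only this band's bins
        bands.append(np.fft.irfft(mask, n=len(x)))
    return bands

def band_scores(bands, weights):
    """Toy per-band 'discriminator': a weighted energy statistic per band.

    In a real subband discriminator each band would go through its own
    small network; here a scalar stands in for that per-band output.
    """
    return [w * float(np.mean(b ** 2)) for b, w in zip(bands, weights)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)
    bands = split_subbands(x, n_bands=4)
    # Sanity check: the masked bands sum back to the original signal.
    assert np.allclose(sum(bands), x)
    scores = band_scores(bands, weights=[1.0] * 4)
    print(len(scores))  # one adversarial score per subband
```

The point of the split is that a full-band discriminator can overlook localized high-frequency artifacts; giving each band its own score forces the generator to match the real distribution in every band, not just on average.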
Hi, @haoheliu. Thank you for your awesome work.