Hello,
I trained a voice using your framework and want to use it as a vocoder for Grad-TTS. Unfortunately, the resulting voice is far too high-pitched.
Could you give me a hint as to what might cause this? Do I need to change some config settings, or can this happen during inference? Do I need to pre-process the input wav?