Poor vocoder outcome

Hello, I am fairly new to this topic. I have two problems that I cannot find a solution for. I read the documentation and went through all the similar issues reported here: https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues but didn't find any help there.

Short description: The encoder trained fine, and so did the synthesizer. The only problem is my vocoder. Training the vocoder is very slow, and in the toolbox it produces poor mel spectrograms (although the wav files I tested are fine). Instead of human speech, the toolbox outputs little more than noise.

Please take a look at the files: https://drive.google.com/drive/folders/1-SKYHRP8zy7vETqtMMJpKv1n7XKidBZL?usp=sharing

Longer description:

I trained all three parts (the encoder, the synthesizer and the vocoder), but the last one is quite problematic. I trained them all from scratch on 244 unique Polish speakers, using the code uploaded to GitHub by @padmalcom, which I adapted to Polish. My vocoder seems to be trained properly, judging by the wav files it generates. However, when I load them in demo_toolbox.py, the predicted mel spectrogram is not even close to the target one. Do you have any idea what could cause this?
So far the vocoder has completed only 14k iterations, which might be the issue. This part is going really slowly. Is that expected? My PC has been running non-stop for 2 days and has reached only 14k iterations. I have an NVIDIA GeForce RTX 3060 Ti and have installed the latest release of CUDA.
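To put that speed in concrete numbers, here is a minimal Python sketch that converts "14k iterations in 2 days" into an average rate. `training_rate` is a hypothetical helper, not part of the repository, and it assumes the PC trained continuously for the full 2 days, so it ignores any pauses or other downtime.

```python
# Hypothetical helper (not from the repo): average training throughput,
# assuming training ran continuously for the whole period.
SECONDS_PER_DAY = 24 * 60 * 60


def training_rate(steps: int, days: float) -> tuple[float, float]:
    """Return (steps per second, seconds per step) for a training run."""
    seconds = days * SECONDS_PER_DAY
    return steps / seconds, seconds / steps


if __name__ == "__main__":
    # Figures reported in this issue: ~14k vocoder iterations in 2 days.
    steps_per_sec, sec_per_step = training_rate(14_000, 2)
    print(f"{steps_per_sec:.3f} steps/s, {sec_per_step:.1f} s/step")
```

Running it prints `0.081 steps/s, 12.3 s/step`, i.e. roughly 12 seconds per vocoder iteration on average.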

Any idea what could have gone wrong? I would be grateful for any suggestions :)

padmalcom / Real-Time-Voice-Cloning-German

Poor vocoder outcome #18