Hey, I came across your repo and it is extremely helpful and impressive. I ran an experiment with about 30 minutes of training data each for the source and target speakers, using the default training parameters. It pretty quickly produced a very convincing impression of the target speaker, but the speech was quite distorted (like a voice heard through a fan), and the G loss got stuck around 0.6 while the D loss stayed around 0.4. Have you tried a harmonic vocoder such as WORLD with the same architecture? With CycleGAN and WORLD I manage to get very clear generated speech, but it sounds far from the target speaker. Do you have any recommendations for how to tackle this issue?
Thank you!