yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
MIT License

Some doubt about the loss #11

Closed: 980202006 closed this issue 3 years ago

980202006 commented 3 years ago

Hi, I have two questions about the loss.

  1. For the classification loss of the discriminator, why does it classify only the fake samples and not the real samples as well? Classifying real samples too might help it learn each speaker's voice characteristics better.
  2. I tried passing the generated mel spectrogram back through the style encoder to obtain a style vector, then computing an L1 loss between it and the target style vector, but this did not give better results.
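The first question compares two ways of training the discriminator's speaker classifier: on converted (fake) samples only, or on real samples as well. The sketch below is a minimal, framework-free illustration under stated assumptions. It assumes the classifier outputs one logit per speaker and is trained with softmax cross-entropy. The labels passed in for the fake samples are left to the caller. The names `classifier_loss` and `mean_ce` are hypothetical and are not taken from the StarGANv2-VC code.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def mean_ce(batch_logits, labels):
    """Average cross-entropy over a batch of (logits, label) pairs."""
    return sum(cross_entropy(l, y) for l, y in zip(batch_logits, labels)) / len(labels)

def classifier_loss(fake_logits, fake_labels, real_logits=None, real_labels=None):
    """Hypothetical discriminator-side classifier loss (illustration only).

    fake_logits: classifier outputs on generator-converted samples.
    real_logits: optional outputs on real samples (the variant from question 1).
    Only the fake term is computed on the generator's output. The real term
    uses data the generator never produced, so it gives the generator no
    signal about its own conversion errors.
    """
    loss = mean_ce(fake_logits, fake_labels)
    if real_logits is not None:
        loss += mean_ce(real_logits, real_labels)
    return loss

# Example: two converted samples, plus one real sample in the optional variant.
fake = [[2.0, 0.0], [0.0, 2.0]]
fake_only = classifier_loss(fake, [0, 1])
with_real = classifier_loss(fake, [0, 1], real_logits=[[0.0, 0.0]], real_labels=[0])
```

Adding the real-sample term changes only the classifier's objective. In a real autograd setup, that term would have no gradient path back to the generator's parameters.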
yl4579 commented 3 years ago
  1. You can classify the real samples too, but the real samples are not converted by the generator, so classifying them will not help the generator fix its mistakes.
  2. I've tried this too, but it gave me even worse results: the reconstruction loss forces the generator not to convert the input characteristics, which lowers the similarity to the target speaker even further.
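The style-vector L1 loss from the second question and answer can be sketched as follows. The converted mel spectrogram is re-encoded, and the result is compared with the target style vector using mean absolute error. This is a minimal illustration, not the repository's implementation. The sketch makes two assumptions: a mel spectrogram is represented as a list of rows, and the style encoder is any callable that returns a vector. `toy_style_encoder` is a hypothetical stand-in that averages each row over time.

```python
def l1_loss(a, b):
    """Mean absolute error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def style_reconstruction_loss(style_encoder, converted_mel, target_style):
    """Re-encode the converted mel spectrogram and compare it with the target style."""
    return l1_loss(style_encoder(converted_mel), target_style)

def toy_style_encoder(mel):
    """Hypothetical stand-in for a style encoder: average each mel row over time."""
    return [sum(row) / len(row) for row in mel]

# Example: a 2-bin, 2-frame "mel spectrogram" and a 2-dimensional target style.
mel = [[1.0, 3.0], [2.0, 2.0]]
target_style = [1.0, 2.0]
loss = style_reconstruction_loss(toy_style_encoder, mel, target_style)  # 0.5
```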
980202006 commented 3 years ago

Thank you!