yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
MIT License

Some doubt about the optimize step of style encoder #14

Closed 980202006 closed 2 years ago

980202006 commented 2 years ago

Hello, I have a question about the optimization step of the style encoder. Does self.optimizer.step('style_encoder', scaler=scaler) have any effect, given that the z_ref input does not call the style encoder? (image attached)

yl4579 commented 2 years ago

This code is copied directly from the original stargan-v2 repo, so it is not a mistake, and it will learn the styles as intended.

The idea is that, when training the mapping network, the style encoder only has to learn the style that converts (reconstructs) the converted result back to the original domain, regardless of which domain it was converted to. When training the generator with the style encoder, the style encoder is held fixed and serves as a guide for diversifying the generated samples, so that two different styles produce different results. If you also updated the style encoder at line 219, the style diversification loss would diverge, because the style encoder could do anything to make that loss as large as possible.
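The divergence argument above can be illustrated with a toy scalar model. This is only a sketch under loud assumptions, not the repository's training code: the "style encoder" is a single hypothetical weight w with s = w * ref, the "generator" is the fixed map G(x, s) = x + s, and the names ds_loss, ds_grad_w and train_encoder_on_ds are made up for illustration. The diversification term enters the objective with a negative sign, so a network that is stepped on it is rewarded for making the term larger.

```python
# Toy scalar model (hypothetical, NOT the StarGANv2-VC code).
# "Style encoder": s = w * ref, with one trainable weight w.
# "Generator":     G(x, s) = x + s, kept fixed so only the encoder moves.

def ds_loss(w, x, ref1, ref2):
    """Diversification term |G(x, s1) - G(x, s2)| for the toy model."""
    s1, s2 = w * ref1, w * ref2
    return abs((x + s1) - (x + s2))

def ds_grad_w(w, ref1, ref2):
    """Derivative of ds_loss with respect to w: sign(w) * |ref1 - ref2|."""
    sign = 1.0 if w >= 0 else -1.0
    return sign * abs(ref1 - ref2)

def train_encoder_on_ds(w, ref1, ref2, lr=0.1, steps=50):
    """Step the encoder on the objective -ds_loss (i.e. maximize ds_loss)."""
    history = []
    for _ in range(steps):
        # Gradient descent on -ds_loss is gradient ascent on ds_loss.
        w = w + lr * ds_grad_w(w, ref1, ref2)
        history.append(ds_loss(w, 0.0, ref1, ref2))
    return w, history

w_final, history = train_encoder_on_ds(w=1.0, ref1=0.2, ref2=0.7)
print("ds term after first step:", history[0])
print("ds term after last step: ", history[-1])
```

In this toy setup nothing bounds w, so the diversification term keeps growing for as long as the encoder is updated on it; simply scaling the style codes is enough to inflate the loss. Stepping only the generator in that pass removes this shortcut.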

980202006 commented 2 years ago

Thank you

YoniLeibner commented 1 year ago

Hi, I also noticed this and was wondering why we need the ds loss. Is it so that s1 and s2 produce different styles, so that mel_ref affects the result? If so, did you try also computing the loss at line 219, just with lambda_ds=0?