Open WeiyueSu opened 6 years ago
Hi,
Thanks for asking.
I changed the original GAN loss to the non-saturating loss (i.e., min log(1 - D(G(z))) -> max log D(G(z)); see the GAN paper), which makes training more stable. The adversarial loss is now a positive term and takes a larger value.
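For anyone unfamiliar with the difference, here is a minimal sketch of the two generator objectives, assuming the discriminator D outputs probabilities in (0, 1) (the function names are mine, not from this repo):

```python
import numpy as np

def saturating_g_loss(d_fake):
    # Original GAN generator objective: minimize log(1 - D(G(z))).
    # When D confidently rejects fakes (D(G(z)) near 0), this term is
    # close to 0 and its gradient vanishes, so the generator learns slowly.
    return np.mean(np.log(1.0 - d_fake))

def non_saturating_g_loss(d_fake):
    # Non-saturating variant: maximize log D(G(z)),
    # i.e. minimize -log D(G(z)). The loss (and gradient) stays large
    # early in training, and the reported value is positive.
    return -np.mean(np.log(d_fake))

# D(G(z)) for a batch of fakes the discriminator confidently rejects:
d_fake = np.array([0.01, 0.02, 0.05])
print(saturating_g_loss(d_fake))      # near zero: weak learning signal
print(non_saturating_g_loss(d_fake))  # large positive value
```

This is why the adversarial term reads as a positive number after the change: the logged quantity is -log D(G(z)) rather than log(1 - D(G(z))).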
I've run it multiple times, and it does converge to ~20 reconstruction loss + ~5 adversarial loss. It seems that when the reconstruction loss is very small, it's hard to align the hidden states, and the model behaves like a pure auto-encoder, not changing the source sentence after flipping the style. There are definitely learning and optimization problems here, but I haven't found a better way to handle this.
With your released model, the final loss on the training data is about 25; is anything wrong? What should the loss be when the model converges normally?