Open WeiyueSu opened 6 years ago
Hi,
Thanks for asking.
I changed the original GAN loss to the non-saturating loss (i.e., min log(1 - D(G(z))) -> max log D(G(z)); see the GAN paper), which makes training more stable. The adversarial loss is now a positive term and takes a larger value.
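For anyone unfamiliar with the difference, here is a minimal sketch of the two generator objectives, assuming the discriminator D outputs probabilities in (0, 1) (the function names are mine, not from this repo):

```python
import numpy as np

def saturating_g_loss(d_fake):
    # Original GAN generator objective: minimize log(1 - D(G(z))).
    # When D confidently rejects fakes (D(G(z)) near 0), this term is
    # close to 0 and its gradient vanishes, so the generator learns slowly.
    return np.mean(np.log(1.0 - d_fake))

def non_saturating_g_loss(d_fake):
    # Non-saturating variant: maximize log D(G(z)),
    # i.e. minimize -log D(G(z)). The loss (and gradient) stays large
    # early in training, and the reported value is positive.
    return -np.mean(np.log(d_fake))

# D(G(z)) for a batch of fakes the discriminator confidently rejects:
d_fake = np.array([0.01, 0.02, 0.05])
print(saturating_g_loss(d_fake))      # near zero: weak learning signal
print(non_saturating_g_loss(d_fake))  # large positive value
```

This is why the adversarial term reads as a positive number after the change: the logged quantity is -log D(G(z)) rather than log(1 - D(G(z))).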
I've run it multiple times, and it does converge to ~20 reconstruction loss + ~5 adversarial loss. It seems that when the reconstruction loss is very small, it's hard to align the hidden states, and the model behaves like a pure auto-encoder, not changing the source sentence after flipping the style. There are definitely learning and optimization problems here, but I haven't found a better way to handle this.
With your released model, the final loss on the training data is about 25; is anything wrong? What should the loss be when the model converges normally?