shentianxiao / language-style-transfer

Apache License 2.0

What should the loss be when the model converges normally? #5

Open WeiyueSu opened 6 years ago

WeiyueSu commented 6 years ago

In your released model, the final loss is about 25 on the training data. Is anything wrong? What should the loss be when the model converges normally?

shentianxiao commented 6 years ago

Hi,

Thanks for asking.

I changed the original GAN loss to the non-saturating loss (i.e., min log(1 - D(G(z))) -> max log D(G(z)); see the GAN paper), which makes training more stable. With this change, the adversarial loss is a positive term and takes larger values.
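To make the difference concrete, here is a minimal NumPy sketch of the two generator objectives (this is illustrative only, not the repo's actual TensorFlow code; `d_fake` stands for the discriminator's output D(G(z)) on generated samples):

```python
import numpy as np

def saturating_g_loss(d_fake):
    # original generator objective: minimize log(1 - D(G(z)))
    return np.mean(np.log(1.0 - d_fake))

def non_saturating_g_loss(d_fake):
    # non-saturating variant: maximize log D(G(z)),
    # i.e. minimize -log D(G(z)), which is always positive
    return -np.mean(np.log(d_fake))

# When the discriminator confidently rejects fakes (D(G(z)) near 0),
# the saturating loss is near zero (weak gradient), while the
# non-saturating loss is a large positive number (strong gradient).
d_fake = np.array([0.01, 0.02, 0.05])
print(saturating_g_loss(d_fake))      # near 0
print(non_saturating_g_loss(d_fake))  # large and positive
```

This is why the reported adversarial loss is positive and larger in magnitude than under the original formulation.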

I've run it multiple times, and it does converge to ~20 reconstruction loss + ~5 adversarial loss. It seems that when the reconstruction loss is very small it's hard to align the hidden states and the model behaves like a pure auto-encoder, not changing the source sentence after flipping the style. There're definitely learning and optimization problems here, but I haven't found a better way to this.