wiseodd / controlled-text-generation

Reproducing Hu et al., ICML 2017, "Toward Controlled Generation of Text"
BSD 3-Clause "New" or "Revised" License

Is the gradient for Encoder doubled? #27

Open Clement25 opened 3 years ago

Clement25 commented 3 years ago

I noticed there are two back-propagation passes, one for the generator and one for the encoder.

https://github.com/wiseodd/controlled-text-generation/blob/master/train_discriminator.py#L120-L122 https://github.com/wiseodd/controlled-text-generation/blob/master/train_discriminator.py#L130-L132

After loss G is back-propagated, the code calls zero_grad to clear all of the generator's gradients in the auto-encoder. However, the encoder is also on the forward path, and its gradients are preserved. The encoder loss is then computed and back-propagated again, so the VAE-loss gradient for the encoder is accumulated twice and its final value is doubled. Is my understanding correct here?
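Here is a minimal, self-contained sketch of the accumulation effect I mean (toy modules, not the repo's actual code; optimizer steps are omitted so the example focuses purely on the gradients):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the encoder/generator; shapes are arbitrary.
encoder = nn.Linear(8, 4)
generator = nn.Linear(4, 8)

x = torch.randn(2, 8)

z = encoder(x)
recon = generator(z)
recon_loss = ((recon - x) ** 2).mean()   # shared VAE-style reconstruction term

loss_G = recon_loss                      # generator objective
loss_E = recon_loss                      # encoder objective reuses the same term

# First backward (generator update): gradients reach the encoder too.
loss_G.backward(retain_graph=True)
g1 = encoder.weight.grad.clone()

# Zeroing only the generator leaves the encoder's gradients in place.
generator.zero_grad()

# Second backward (encoder update): the shared term is accumulated again.
loss_E.backward()
g2 = encoder.weight.grad

print(torch.allclose(g2, 2 * g1))        # True -> the encoder gradient is doubled
```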

Sry2016 commented 3 years ago

Hello! Have you solved this problem?

Sry2016 commented 3 years ago

You are right. So I think we should add `trainer_E.zero_grad()` here, but without calling `trainer_E.step()`.
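A hedged sketch of that placement on the same toy setup as above (again not the repo's exact lines; `trainer_G` / `trainer_E` follow the optimizer names used in this thread, and the `.step()` calls are omitted so the example stays focused on where the gradients get cleared):

```python
import torch
import torch.nn as nn
import torch.optim as optim

encoder = nn.Linear(8, 4)
generator = nn.Linear(4, 8)
trainer_G = optim.Adam(generator.parameters(), lr=1e-3)
trainer_E = optim.Adam(encoder.parameters(), lr=1e-3)

x = torch.randn(2, 8)
z = encoder(x)
recon_loss = ((generator(z) - x) ** 2).mean()

loss_G = recon_loss   # generator objective (shares the reconstruction term)
loss_E = recon_loss   # encoder objective

loss_G.backward(retain_graph=True)
g1 = encoder.weight.grad.clone()

trainer_G.zero_grad()
trainer_E.zero_grad()   # proposed addition: drop the stale encoder grads here,
                        # but do NOT call trainer_E.step() at this point

loss_E.backward()
print(torch.allclose(encoder.weight.grad, g1))  # True -> no doubling
```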