I noticed there are two back-propagations, one for the generator and one for the encoder:
https://github.com/wiseodd/controlled-text-generation/blob/master/train_discriminator.py#L120-L122
https://github.com/wiseodd/controlled-text-generation/blob/master/train_discriminator.py#L130-L132
After back-propagating the generator loss, the code calls zero_grad to clear the generator's gradients in the auto-encoder. However, the encoder is also on that forward path, and its gradients are preserved. The encoder loss is then computed and back-propagated again, so the gradient of the VAE loss is accumulated twice on the encoder and its final value is doubled. Is my understanding correct here?
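To illustrate what I mean, here is a minimal PyTorch sketch (toy modules and loss stand-ins, not the ones defined in this repo) showing that calling zero_grad only on the generator leaves the encoder's gradients from the first backward pass in place, so the second backward pass accumulates on top of them:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the encoder and generator (hypothetical, not the repo's modules).
encoder = nn.Linear(4, 4)
generator = nn.Linear(4, 4)

x = torch.randn(2, 4)
z = encoder(x)            # the encoder sits on the forward path of both losses
out = generator(z)

loss_G = out.sum()        # stand-in for the generator loss
loss_E = (z ** 2).sum()   # stand-in for the encoder / VAE loss

loss_G.backward(retain_graph=True)   # fills grads of BOTH generator and encoder
generator.zero_grad()                # only the generator's grads are cleared

grad_from_G = encoder.weight.grad.clone()
loss_E.backward()                    # grads are added, not overwritten

print("encoder grad norm after loss_G:", grad_from_G.norm().item())
print("encoder grad norm after both  :", encoder.weight.grad.norm().item())
# The second value still contains the leftover contribution from loss_G,
# which is the accumulation I am describing above.
```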