liaohaofu / adn

ADN: Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction

Some questions about the code implementation~Thank you. #9

Closed EricKani closed 4 years ago

EricKani commented 4 years ago

Hi haofu,

This is really brilliant work. While building on your code, I found a few things that I could not determine were correct, so I am opening this issue to discuss them with you. Sorry to bother you.

1 The first is about line 83, "def _update(self, backprop=True):", in your "models/base.py". I think we should call "backprop_off(self)" after optimizing the network, and then "backprop_on(self)" via "def _clear(self, backprop=True)" in the next iteration. But it seems the discriminator's backprop is never turned off when training ADN.

2 A question about the implementation of the cycle loss. When training ADN, you encode the normal content from the fake artifact image and then decode it back into an artifact-free image, which gives the cycle loss. But I am not sure whether this second forward pass breaks the backpropagation of the artifact reconstruction loss, because in my view the features have changed by then.

These are the questions that bother me. I hope to get your reply, and thank you very much~

Best regards,

pengbol

liaohaofu commented 4 years ago

Hi Pengbol,

Thank you for your interest in our work and good questions! For the first question, we added this option to save GPU memory in some cases (though it is not actually used in this code). In no_backprop mode, the intermediate forward results are not saved, which saves memory (please see the official PyTorch docs on how this works). The optimization process itself is not affected by this option. To be specific, the only case where this option matters for optimization is when the discriminator is used again after _update(), which is not the case here.
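To make the memory point concrete, here is a minimal sketch (hypothetical names, not the actual ADN code) of what turning backprop off and on amounts to in PyTorch: with requires_grad disabled on a module's parameters, autograd builds no graph through them, so the activations needed for their gradients are not retained.

```python
import torch
import torch.nn as nn

d = nn.Linear(4, 1)  # stand-in for a discriminator (hypothetical)

# "backprop off": no autograd graph is built through d's parameters,
# so the intermediate activations needed for their gradients are not
# kept -- this is what saves memory.
for p in d.parameters():
    p.requires_grad_(False)

x = torch.rand(2, 4)
off = d(x)
print(off.requires_grad)  # False: nothing to backprop into d

# "backprop on" again before the next iteration that needs d's gradients
for p in d.parameters():
    p.requires_grad_(True)

on = d(x)
print(on.requires_grad)  # True
```

Note that this only controls whether the graph is recorded; it does not change the values of the forward pass itself.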

For your second question, no, it will not break the backprop. In PyTorch, when a network is used twice in a computation (before backward()), the gradients from each use are computed separately and accumulated into the network's total gradient. This is exactly the behavior we want here.

Best, Haofu

EricKani commented 4 years ago

Hi haofu,

Thanks for your quick response~

For your second answer, can I understand it like this: when a network is used twice in one iteration, for example an encoder that extracts the normal-image content code from an image with metal artifacts and a decoder that reconstructs the normal image, both sets of intermediate forward features are preserved? And when backward() is called, gradients are computed separately from the two losses and their corresponding features? I used to think that the features from the second forward pass would overwrite the features from the first one.

For your first answer, do you mean that if we do not use the discriminator again after _update(), the backprop_on mode does not matter for subsequent optimization? I roughly understand this: if the discriminator is not used again, it produces no new loss, so no gradients are computed for it during backward(). But when we do use the discriminator, we compute two losses, GAN_d_loss and GAN_g_loss. GAN_d_loss is used when we update the discriminator, but shouldn't GAN_g_loss also produce gradients for the discriminator when we optimize ADN? Following your answer above, the discriminator should have saved its intermediate features.

I don't know whether my understanding is right. Sorry to bother you again, but I think this is important for my understanding of network optimization.

Best, pengbol

liaohaofu commented 4 years ago

Yes, the network saves all its intermediate results (i.e., features) separately each time it is used, and no overwriting occurs when it is used multiple times. Try the following code and you will see the difference (note the .detach() here):

import torch
import torch.nn as nn

x = torch.rand(1, 3, 64, 64)
layer = nn.Conv2d(3, 3, 1)
c = nn.L1Loss()

# first try: y1.detach() cuts the graph, so the gradient
# flows only through the second application of `layer`
layer.zero_grad()
y1 = layer(x)
y2 = layer(y1.detach())
loss = c(y2, x)
loss.backward()
print(layer.weight.grad)

# second try: gradients from both applications of `layer`
# accumulate into layer.weight.grad
layer.zero_grad()
y1 = layer(x)
y2 = layer(y1)
loss = c(y2, x)
loss.backward()
print(layer.weight.grad)

For the second part, no, there won't be two gradient updates for the discriminator, because we do not apply the discriminator's gradients after the _update() call. When computing GAN_d_loss and GAN_g_loss, we do get two losses involving the discriminator (you are right about that), but we only update the discriminator using GAN_d_loss, so its weights are updated once. GAN_g_loss is only used to update the gradients of the generator (because we only call self.model_g._update and no self.model_d._update afterwards). See the following example (using the same variables as the previous one). When the two losses are backwarded separately, the gradients are computed separately (the second accumulates on top of the first, which is fine if we apply the first update in advance):

layer.zero_grad()
y1 = layer(x)
y2 = layer(x)
loss1 = c(y1, x)
loss2 = c(y2, x)

loss1.backward()
print(layer.weight.grad)
loss2.backward()
print(layer.weight.grad)
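To tie this back to the GAN_g_loss question: backward() through the generator loss does fill the discriminator's .grad buffers, but those gradients never change its weights because only the generator's optimizer steps; they are simply cleared before the discriminator's next update. A minimal sketch (hypothetical stand-in modules, not the actual ADN models):

```python
import torch
import torch.nn as nn

g = nn.Linear(4, 4)  # stand-in for the generator (hypothetical)
d = nn.Linear(4, 1)  # stand-in for the discriminator (hypothetical)
opt_g = torch.optim.SGD(g.parameters(), lr=0.1)

x = torch.rand(2, 4)
gan_g_loss = d(g(x)).mean()  # generator loss routed through the discriminator
gan_g_loss.backward()

# backward() does fill d's .grad buffers...
had_grad = d.weight.grad is not None
print(had_grad)  # True

d_before = d.weight.clone()
opt_g.step()  # ...but only the generator's optimizer steps,
print(torch.equal(d.weight, d_before))  # True: d's weights are unchanged

# d's stale gradients are simply cleared before its next update
d.zero_grad()
```

So the discriminator "sees" gradients from GAN_g_loss, but they have no effect on its parameters as long as no discriminator optimizer step happens before zero_grad().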