KazukiYoshiyama-sony opened this issue 4 years ago
Hi, @TE-KazukiYoshiyama san.
Thanks for reviewing my code in detail! Let me answer each question here.
I believe this backward pass is necessary to update the discriminator. I'd like to know what makes you ask this question :)
I've already implemented this. Each loop of `train_single_scale` only updates the networks at the current scale, roughly as in the sketch below.
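To make that concrete, here is a rough NNabla-style sketch of the idea (the scope names and the helper function are illustrative, not the exact code in this repo): only the parameters under the current scale's parameter scope are registered to the solvers, so `solver.update()` can never touch the other scales.

```python
# Minimal sketch, not the actual repository code. Parameters are assumed to be
# created under hypothetical per-scale scopes such as "generator/scale-0",
# "discriminator/scale-0", etc.
import nnabla as nn
import nnabla.solvers as S

def setup_solvers_for_scale(scale, lr=5e-4):
    with nn.parameter_scope("generator/scale-{}".format(scale)):
        g_params = nn.get_parameters()
    with nn.parameter_scope("discriminator/scale-{}".format(scale)):
        d_params = nn.get_parameters()
    g_solver = S.Adam(alpha=lr)
    d_solver = S.Adam(alpha=lr)
    g_solver.set_parameters(g_params)  # only this scale's G parameters
    d_solver.set_parameters(d_params)  # only this scale's D parameters
    return g_solver, d_solver
```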
I've taken this from the original implementation. I tried `axis=[1, 2, 3]` instead of `axis=1`, but the generated images were corrupted. Eventually, I found that the model generates high-quality images with `axis=1`. I guess that centering the gradient of each patch (rather than the whole image) stabilizes training.
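To illustrate the difference (a sketch only, not the exact code in this repo; `nnabla.grad` is assumed for the double backpropagation and the helper name is hypothetical), the choice only changes the axes over which the L2 norm of the gradient is reduced in the one-centered penalty:

```python
import nnabla as nn
import nnabla.functions as F

def gradient_penalty(d_out, x_hat, axis=1):
    # Gradient of the discriminator output w.r.t. its (interpolated) input.
    grads = nn.grad([d_out], [x_hat])[0]  # shape: (N, C, H, W)
    # axis=1         -> one L2 norm per spatial location (per patch)
    # axis=[1, 2, 3] -> one L2 norm per image
    norm = F.pow_scalar(
        F.sum(F.pow_scalar(grads, 2.0), axis=axis, keepdims=True), 0.5)
    # One-centered penalty: push the norm toward 1.
    return F.mean(F.pow_scalar(norm - 1.0, 2.0))
```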
What I am concerned about is that when you do something like this:

`d_error.backward()  # backward at the discriminator loss`

the backward pass is performed even through the generator, unless you unlink the generator output or set `<generator output>.need_grad = False`. It looks like the gradients w.r.t. the generator's trainable parameters are computed even though you do not update them, which is redundant computation. Please correct me if my understanding is incorrect, and point me to the lines of code that address this issue.
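To make the concern concrete, here is a toy sketch (the networks and the loss are made up for illustration, not taken from your code); without the unlinking line, `d_error.backward()` would also fill the generator parameters' gradients:

```python
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

def toy_generator(z):
    with nn.parameter_scope("gen"):
        return F.tanh(PF.convolution(z, 3, (3, 3), pad=(1, 1)))

def toy_discriminator(x):
    with nn.parameter_scope("dis"):
        h = F.leaky_relu(PF.convolution(x, 8, (3, 3), pad=(1, 1)), 0.2)
    return F.mean(h)

z = nn.Variable.from_numpy_array(
    np.random.randn(1, 3, 16, 16).astype(np.float32))
x_real = nn.Variable.from_numpy_array(
    np.random.randn(1, 3, 16, 16).astype(np.float32))

fake = toy_generator(z)
# Without the next line, d_error.backward() also accumulates gradients for the
# generator parameters, even though only the discriminator is updated here.
fake_d = fake.get_unlinked_variable(need_grad=False)  # cut the graph at G's output

d_error = toy_discriminator(fake_d) - toy_discriminator(x_real)  # placeholder critic loss
d_error.forward()
d_error.backward()  # stops at fake_d; no redundant backward through the generator
```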
I was just talking about the variable name `r1_zc_gp`, which seems to indicate an R1 zero-centered gradient penalty, but it is actually one-centered, right? Since the norm is targeted to one in the mean squared error.
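Just to spell out the naming point with illustrative code (keeping the `axis=1` reduction discussed above for comparability; `grads` stands for the gradient of the discriminator output w.r.t. its input, and both helpers are hypothetical):

```python
import nnabla.functions as F

def r1_zero_centered_penalty(grads):
    # R1 (zero-centered): E[ ||grad||^2 ] -- the norm is pushed toward 0.
    return F.mean(F.sum(F.pow_scalar(grads, 2.0), axis=1))

def one_centered_penalty(grads):
    # WGAN-GP style (one-centered): E[ (||grad|| - 1)^2 ] -- pushed toward 1.
    norm = F.pow_scalar(F.sum(F.pow_scalar(grads, 2.0), axis=1), 0.5)
    return F.mean(F.pow_scalar(norm - 1.0, 2.0))
```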
`axis=1` only.