takuseno / singan-nnabla

SinGAN implementation with NNabla

How to train? #3

KazukiYoshiyama-sony opened this issue 4 years ago

KazukiYoshiyama-sony commented 4 years ago
takuseno commented 4 years ago

Hi, @TE-KazukiYoshiyama san.

Thanks for looking at my code in detail! Let me answer each question here.

redundant backward

I believe this backward pass is necessary to update the discriminator. I'd like to know what makes you ask this question :)

freezing lower scale generators

This is already implemented. Each iteration of train_single_scale only updates the networks at the current scale.
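In case it helps, the usual NNabla way to freeze everything except the current scale is to register only that scale's parameters with the solvers. A minimal sketch, assuming each scale's networks live under parameter scopes such as "generator/scale{i}" (the scope names are illustrative, not necessarily the ones used in this repository):

```python
import nnabla as nn
import nnabla.solvers as S

def setup_solvers_for_scale(scale, lr=5e-4):
    g_solver = S.Adam(lr, beta1=0.5)
    d_solver = S.Adam(lr, beta1=0.5)
    # Only the parameters under the current scale's scopes are registered,
    # so solver.update() never touches the lower-scale (frozen) generators.
    with nn.parameter_scope("generator/scale{}".format(scale)):
        g_solver.set_parameters(nn.get_parameters())
    with nn.parameter_scope("discriminator/scale{}".format(scale)):
        d_solver.set_parameters(nn.get_parameters())
    return g_solver, d_solver
```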

one-centered gradient penalty.

I took this from the original implementation. I tried axis=[1, 2, 3] instead of axis=1, but the generated images were corrupted. Eventually, I found that the model generates high-quality images with axis=1. I guess that centering the gradient of each patch (rather than of the whole image) stabilizes training.
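As a reference for others, here is a rough sketch (not this repository's exact code) of a one-centered gradient penalty where the gradient norm is taken over axis=1 only, i.e. per spatial position rather than over the whole image:

```python
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

def one_centered_gradient_penalty(d_out, x, lam=0.1):
    # gradient of the (summed) discriminator output w.r.t. its input x
    grads = nn.grad([F.sum(d_out)], [x])[0]
    # gradient norm over the channel axis only (axis=1): one value per
    # spatial position ("patch"), instead of one value per whole image
    norm = F.pow_scalar(F.sum(grads ** 2, axis=1) + 1e-12, 0.5)
    # one-centered penalty: push every per-position norm toward 1
    return lam * F.mean((norm - 1.0) ** 2)

# Toy usage with an illustrative patch discriminator (not the repository's).
def discriminator(x):
    with nn.parameter_scope("dis"):
        h = F.leaky_relu(PF.convolution(x, 32, (3, 3), pad=(1, 1), name="conv1"))
        return PF.convolution(h, 1, (3, 3), pad=(1, 1), name="conv2")

# x would typically be a random interpolation of real and fake images, as in WGAN-GP.
x = nn.Variable((1, 3, 32, 32), need_grad=True)
penalty = one_centered_gradient_penalty(discriminator(x), x)
x.d = np.random.randn(*x.shape)
penalty.forward()
```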

KazukiYoshiyama-sony commented 4 years ago

Redundant backward

What I am concerned about is when you do something like this:

d_error.backward()  # backward pass from the discriminator loss

The backward pass is performed even through the generator, unless you unlink the generator output or set <generator output>.need_grad = False.

It looks like the code computes the gradients w.r.t. the generator's trainable parameters even though those parameters are not updated, which is redundant computation. Please correct me if my understanding is incorrect, and point me to the lines of code that address this issue.
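To make the concern concrete, here is a self-contained sketch with toy networks (illustrative only, not this repository's code) of the pattern I have in mind: detaching the generator output with get_unlinked_variable(need_grad=False) so that d_error.backward() only computes gradients inside the discriminator.

```python
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

def generator(z):
    with nn.parameter_scope("gen"):
        h = F.relu(PF.convolution(z, 32, (3, 3), pad=(1, 1), name="conv1"))
        return F.tanh(PF.convolution(h, 3, (3, 3), pad=(1, 1), name="conv2"))

def discriminator(x):
    with nn.parameter_scope("dis"):
        h = F.leaky_relu(PF.convolution(x, 32, (3, 3), pad=(1, 1), name="conv1"))
        return PF.convolution(h, 1, (3, 3), pad=(1, 1), name="conv2")

z = nn.Variable((1, 3, 32, 32))
x_real = nn.Variable((1, 3, 32, 32))

x_fake = generator(z)
# Detach here so the backward pass below never enters the generator graph.
x_fake_detached = x_fake.get_unlinked_variable(need_grad=False)

# WGAN-style critic loss (SinGAN uses a WGAN-GP objective;
# the gradient penalty term is omitted here for brevity).
d_error = F.mean(discriminator(x_fake_detached)) - F.mean(discriminator(x_real))

d_solver = S.Adam(5e-4, beta1=0.5)
with nn.parameter_scope("dis"):
    d_solver.set_parameters(nn.get_parameters())

z.d = np.random.randn(*z.shape)
x_real.d = np.random.randn(*x_real.shape)
d_error.forward()
d_solver.zero_grad()
d_error.backward()  # gradients are computed only for the discriminator parameters
d_solver.update()
```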

one-centered gradient penalty.

I was just talking about the variable name r1_zc_gp, which seems to indicate an R1 zero-centered gradient penalty, but it is actually one-centered, right, since the norm is pushed toward one in the mean squared error?
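For reference, the two formulations I have in mind (standard forms from the literature, not copied from this repository):

```latex
% One-centered gradient penalty (WGAN-GP style), which an MSE toward a norm of 1 implements:
\mathcal{L}_{\mathrm{GP}} = \lambda \,\mathbb{E}\!\left[\left(\lVert \nabla_{x} D(x) \rVert_{2} - 1\right)^{2}\right]
% Zero-centered R1 penalty, which the name r1_zc_gp would suggest:
\mathcal{L}_{\mathrm{R1}} = \tfrac{\gamma}{2}\,\mathbb{E}\!\left[\lVert \nabla_{x} D(x) \rVert_{2}^{2}\right]
```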