KazukiYoshiyama-sony opened this issue 4 years ago
Hi, @TE-KazukiYoshiyama san.
Thanks for reviewing my code in detail! Let me answer each question here.
I believe this backward pass is necessary to update the discriminator. I'd like to know what makes you ask this question :)
I've already implemented this. Each loop of `train_single_scale` only updates the networks at the current scale, roughly as in the sketch below.
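To make that concrete, here is a rough NNabla-style sketch of the idea (the scope names and the helper function are illustrative, not the exact code in this repo): only the parameters under the current scale's parameter scope are registered to the solvers, so `solver.update()` can never touch the other scales.

```python
# Minimal sketch, not the actual repository code. Parameters are assumed to be
# created under hypothetical per-scale scopes such as "generator/scale-0",
# "discriminator/scale-0", etc.
import nnabla as nn
import nnabla.solvers as S

def setup_solvers_for_scale(scale, lr=5e-4):
    with nn.parameter_scope("generator/scale-{}".format(scale)):
        g_params = nn.get_parameters()
    with nn.parameter_scope("discriminator/scale-{}".format(scale)):
        d_params = nn.get_parameters()
    g_solver = S.Adam(alpha=lr)
    d_solver = S.Adam(alpha=lr)
    g_solver.set_parameters(g_params)  # only this scale's G parameters
    d_solver.set_parameters(d_params)  # only this scale's D parameters
    return g_solver, d_solver
```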
I've taken this from the original implementation. I tried `axis=[1, 2, 3]` instead of `axis=1`, but the generated images were corrupted. Eventually, I found that the model generates high-quality images with `axis=1`. I guess that centering the gradient of each patch (rather than the whole image) stabilizes training.
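To illustrate the difference (a sketch only, not the exact code in this repo; `nnabla.grad` is assumed for the double backpropagation and the helper name is hypothetical), the choice only changes the axes over which the L2 norm of the gradient is reduced in the one-centered penalty:

```python
import nnabla as nn
import nnabla.functions as F

def gradient_penalty(d_out, x_hat, axis=1):
    # Gradient of the discriminator output w.r.t. its (interpolated) input.
    grads = nn.grad([d_out], [x_hat])[0]  # shape: (N, C, H, W)
    # axis=1         -> one L2 norm per spatial location (per patch)
    # axis=[1, 2, 3] -> one L2 norm per image
    norm = F.pow_scalar(
        F.sum(F.pow_scalar(grads, 2.0), axis=axis, keepdims=True), 0.5)
    # One-centered penalty: push the norm toward 1.
    return F.mean(F.pow_scalar(norm - 1.0, 2.0))
```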
What I am concerned about is that when you do something like this:

`d_error.backward()  # backward at the discriminator loss`

the backward pass is performed even through the generator, unless you unlink the generator output or set `<generator output>.need_grad = False`. It looks like the gradients w.r.t. the generator's trainable parameters are computed even though you do not update them, which is redundant computation. Please correct me if my understanding is incorrect, and point me to the lines of code that address this issue.
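To make the concern concrete, here is a toy sketch (the networks and the loss are made up for illustration, not taken from your code); without the unlinking line, `d_error.backward()` would also fill the generator parameters' gradients:

```python
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

def toy_generator(z):
    with nn.parameter_scope("gen"):
        return F.tanh(PF.convolution(z, 3, (3, 3), pad=(1, 1)))

def toy_discriminator(x):
    with nn.parameter_scope("dis"):
        h = F.leaky_relu(PF.convolution(x, 8, (3, 3), pad=(1, 1)), 0.2)
    return F.mean(h)

z = nn.Variable.from_numpy_array(
    np.random.randn(1, 3, 16, 16).astype(np.float32))
x_real = nn.Variable.from_numpy_array(
    np.random.randn(1, 3, 16, 16).astype(np.float32))

fake = toy_generator(z)
# Without the next line, d_error.backward() also accumulates gradients for the
# generator parameters, even though only the discriminator is updated here.
fake_d = fake.get_unlinked_variable(need_grad=False)  # cut the graph at G's output

d_error = toy_discriminator(fake_d) - toy_discriminator(x_real)  # placeholder critic loss
d_error.forward()
d_error.backward()  # stops at fake_d; no redundant backward through the generator
```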
I was just talking about the variable name `r1_zc_gp`, which seems to indicate an R1 zero-centered gradient penalty, but it is actually one-centered, right? Since the norm is targeted to one in the mean squared error.
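Just to spell out the naming point with illustrative code (keeping the `axis=1` reduction discussed above for comparability; `grads` stands for the gradient of the discriminator output w.r.t. its input, and both helpers are hypothetical):

```python
import nnabla.functions as F

def r1_zero_centered_penalty(grads):
    # R1 (zero-centered): E[ ||grad||^2 ] -- the norm is pushed toward 0.
    return F.mean(F.sum(F.pow_scalar(grads, 2.0), axis=1))

def one_centered_penalty(grads):
    # WGAN-GP style (one-centered): E[ (||grad|| - 1)^2 ] -- pushed toward 1.
    norm = F.pow_scalar(F.sum(F.pow_scalar(grads, 2.0), axis=1), 0.5)
    return F.mean(F.pow_scalar(norm - 1.0, 2.0))
```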
`axis=1` only.