Closed Haoyanlong closed 5 years ago
As for the oscillation in the loss, this is expected behavior with GAN models, but your generator loss is way too high! A few notes to consider:
- `MASK:5` is a combined mask mode which we used only for experiments. I suggest using only the irregular mask (`MASK:3`) to get better results and faster convergence.
- Try `INPUT_SIZE:128` in the configuration and see if the model is converging. You can reuse the weights when moving to larger input images.
- The `D2G_LR` flag determines the ratio of the discriminator's learning rate with respect to that of the generator. For example, if your base LR is 0.0001, then your discriminator's LR becomes 0.00001.
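To make the ratio concrete, here is a minimal plain-Python sketch of the arithmetic (the variable names mirror the config keys; this is an illustration, not code from the repository):

```python
LR = 0.0001       # base learning rate (config key LR)
D2G_LR = 0.1      # discriminator/generator learning-rate ratio (config key D2G_LR)

gen_lr = LR           # the generator is trained at the base rate
dis_lr = LR * D2G_LR  # the discriminator is trained at LR * D2G_LR

print(f"generator LR:     {gen_lr:g}")  # 0.0001
print(f"discriminator LR: {dis_lr:g}")  # 1e-05
```

With the default `D2G_LR: 0.1`, the discriminator always learns ten times slower than the generator, which is one way of keeping the two players roughly in balance.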
@knazeri OK, I see! Thank you very much!
@knazeri I have loaded the model pretrained on Places2. I kept the parameters unchanged in config.yml and set MASK: 3. The loss visualization is as follows:
The gen_loss has been oscillating and slowly increasing. Could you help me? Thank you very much!
@Haoyanlong Hello, could you tell me how to do the visualization? Thanks a lot!
@cmyyy I use tensorboardX for visualization; you can install it and learn more at https://github.com/lanpa/tensorboardX.
@Haoyanlong Your generator loss is still diverging, which could be because the learning rate is too large. During training, we scaled down the learning rate. The final learning rate we trained the model with was 1e-6; any value larger than that can make the trained model diverge!
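The comment above suggests the learning rate was lowered by hand during training; one simple way to reproduce that is a step-decay schedule with a 1e-6 floor. This is a plain-Python sketch, and the decay interval is an assumption for illustration, not a value from the repository:

```python
def decayed_lr(base_lr, step, decay_every, factor=0.1, min_lr=1e-6):
    """Multiply the learning rate by `factor` every `decay_every` iterations,
    clamped at `min_lr` (the final LR mentioned above)."""
    lr = base_lr * factor ** (step // decay_every)
    return max(lr, min_lr)

# Starting from the default LR of 1e-4, two decays reach the 1e-6 floor,
# and the schedule never goes below it.
for step in (0, 500_000, 1_000_000, 1_500_000):
    print(step, f"{decayed_lr(1e-4, step, 500_000):g}")
```

In PyTorch the new rate would then be written into each entry of the optimizer's `param_groups` before the next iteration.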
Also, please note that there was an error in the default config.yml file regarding the style loss value, which was fixed here: https://github.com/knazeri/edge-connect/issues/36
I'm reopening the issue; I was not being notified of the comments.
@knazeri Thanks for sharing your code with us. Excellent work! However, I ran into some trouble when training the edge model, like @Haoyanlong. When training the edge model, I found the generator loss slowly increasing; is this result right? I trained the model with learning rates of 0.000001 and 0.0001.
I only changed `INPUT_SIZE: 128` and `MASK: 3`. The dataset I use is CelebA. My config.yml:

```yaml
MODE: 1                      # 1: train, 2: test, 3: eval
MODEL: 1                     # 1: edge model, 2: inpaint model, 3: edge-inpaint model, 4: joint model
MASK: 3                      # 1: random block, 2: half, 3: external, 4: (external, random block), 5: (external, random block, half)
EDGE: 1                      # 1: canny, 2: external
NMS: 1                       # 0: no non-max-suppression, 1: applies non-max-suppression on the external edges by multiplying by Canny
SEED: 10                     # random seed
GPU: [0]                     # list of gpu ids
DEBUG: 0                     # turns on debugging mode
VERBOSE: 0                   # turns on verbose mode in the output console

TRAIN_FLIST: ./datasets/celeba_train.flist
VAL_FLIST: ./datasets/celeba_val.flist
TEST_FLIST: ./datasets/celeba_test.flist

TRAIN_EDGE_FLIST:
VAL_EDGE_FLIST:
TEST_EDGE_FLIST:

TRAIN_MASK_FLIST: ./datasets/masks_train.flist
VAL_MASK_FLIST: ./datasets/masks_val.flist
TEST_MASK_FLIST: ./datasets/masks_test.flist

LR: 0.0001                   # learning rate
D2G_LR: 0.1                  # discriminator/generator learning rate ratio
BETA1: 0.0                   # adam optimizer beta1
BETA2: 0.9                   # adam optimizer beta2
BATCH_SIZE: 8                # input batch size for training
INPUT_SIZE: 128              # input image size for training, 0 for original size
SIGMA: 2                     # standard deviation of the Gaussian filter used in Canny edge detector (0: random, -1: no edge)
MAX_ITERS: 2e6               # maximum number of iterations to train the model

EDGE_THRESHOLD: 0.5          # edge detection threshold
L1_LOSS_WEIGHT: 1            # l1 loss weight
FM_LOSS_WEIGHT: 10           # feature-matching loss weight
STYLE_LOSS_WEIGHT: 250       # style loss weight
CONTENT_LOSS_WEIGHT: 0.1     # perceptual loss weight
INPAINT_ADV_LOSS_WEIGHT: 0.1 # adversarial loss weight

GAN_LOSS: nsgan              # nsgan | lsgan | hinge
GAN_POOL_SIZE: 0             # fake images pool size

SAVE_INTERVAL: 1000          # how many iterations to wait before saving model (0: never)
SAMPLE_INTERVAL: 1000        # how many iterations to wait before sampling (0: never)
SAMPLE_SIZE: 12              # number of images to sample
EVAL_INTERVAL: 0             # how many iterations to wait before model evaluation (0: never)
LOG_INTERVAL: 10             # how many iterations to wait before logging training status (0: never)
```
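One side note on such a config file: under PyYAML's YAML 1.1 rules, a bare `2e6` (no decimal point, no signed exponent) parses as a *string*, not a float, so numeric use needs an explicit cast. A small illustration, assuming PyYAML is installed (`pip install pyyaml`); this is not code from the repository:

```python
import yaml  # PyYAML

# A small excerpt of the config above, parsed the way PyYAML sees it.
snippet = """
LR: 0.0001
D2G_LR: 0.1
MAX_ITERS: 2e6
"""

cfg = yaml.safe_load(snippet)
print(type(cfg["LR"]).__name__)         # float
print(type(cfg["MAX_ITERS"]).__name__)  # str (no decimal point / signed exponent)

# Numeric use therefore needs an explicit conversion:
max_iters = int(float(cfg["MAX_ITERS"]))
print(max_iters)  # 2000000
```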
@LuckyHeart Thank you for your interest and attention to detail. This is an expected behavior of adversarial loss. Normally, when training a neural network, we expect the loss to monotonically decrease. Of course, that is true when we have a fixed, well-defined loss term. In the case of an adversarial loss, the loss itself is a neural network, and the optimization is performed as a zero-sum game between the generator and the discriminator. In an ideal world, we would prefer this loss to remain constant, meaning that the generator and the discriminator are learning at the same pace.
However, in practice, these networks are high-dimensional, non-convex, non-cooperative functions, and the balance between the two players cannot be guaranteed. That being said, a very mild increase in generator loss near the end of training is acceptable. It means the generator is still learning, but not as fast as the discriminator: the increase in the loss essentially means that either the discriminator is learning faster, or the generator has reached its limit.
This is part of the reason training GANs is difficult. Also, keep in mind that monitoring the loss alone does not always work for GAN models; you should always look at the samples to measure the qualitative performance of the model.
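When you do track the scalar loss of an oscillating GAN, smoothing it helps separate the trend from the noise. Below is a minimal exponential moving average, the same idea as TensorBoard's smoothing slider (the sample values are made up for illustration):

```python
def ema(values, alpha=0.9):
    """Exponential moving average: each point is a weighted blend of the
    previous average (weight alpha) and the new value (weight 1 - alpha)."""
    smoothed = []
    avg = values[0]
    for v in values:
        avg = alpha * avg + (1 - alpha) * v
        smoothed.append(avg)
    return smoothed

# An oscillating series whose underlying trend is roughly flat:
noisy = [2.0, 2.4, 1.9, 2.5, 1.8, 2.6, 1.9]
print([round(v, 2) for v in ema(noisy)])
```

On the smoothed curve, a slow drift upward (as in the plots discussed here) is much easier to distinguish from ordinary GAN oscillation.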
@knazeri Wow! Thanks for your reply. I learned a lot!
Hello, I ran into some trouble when training the edge model with the default parameters (LR: 0.0001, D2G_LR: 0.1), and I don't understand the difference between LR and D2G_LR. The gen_loss has been oscillating.
For the edge image, the top is the output of the edge model and the bottom is the ground truth. I think D2G_LR is too big; could you help me?