knazeri / edge-connect

EdgeConnect: Structure Guided Image Inpainting using Edge Prediction, ICCV 2019 https://arxiv.org/abs/1901.00212
http://openaccess.thecvf.com/content_ICCVW_2019/html/AIM/Nazeri_EdgeConnect_Structure_Guided_Image_Inpainting_using_Edge_Prediction_ICCVW_2019_paper.html

The Problem of training Edge model #33

Closed Haoyanlong closed 5 years ago

Haoyanlong commented 5 years ago

Hello, I ran into some trouble when training the edge model with the default parameters (LR: 0.0001, D2G_LR: 0.1), and I don't understand the difference between LR and D2G_LR. [training loss curves] The gen_loss has been oscillating.

[edge model outputs vs. ground truth] In the edge images above, the top row is the output of the edge model and the bottom row is the ground truth. I think D2G_LR is too large; could you help me?

knazeri commented 5 years ago

As for the oscillation in the loss, this is expected behavior with GAN models, but your generator loss is way too high! A few notes to consider:

The D2G_LR flag sets the ratio of the discriminator's learning rate to the generator's. For example, if your base LR is 0.0001, then your discriminator's LR becomes 0.00001.
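For reference, here is a minimal PyTorch sketch of how such a ratio is typically wired into the optimizers (the `generator` and `discriminator` modules below are stand-ins, not the actual edge-connect networks):

```python
import torch.nn as nn
import torch.optim as optim

LR = 0.0001    # base learning rate (used by the generator)
D2G_LR = 0.1   # discriminator-to-generator learning rate ratio

# Stand-in modules; the real models are full convolutional networks.
generator = nn.Conv2d(3, 3, 3, padding=1)
discriminator = nn.Conv2d(3, 1, 3, padding=1)

gen_optimizer = optim.Adam(generator.parameters(), lr=LR, betas=(0.0, 0.9))
# The discriminator trains at LR * D2G_LR = 0.0001 * 0.1 = 0.00001:
dis_optimizer = optim.Adam(discriminator.parameters(), lr=LR * D2G_LR, betas=(0.0, 0.9))
```

Training the discriminator at a lower learning rate is a common way to keep it from overpowering the generator early on.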

Haoyanlong commented 5 years ago

@knazeri OK, I see! Thank you very much!

Haoyanlong commented 5 years ago

@knazeri I have loaded the model pretrained on Places2. I kept the parameters in config.yml unchanged, except that I set MASK: 3. The loss visualization is as follows:

[training loss curves] The gen_loss has been oscillating and slowly increasing. Could you help me? Thank you very much!

cmyyy commented 5 years ago

@Haoyanlong Hello, could you tell me how you did the visualization? Thanks a lot!

Haoyanlong commented 5 years ago

@cmyyy I use tensorboardX for the visualization. You can install it and learn more at https://github.com/lanpa/tensorboardX.
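Logging the losses takes only a few lines (a self-contained sketch; the dummy loss values below stand in for whatever your training loop computes):

```python
from tensorboardX import SummaryWriter

writer = SummaryWriter('./runs/edge_model')

# In a real run these values would come from the training loop; dummy
# numbers are used here so the snippet runs on its own.
for iteration in range(100):
    gen_loss, dis_loss = 1.0 / (iteration + 1), 0.5
    writer.add_scalar('loss/gen_loss', gen_loss, iteration)
    writer.add_scalar('loss/dis_loss', dis_loss, iteration)

writer.close()
```

Then run `tensorboard --logdir ./runs` and open the printed URL to view the curves.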

knazeri commented 5 years ago

@Haoyanlong Your generator loss is still diverging, which could be because the learning rate is too large. During training we scaled the learning rate down; the final learning rate we trained the model with was 1e-6, and any value larger than that can make the trained model diverge! Also, please note that there was an error in the default config.yml regarding the style loss weight, which was fixed here: https://github.com/knazeri/edge-connect/issues/36
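Dropping the learning rate mid-training amounts to overwriting the optimizer's parameter groups. A minimal sketch (the `set_lr` helper and the decay milestones are illustrative, not the repository's actual schedule; the thread only states that the final LR was 1e-6):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the edge generator
optimizer = optim.Adam(model.parameters(), lr=1e-4, betas=(0.0, 0.9))

def set_lr(optimizer, lr):
    """Overwrite the learning rate of every parameter group."""
    for group in optimizer.param_groups:
        group['lr'] = lr

# Drop the LR by 10x whenever the losses plateau, ending at 1e-6:
set_lr(optimizer, 1e-5)
# ... continue training ...
set_lr(optimizer, 1e-6)
# ... fine-tune; per the comment above, resuming above 1e-6 risks divergence
```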

I'm reopening the issue; I was not being notified of the comments.

LuckyHeart commented 5 years ago

@knazeri Thanks for sharing your code with us. Excellent work! However, like @Haoyanlong, I ran into some trouble when training the edge model: the generator loss slowly increases. Is that result right? I trained the model with learning rates of 0.000001 and 0.0001. [loss curves for both learning rates]

I only changed INPUT_SIZE: 128 and MASK: 3. The dataset I use is CelebA.

# config.yml
MODE: 1                       # 1: train, 2: test, 3: eval
MODEL: 1                      # 1: edge model, 2: inpaint model, 3: edge-inpaint model, 4: joint model
MASK: 3                       # 1: random block, 2: half, 3: external, 4: (external, random block), 5: (external, random block, half)
EDGE: 1                       # 1: canny, 2: external
NMS: 1                        # 0: no non-max-suppression, 1: applies non-max-suppression on the external edges by multiplying by Canny
SEED: 10                      # random seed
GPU: [0]                      # list of gpu ids
DEBUG: 0                      # turns on debugging mode
VERBOSE: 0                    # turns on verbose mode in the output console

TRAIN_FLIST: ./datasets/celeba_train.flist
VAL_FLIST: ./datasets/celeba_val.flist
TEST_FLIST: ./datasets/celeba_test.flist

TRAIN_EDGE_FLIST:
VAL_EDGE_FLIST:
TEST_EDGE_FLIST:

TRAIN_MASK_FLIST: ./datasets/masks_train.flist
VAL_MASK_FLIST: ./datasets/masks_val.flist
TEST_MASK_FLIST: ./datasets/masks_test.flist

LR: 0.0001                    # learning rate
D2G_LR: 0.1                   # discriminator/generator learning rate ratio
BETA1: 0.0                    # adam optimizer beta1
BETA2: 0.9                    # adam optimizer beta2
BATCH_SIZE: 8                 # input batch size for training
INPUT_SIZE: 128               # input image size for training, 0 for original size
SIGMA: 2                      # standard deviation of the Gaussian filter used in Canny edge detector (0: random, -1: no edge)
MAX_ITERS: 2e6                # maximum number of iterations to train the model

EDGE_THRESHOLD: 0.5           # edge detection threshold
L1_LOSS_WEIGHT: 1             # l1 loss weight
FM_LOSS_WEIGHT: 10            # feature-matching loss weight
STYLE_LOSS_WEIGHT: 250        # style loss weight
CONTENT_LOSS_WEIGHT: 0.1      # perceptual loss weight
INPAINT_ADV_LOSS_WEIGHT: 0.1  # adversarial loss weight

GAN_LOSS: nsgan               # nsgan | lsgan | hinge
GAN_POOL_SIZE: 0              # fake images pool size

SAVE_INTERVAL: 1000           # how many iterations to wait before saving model (0: never)
SAMPLE_INTERVAL: 1000         # how many iterations to wait before sampling (0: never)
SAMPLE_SIZE: 12               # number of images to sample
EVAL_INTERVAL: 0              # how many iterations to wait before model evaluation (0: never)
LOG_INTERVAL: 10              # how many iterations to wait before logging training status (0: never)

knazeri commented 5 years ago

@LuckyHeart Thank you for your interest and attention to detail. This is expected behavior of an adversarial loss. Normally, when training a neural network, we expect the loss to decrease monotonically. Of course, that is true when we have a fixed, well-defined loss term. In the case of an adversarial loss, the loss itself is a neural network, and the optimization is performed as a zero-sum game between the generator and the discriminator. In an ideal world, we would prefer this loss to remain constant, meaning that the generator and the discriminator are learning at the same pace.

However, in practice these networks are high-dimensional, non-convex, non-cooperative functions, and the balance between the two players cannot be guaranteed. That being said, a very mild increase in the generator loss near the end of training is acceptable: the generator is still learning, just not as fast as the discriminator. The increase in the loss essentially means that either the discriminator is learning faster, or the generator has reached its limit.

This is part of the reason training GANs is difficult. Also, keep in mind that monitoring the loss alone does not always work for GAN models; you should always look at the samples to gauge the qualitative performance of the model.
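To make the zero-sum dynamic concrete, here is a minimal sketch of the non-saturating adversarial objectives ("nsgan" in the config above), using stand-in discriminator probabilities rather than the actual networks:

```python
import torch
import torch.nn.functional as F

def nsgan_losses(d_real, d_fake):
    """d_real / d_fake: discriminator probabilities for real and generated batches."""
    # Discriminator: push D(real) -> 1 and D(fake) -> 0.
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    # Generator: push D(fake) -> 1, the exact opposite of the discriminator's
    # goal, so g_loss rises whenever the discriminator pulls ahead.
    g_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    return d_loss, g_loss

# A confident discriminator (d_fake near 0) makes the generator loss large:
d_real = torch.full((8, 1), 0.9)
d_fake = torch.full((8, 1), 0.2)
print(nsgan_losses(d_real, d_fake))
```

Because the two objectives pull in opposite directions, a rising g_loss by itself does not prove the model is broken; the samples are the real measure of quality.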

LuckyHeart commented 5 years ago

@knazeri Wow! Thanks for your reply. I learned a lot!