nothinglo / Deep-Photo-Enhancer

TensorFlow implementation of the CVPR 2018 spotlight paper, Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs
MIT License

Training detail and Hyperparameters for HDR unpair learning #87

Closed KewJieLong closed 4 years ago

KewJieLong commented 5 years ago

Could you share your hyperparameters for HDR unpaired learning? I wrote the training code in TensorFlow myself and have been training the model for 1 day (300k steps), but the visualized results are still very far from what you show in the paper.

```
--generator_learning_rate 0.0001 \
--discriminator_learning_rate 0.0001 \
--batch_size 2 \
--netG_regularization_weight 0 \
--netD_regularization_weight 0 \
--input_size 256 \
--loss_source_data_term_weight 1e3 \
--loss_constant_term_weight 1e4 \
--gp_weight_A 10 \
--gp_weight_B 10 \
--global_gradient_clipping 1e8 \
--update_netD_times 50 \
--loss_wgan_gp_mv_decay 0.99 \
--loss_wgan_gp_bound 5e-2 \
--netD_buffer_times 50 \
--loss_wgan_lambda_grow 2
```

I also modified your architecture by using global average pooling instead of a big convolutional receptive field to reduce the feature map to 1 x 1. This allows me to take 256 x 256 images as input. I used a Titan Xp (11 GB memory) to train the model. It is basically impossible for me to train D_A, D_B, G_A and G_B at the same time with a 512 x 512 input size, so I used a 256 x 256 input size and trained G_A and G_B separately.
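Roughly, this is the change I made (my own sketch, not code from this repo; the function name is just for illustration):

```python
import tensorflow as tf

def global_feature(feat):
    # feat: [batch, H, W, C] feature map from the encoder.
    # Global average pooling collapses the spatial dimensions to 1 x 1
    # regardless of the input resolution, so 256 x 256 inputs work without
    # the large fixed-size receptive-field convolution.
    return tf.reduce_mean(feat, axis=[1, 2], keepdims=True)  # -> [batch, 1, 1, C]
```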

In the paper, you use an adaptive weighting scheme to adjust the weight of the gradient penalty. However, in your code, you only increase the weight when the moving average is larger than the upper bound, and never decrease it when the moving average is smaller than the lower bound. Should I be worried about this?
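This is my reading of the scheme (a plain-Python sketch, not your code; the lower-bound value is a placeholder I made up, which is exactly the part I am unsure about):

```python
# decay / bound / grow follow my flags above:
# --loss_wgan_gp_mv_decay 0.99, --loss_wgan_gp_bound 5e-2, --loss_wgan_lambda_grow 2
decay, upper_bound, grow = 0.99, 5e-2, 2.0
lower_bound = upper_bound / 2.0          # placeholder, not from the paper/code

def update_gp_weight(gp_value, gp_ma, lam):
    # Exponential moving average of the gradient-penalty term.
    gp_ma = decay * gp_ma + (1.0 - decay) * gp_value
    if gp_ma > upper_bound:              # constraint violated -> grow the weight
        lam *= grow
    elif gp_ma < lower_bound:            # the decrease branch that seems missing
        lam /= grow                      # from the released code (my question)
    return gp_ma, lam
```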

I am not familiar with WGAN-GP, so I am not sure whether I am doing it correctly.

** In my code for optimizing WGAN-GP, the real image is positive and the fake image is negative.
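Concretely, this is the sign convention I mean (a TF1-style sketch of my own critic loss, with illustrative names, not code from this repo):

```python
import tensorflow as tf

def critic_loss(critic, real, fake, gp_weight):
    # Gradient penalty on random interpolations between real and fake images.
    eps = tf.random_uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
    interp = eps * real + (1.0 - eps) * fake
    grads = tf.gradients(critic(interp), [interp])[0]
    slopes = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    gp = tf.reduce_mean(tf.square(slopes - 1.0))
    # Real images get positive critic scores, fakes negative:
    # the critic minimizes E[D(fake)] - E[D(real)] plus the penalty.
    d_loss = tf.reduce_mean(critic(fake)) - tf.reduce_mean(critic(real)) + gp_weight * gp
    return d_loss, gp
```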

Here are the graphs at roughly 300k steps for gp_A and gp_B:

Screenshot from 2019-09-12 17-52-57

For NetD_A and NetD_B, the losses are still going up. Does that indicate the discriminators are not learning correctly?

Screenshot from 2019-09-12 17-53-40

For NetG_A2B_adv_loss and NetG_B2A_adv_loss, I am not sure I am reading this graph correctly. Is NetG_A2B doing badly because its critic value is small, and NetG_B2A doing well because its critic value is big?

Screenshot from 2019-09-12 17-54-06

For the data source constraint loss and the data term constraint loss, I think they are doing pretty well; I have no problem with those.

Screenshot from 2019-09-12 17-54-24
Screenshot from 2019-09-12 17-54-33

Would you mind sharing your experience with me?

Thanks!

KewJieLong commented 5 years ago

Here is a sample output from NetG_A2B at step 370k:

370000_A2B_3

Also, people say that batch normalization should be avoided in the critic. Have you tried any experiments without batch normalization in the discriminator?
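For example, something like this block is what I have in mind, using layer normalization in the critic instead of batch normalization as the WGAN-GP paper suggests (my own sketch and layer names, not this repo's architecture):

```python
import tensorflow as tf

def critic_block(x, filters, scope):
    # One downsampling block of the critic: strided conv + layer norm + LeakyReLU.
    with tf.variable_scope(scope):
        x = tf.layers.conv2d(x, filters, 4, strides=2, padding='same')
        x = tf.contrib.layers.layer_norm(x)   # no batch norm in the critic
        x = tf.nn.leaky_relu(x, alpha=0.2)
    return x
```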