sanghyun-son / EDSR-PyTorch

PyTorch version of the paper 'Enhanced Deep Residual Networks for Single Image Super-Resolution' (CVPRW 2017)

How to train EDSR_baseline_x4 with WGAN-GP? #27

Closed · gentleboy closed this 6 years ago

gentleboy commented 6 years ago

Hi, I'm trying to train EDSR_baseline_x4 with WGAN-GP, but I don't know how to do it. I want to ask the following questions:

  1. In the discriminator, should batch normalization be removed? (I see that batch normalization has not been removed in your code.)

  2. How should I set the Adam hyperparameters (beta1, beta2, learning rate) for optimizing the discriminator and generator?

  3. How should I set the k value for the adversarial loss? (I see that the default value of gan_k is 1 in your code.)

  4. How should I set the weights of the VGG54 and generator losses?

Can you give me some advice?

Thank you!

sanghyun-son commented 6 years ago

Hello.

I think the answers below can help you.

  1. I think batch normalization 'can' be removed, but it is not mandatory.

  2. These lines control the hyperparameters of the generator (SR network) optimizer. You can change them with input arguments (e.g. python main.py --lr 5e-5). If you are using the WGAN-GP configuration, you can modify the discriminator optimizer's hyperparameters by editing these lines; they are hard-coded. When using the GAN or WGAN loss, the generator and discriminator share the same optimizer hyperparameters. (A sketch follows this list.)

  3. Use the --gan_k [n] argument to modify it.

  4. You can refer to this line to check how to set the loss function.
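
For item 2, a minimal sketch of what separate optimizers would look like, using the hyperparameters from the WGAN-GP paper (Gulrajani et al., 2017: lr = 1e-4, betas = (0.0, 0.9)). The two modules here are only stand-ins, not the actual networks in this repository:

```python
import torch.nn as nn
import torch.optim as optim

# Stand-ins for the SR network and the discriminator; substitute the real modules.
generator = nn.Conv2d(3, 3, 3, padding=1)
discriminator = nn.Conv2d(3, 1, 3, padding=1)

# WGAN-GP paper settings: lr = 1e-4, beta1 = 0.0, beta2 = 0.9,
# with a separate Adam optimizer for each network.
opt_G = optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
opt_D = optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.0, 0.9))
```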

Although I implemented the WGAN loss and checked that it works, there are several things to consider.

  1. In the original WGAN-GP paper, the authors set --gan_k to 5, but that takes a lot of time here because the output patch size is 96x96 or 192x192 in my default setting.

  2. If gan_k is larger than 1, you need multiple output batches to update the discriminator several times. This is not a big problem for traditional generative models (like DCGAN) because they generate images from multi-dimensional uniform or normal distributions, which can be sampled at any time. However, a super-resolution network has to take low-resolution patches as input, and those must be sampled from the dataset. For now, the adversarial loss class does not have access to the dataset, so I reuse a single batch to update the discriminator gan_k times (see the sketch after this list). I am not sure about this approach.

  3. The code runs, but it does not seem to converge.
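
To make point 2 concrete, here is a minimal sketch of that single-batch update with a WGAN-GP gradient penalty. The names D, opt_D, hr (real HR patches), and sr (generator output) are assumptions for illustration, not the identifiers used in this repository:

```python
import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    # WGAN-GP penalty: push the gradient norm of D at random
    # interpolations between real and fake samples toward 1.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_interp = D(interp)
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

def update_discriminator(D, opt_D, hr, sr, gan_k=1):
    # Reuse the same (hr, sr) batch gan_k times instead of drawing a
    # fresh batch for every discriminator step.
    for _ in range(gan_k):
        opt_D.zero_grad()
        loss_d = (
            D(sr.detach()).mean() - D(hr).mean()
            + gradient_penalty(D, hr, sr.detach())
        )
        loss_d.backward()
        opt_D.step()
```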

For these reasons, I recommend using a naive GAN, as SRGAN did. You can do this by running this line.

Thank you!

gentleboy commented 6 years ago

Thank you for the detailed and clear answers.

In addition, I would also like to ask whether MDSR_baseline can be trained with a naive GAN. If so, how should I do it?

Thank you!

sanghyun-son commented 6 years ago

You can train MDSR_baseline with adversarial loss.

However, there is one thing that must be changed in the code.

Currently, MDSR is designed to take 48x48 input patches for all scales and returns 96x96, 144x144, and 192x192 output patches.

Because the GAN discriminator should take 96x96 patches as input, it is better to use a different input patch size for each scale (e.g. 48x48 for scale 2, 32x32 for scale 3, and 24x24 for scale 4).

Another approach is to use global average pooling at the end of the discriminator to make the model scale-independent (see the sketch below).
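
A minimal sketch of that second idea; the depth and layer widths are illustrative, not this repository's discriminator:

```python
import torch
import torch.nn as nn

class ScaleFreeDiscriminator(nn.Module):
    """Toy discriminator whose final global average pooling makes the
    output independent of the input patch size (96x96, 144x144, ...)."""
    def __init__(self, n_feats=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, n_feats, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_feats, n_feats, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.classifier = nn.Conv2d(n_feats, 1, 1)

    def forward(self, x):
        score_map = self.classifier(self.features(x))
        # Global average pooling collapses any spatial size to one score.
        return score_map.mean(dim=(-2, -1))

# Every output patch size now yields a (batch, 1) score.
d = ScaleFreeDiscriminator()
for size in (96, 144, 192):
    print(d(torch.randn(2, 3, size, size)).shape)  # torch.Size([2, 1])
```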

If you are not in a hurry, I will test it and upload the script.

Thank you!

gentleboy commented 6 years ago

Thank you for your reply. I am looking forward to your test results.

In addition, for EDSR_baseline_x4, I'm curious whether WGAN-GP would give better performance than a naive GAN. I am not able to implement WGAN-GP correctly myself right now; could you take some time to test it?

Thank you very much!

sanghyun-son commented 6 years ago

Hello.

I tested MDSR-GAN and got satisfactory results.

You have to change some code for this experiment.

These two lines should be replaced with

tp = patch_size

if you want to train MDSR-GAN.
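
To illustrate the effect of that change (the loop below is only illustrative; ip is a hypothetical name for the input patch size): with the target patch size tp fixed, the low-resolution input patch shrinks with the scale, so the discriminator always sees the same output size.

```python
patch_size = 96
for scale in (2, 3, 4):
    tp = patch_size    # output (HR) patch: always 96x96
    ip = tp // scale   # input (LR) patch: 48, 32, 24
    print('scale %d: input %dx%d -> output %dx%d' % (scale, ip, ip, tp, tp))
```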

Also, I used the script below:

python main.py --template GAN --model MDSR --scale 2+3+4 --save MDSR_GAN --reset --patch_size 96 --loss 5*VGG54+0.15*GAN --pre_train ../experiment/model/MDSR_baseline.pt --ext bin --save_results --data_test Set14

Also, my WGAN-GP implementation itself is valid.

However, I do not think it is appropriate to apply the WGAN-GP formulation directly to super-resolution.

If I find a nice approach, I will let you know.

Thank you!

gentleboy commented 6 years ago

Can you send me a copy of the trained MDSR-GAN model? I want to see its super-resolution results.

My email address is: 972740042@qq.com

Thank you!

sanghyun-son commented 6 years ago

You can download it from here.

I think the x4 output is not that satisfying with my default hyperparameters.

It may get better if you use a smaller weight for the adversarial loss (--loss 5*VGG54+0.1*GAN seems appropriate) or change other parameters.

Thank you!

gentleboy commented 6 years ago

Thank you!

Jasas9754 commented 6 years ago

https://github.com/JustinhoCHN/SRGAN_Wasserstein

This repository appeared afterwards. It looks like a useful reference.

Jasas9754 commented 6 years ago

And what about WGAN-hinge? https://arxiv.org/abs/1803.01541

Is it a waste of time? I'm curious about the result.

sanghyun-son commented 6 years ago

I think every experiment is worth trying.

However, I do not have enough time to implement advanced WGANs, so it would be very nice if someone made a pull request.
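
For reference, a commonly used hinge objective (as in geometric GAN and SAGAN, not necessarily the exact formulation in that paper) looks roughly like the sketch below; whether it actually helps super-resolution would need the experiment:

```python
import torch.nn.functional as F

# Hinge losses for the critic and the generator; d_real and d_fake are
# the discriminator's raw outputs on real and generated patches.
def d_hinge_loss(d_real, d_fake):
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    return -d_fake.mean()
```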

Thank you for letting me know those ideas!