junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

WGAN/WGAN-GP in CycleGAN model #1103

Open Kaede93 opened 4 years ago

Kaede93 commented 4 years ago

Hello, I noticed that you added a WGAN-GP loss to CycleGAN.

I am wondering whether the generator will oscillate during training when using the WGAN or WGAN-GP loss instead of the LSGAN loss, since the WGAN loss can take negative values.

I replaced the LSGAN loss with the WGAN/WGAN-GP loss (all other parameters and model structures unchanged) for the horse2zebra translation task, and I found that the model could not be trained with the WGAN/WGAN-GP loss:

  1. The Wasserstein distance estimate was a very small number (about 1e-4) at the beginning of training. Is that normal? The value was large (about 1e-1~1e0) when I trained the original WGAN on a noise-to-image task.

  2. The discriminator/generator losses oscillated heavily, with no sign of the Wasserstein distance decreasing. I tried adjusting the learning rate, but it didn't help. Can you give me some advice?

  3. I used Keras, so for LSGAN I set the labels of real and generated images to 1 and 0, respectively, and for WGAN to -1 and 1, with the WGAN loss defined as K.mean(y_true * y_pred) (see the sketch after this list). Could this setting lead to bad results? I found that the discriminator accuracy was nearly 0% with the WGAN/WGAN-GP loss (30%~90%+ with the LSGAN loss).
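
For reference, the label trick in point 3 reduces to the standard WGAN objectives. A minimal PyTorch sketch of the equivalent losses (function names are mine, not from the repo):

```python
import torch

# Critic loss. With labels y = -1 for real and y = +1 for fake, the Keras
# expression K.mean(y_true * y_pred) over both batches works out to
# D(fake).mean() - D(real).mean(); its negation is the critic's estimate
# of the Wasserstein distance.
def wgan_critic_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    return d_fake.mean() - d_real.mean()

# Generator loss: raise the critic score of fakes, i.e. minimize -D(fake).
# The value can be negative; only its gradient matters.
def wgan_generator_loss(d_fake: torch.Tensor) -> torch.Tensor:
    return -d_fake.mean()
```

Note that a WGAN critic outputs unbounded scores rather than probabilities, so classification accuracy is not a meaningful diagnostic for it; a near-0% reading does not by itself indicate failure.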

One more thing: the generator loss is wgan + cyc, and I am wondering whether the negative WGAN loss value confuses the generator. When the WGAN term is negative, it seems the total loss can get smaller regardless of whether the cycle loss grows or shrinks compared with the previous training step.
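
To illustrate the concern above: the total generator objective is a sum, so the gradients of the two terms are additive, and a negative adversarial value cannot cancel the cycle term's gradient. A sketch, with lambda_cyc as an assumed hyperparameter name:

```python
import torch
import torch.nn.functional as F

def generator_total_loss(d_fake: torch.Tensor, rec: torch.Tensor,
                         real: torch.Tensor, lambda_cyc: float = 10.0) -> torch.Tensor:
    adv = -d_fake.mean()          # WGAN term; its value may be negative
    cyc = F.l1_loss(rec, real)    # cycle-consistency term (L1)
    # Only gradients drive the update: d(total)/d(params) splits into the
    # adversarial gradient plus lambda_cyc times the cycle gradient, so the
    # sign of `adv` as a number has no special effect on the cycle term.
    return adv + lambda_cyc * cyc
```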

Am I misunderstanding something? Please correct me if I am wrong. Thank you for your time!

junyanz commented 4 years ago

The WGAN loss itself doesn't work without GP. Even with GP, we haven't made it work better than the vanilla CycleGAN/pix2pix. The loss was also not very stable or meaningful for us. The WGAN-GP loss was added to the repo in case users want to use it for other models. There are two possible reasons: (1) the PatchGAN discriminator is already quite weak, so adding the GP loss makes it too weak compared to the generator; (2) the GP loss assumes that the inputs are independent according to the original paper, while PatchGAN takes overlapping patches, which breaks this assumption.
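
For context, a generic sketch of what the gradient penalty computes (a simplified stand-in, not necessarily identical to the repo's helper): the critic is evaluated at points interpolated between real and fake samples, and the gradient norm there is pushed toward 1. With a PatchGAN, d_interp is a map of overlapping patch scores, which is where the independence assumption breaks down.

```python
import torch

def gradient_penalty(netD, real, fake, lambda_gp: float = 10.0):
    # Random interpolation points between real and fake samples.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)
    d_interp = netD(interp)  # for PatchGAN: one score per overlapping patch
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True, retain_graph=True)[0]
    grads = grads.view(real.size(0), -1)
    # Penalize deviation of the per-sample gradient norm from 1.
    return lambda_gp * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```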

Kaede93 commented 4 years ago

Thank you for your reply; I agree with your ideas. I also think the batch size is one of the factors that makes CycleGAN-with-WGAN-GP training very unstable; do you agree? I'm trying to set the batch size to 64 instead of 1 (using InstanceNorm). Is this worth trying?
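
As a side note on InstanceNorm: unlike BatchNorm, it normalizes each sample independently, so its statistics do not change with batch size; any effect of a batch size of 64 would come from gradient averaging, not from the normalization. A quick check:

```python
import torch
import torch.nn as nn

norm = nn.InstanceNorm2d(3)
x = torch.randn(64, 3, 256, 256)
# Sample 0 is normalized identically whether processed alone or in a batch.
assert torch.allclose(norm(x)[0], norm(x[:1])[0], atol=1e-6)
```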

I am also wondering whether you got any reasonable images using the WGAN-GP loss (even if the training was not very stable). If yes, how many training steps did it take? I trained CycleGAN with the WGAN/WGAN-GP/WGAN-DIV losses, and their results were very bad (just a noise map, or sometimes something that looks like "ghost horses"). It seems the discriminators were too weak to give any useful guidance to the generators.

I also realized that the WGAN losses (including the GP and DIV versions) are very sensitive to the network structure and input size. It seems not very easy to apply the WGAN loss to other models.

junyanz commented 4 years ago

Unfortunately, we don't have reasonable images with WGAN-GP. I am not sure if it is related to batch_size. As I mentioned, GP seems not to be compatible with PatchGAN. The loss might work for other types of discriminators and tasks.

Kaede93 commented 4 years ago

Thank you for your reply again.

I tried a batch size of 64, but it failed to achieve any reasonable results (just a noise map), even though the Wasserstein distance was more stable than with a batch size of 1, and it was decreasing. The discriminator still seems too weak to feed back any useful gradients to the generator, so the results from this experiment may not be conclusive.

I also replaced the PatchGAN discriminator with a DCGAN-style one, but the model still could not be trained.
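
For concreteness, a DCGAN-style global discriminator (presumably what the comment above means) could look like the sketch below: strided convolutions down to a single score per image, rather than PatchGAN's map of overlapping patch scores. The layer sizes are assumptions for 256x256 inputs; InstanceNorm is used because the WGAN-GP paper advises against BatchNorm in the critic.

```python
import torch.nn as nn

class GlobalDiscriminator(nn.Module):
    """DCGAN-style critic: one unbounded score per image."""

    def __init__(self, in_ch: int = 3, ndf: int = 64):
        super().__init__()
        layers = [nn.Conv2d(in_ch, ndf, 4, 2, 1), nn.LeakyReLU(0.2, True)]
        ch = ndf
        for _ in range(5):  # spatial size: 128 -> 64 -> 32 -> 16 -> 8 -> 4
            out_ch = min(ch * 2, 512)
            layers += [nn.Conv2d(ch, out_ch, 4, 2, 1),
                       nn.InstanceNorm2d(out_ch),
                       nn.LeakyReLU(0.2, True)]
            ch = out_ch
        layers += [nn.Conv2d(ch, 1, 4, 1, 0)]  # 4x4 -> 1x1 scalar score
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x).view(x.size(0), -1)
```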

Is that why you used the LSGAN loss instead of the WGAN loss in the paper?

Thank you for your time, and have a nice day!

junyanz commented 4 years ago

Yes, we found that LSGAN is more effective in our paper.

Kaede93 commented 4 years ago

Sorry for the late reply.

You mentioned in another issue that you got better results with the ResNet rather than the U-Net architecture. What does "better" mean there? In my experiments (on the horse2zebra dataset), I got better style transfer results with ResNet but better image quality with U-Net.

I am wondering whether I should build the network with U-Net when I want to preserve the background details as much as possible (especially the color information), or can you give me some tips? I also added SSIM and perceptual losses to the loss function, but the reconstruction of the background color is still not very good.

junyanz commented 4 years ago

What is the difference between better style transfer results and better image quality results? The background color is supposed to change, as the color distributions of horse and zebra backgrounds are different. You can use an object mask if your goal is to keep the background color. See this nice work for an example.
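
To illustrate the object-mask suggestion: one simple option is to composite the translated object back onto the original background. A hypothetical sketch; mask is assumed to be a soft or binary object mask of shape (N, 1, H, W) in [0, 1], obtained elsewhere (e.g. from a segmentation model):

```python
import torch

def composite(real: torch.Tensor, fake: torch.Tensor,
              mask: torch.Tensor) -> torch.Tensor:
    # Keep the translated pixels inside the object mask and the original
    # pixels everywhere else, so the background color is preserved exactly.
    return mask * fake + (1.0 - mask) * real
```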

qsunyuan commented 2 years ago

Good Share

joonas-yoon commented 2 years ago

thanks for sharing 👍

bgjeroska commented 2 years ago

Hello,

I am currently trying to train WGAN-GP with a patch discriminator, but so far it has been impossible to make it work. I also tried non-overlapping convolutions, but it didn't help much. Did anyone find a way to train a good model, or have any ideas why WGAN doesn't work with a patch discriminator?

joonas-yoon commented 2 years ago

@bgjeroska

I think this comment is for you; see the comment above:

There are two possible reasons: (1) the PatchGAN discriminator is already quite weak, so adding the GP loss makes it too weak compared to the generator; (2) the GP loss assumes that the inputs are independent according to the original paper, while PatchGAN takes overlapping patches, which breaks this assumption.

DISAPPEARED13 commented 1 year ago

Hi there. I am looking for more stable training and also want to strengthen D for clearer synthesis. Would it work to just add GP to PatchGAN? Thanks.