phillipi / pix2pix

Image-to-image translation with conditional adversarial nets
https://phillipi.github.io/pix2pix/

Getting better results by setting use_GAN to 0 #63

Closed yanjidriveai closed 7 years ago

yanjidriveai commented 7 years ago

The paper suggests using a combination of both the GAN loss and the L1 loss. But by turning off the GAN loss (setting use_GAN=0), I actually got much more detailed model outputs on the edges2shoes dataset (trained for ~24 hours on a Titan X GPU).
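For context, the combined objective being discussed can be sketched as follows. This is a minimal NumPy illustration, not the repo's actual Lua/Torch code; `lam=100` follows the paper's default weighting, and the softplus form of the GAN term is an assumption:

```python
import numpy as np

def generator_loss(fake_logits, fake_img, target_img, use_gan=1, lam=100.0):
    """Sketch of the pix2pix generator objective: cGAN term + lambda * L1 term.

    With use_gan=0 this reduces to pure L1 regression, as in this thread.
    fake_logits are the discriminator's raw scores on generated images.
    """
    l1 = np.mean(np.abs(fake_img - target_img))
    if not use_gan:
        return lam * l1
    # Non-saturating GAN term: -log D(G(x)) = softplus(-logits).
    gan = np.mean(np.log1p(np.exp(-fake_logits)))
    return gan + lam * l1
```

Setting `use_gan=0` drops the adversarial term entirely, so the generator is trained as a plain L1 regressor.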

tinghuiz commented 7 years ago

That's a bit surprising. Do you mind sharing some of your results (ideally with L1 only and L1+cGAN side by side)?

yanjidriveai commented 7 years ago

@tinghuiz Sure, I shared them on https://www.dropbox.com/sh/y2suycafjq6mi3q/AAC1rQtOanximUy2nBHIoSZ_a?dl=0

junyanz commented 7 years ago

Thanks for sharing the results with us. If I understand your task correctly, you were trying to map photos to sketches. From our experience, GANs have difficulty handling discrete outputs: G always predicts continuous output, so it's easy for D to classify discrete vs. continuous. In the case of photo => edge, the target edge map is binary. You can partially address this by adding noise to the target edge map.
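The noise trick mentioned here can be sketched like so. This is a hypothetical helper (the name, the Gaussian choice, and the `sigma=0.05` default are all my assumptions, not anything from the repo):

```python
import numpy as np

def soften_edge_target(edge_map, sigma=0.05, rng=None):
    """Add small Gaussian noise to a binary {0, 1} edge map.

    The target distribution becomes continuous, so D can no longer win
    trivially by detecting discreteness in the real targets.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = edge_map + rng.normal(0.0, sigma, size=edge_map.shape)
    # Keep values in the valid image range.
    return np.clip(noisy, 0.0, 1.0)
```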

ppwwyyxx commented 7 years ago

Another side note: photo -> edge is (roughly) an unambiguous mapping. In this case it's not surprising that supervised training can achieve better results. What's nice about a conditional GAN is that it can handle an ambiguous mapping A -> multiple possible Bs by fitting the distribution over all the Bs.

icbcbicc commented 7 years ago

I met the same problem when I was doing style transfer. Although the output is more blurry when I set use_GAN to 0, the peak signal-to-noise ratio (PSNR) seems to be higher. I think it's mainly because the texture the cGAN generates is somewhat out of control, while the L1 norm seems to be much more stable.

phillipi commented 7 years ago

I'd say the bigger issue here is deciding what we want. "Unstructured" evaluation metrics like PSNR and per-pixel classification accuracy don't necessarily benefit from using a structured loss, like a GAN. If PSNR is what we really care about, then we are probably better off directly optimizing PSNR, which is just a log of the L2 error. L1 is very similar, so it should not be surprising that L1 regression achieves higher PSNR than using a GAN.
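The "PSNR is just log(L2)" point can be made concrete with a small sketch (assuming images scaled to [0, 1], so `max_val=1.0`):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """PSNR = 10 * log10(max_val^2 / MSE).

    Maximizing PSNR is exactly minimizing log(MSE), i.e. the log of the
    squared (L2) error, so an L2/L1 regressor is directly chasing PSNR.
    """
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Because PSNR is a strictly decreasing function of MSE, ranking models by PSNR is identical to ranking them by L2 error, which is why a plain regression loss tends to win on this metric.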

A cool thing about GANs is that they optimize something we don't yet know how to evaluate in closed form! Traditional evaluation metrics will tend not to show a benefit from using GANs, but I think the GANs are on to something better than the traditional metrics :)

icbcbicc commented 7 years ago

Your answer really solved my problem, thanks!