tamarott / SinGAN

Official pytorch implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"
https://tamarott.github.io/SinGAN.htm

Super-resolution #94

Open My-git96 opened 4 years ago

My-git96 commented 4 years ago

SinGAN is a very impressive work. But in the SR mode, I do not understand why SinGAN is able to generate the correct SR image. I mean, not only the correct image size but also the correct position of objects.

tamarott commented 4 years ago

For SR, SinGAN repeatedly upsamples the image and generates details with the finest-scale generator, and is therefore able to super-resolve images. The key idea is that image statistics tend to be similar across scales, so we can use the last generator to generate images larger than the LR training image.
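A minimal sketch of that upsample-and-refine loop. The generator interface here (`finest_G` taking a noise-plus-image input and the image itself, and the zero-noise choice) is an assumption for illustration, not the repository's exact API:

```python
import torch
import torch.nn.functional as F

def sr_inference(lr_image, finest_G, scale_factor=2 ** (1 / 3), num_steps=6):
    """Repeatedly upsample the image and let the finest-scale generator
    add high-frequency details at each step (a sketch, not the repo's API)."""
    x = lr_image
    for _ in range(num_steps):
        # upsample by the per-level factor used during training
        x = F.interpolate(x, scale_factor=scale_factor, mode='bilinear',
                          align_corners=False)
        # assumption: the generator refines the upsampled image; we pass
        # zero noise, since SR mode adds no random variation
        z = torch.zeros_like(x)
        x = finest_G(z + x, x)
    return x
```

With an identity stand-in for the generator, six steps simply upsample the LR image by roughly x4.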

My-git96 commented 4 years ago

Thanks for your patience. I have another question: when testing, why do we upsample and refine the LR image several times instead of upsampling it directly to the desired size (resolution) and then feeding it to the last generator once?

tamarott commented 4 years ago

For SR, SinGAN is trained with an upsampling factor of 2^(1/3) between scales, so to get an SR factor of x4, for example, we need to apply the upsampling process 6 times.
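The arithmetic behind that count: with a per-level factor r = 2^(1/3), applying it n times gives a total factor r^n, so 6 steps yield exactly 2^2 = 4:

```python
# per-level upsampling factor used when training SinGAN for SR
r = 2 ** (1 / 3)

# six applications give the x4 SR factor: (2^(1/3))^6 = 2^2 = 4
total = r ** 6
print(round(total, 6))  # 4.0
```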

My-git96 commented 4 years ago

All right. THANKS again!

danielkaifeng commented 4 years ago

I've played with SinGAN for several days and am curious about super-resolution.

In the test phase we feed the LR image into the last generator several times; only the last generator is used to produce the HR image: `Gs_sr.append(Gs[-1])`

So what is the benefit of training the previous Gs on very small resized versions of the real image? Those lower-scale generators aren't used at prediction time. Maybe we could train fewer scales on a larger real image to get detail-generating generators.
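For reference, this is essentially what the SR script does when building its generator list: only the finest-scale generator is appended, once per upsampling step (a simplified sketch; the string stand-ins replace the actual trained networks):

```python
# stand-ins for the trained generators, ordered coarse-to-fine
Gs = ['G_coarse', 'G_mid', 'G_fine']

# SR mode reuses only the finest generator at every step:
# Gs_sr.append(Gs[-1]), repeated for each of the 6 upsampling rounds
Gs_sr = []
for _ in range(6):
    Gs_sr.append(Gs[-1])

print(Gs_sr.count('G_fine'))  # 6
```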

danielkaifeng commented 4 years ago

I think the training process repeatedly loads the previous scale's G and D, so the learned information is carried forward from the coarsest level to the finest. That's why we need to train so many scales sequentially. Correct me if I am wrong.

# warm-start the current scale's G and D from the previous scale's
# checkpoints, but only when the channel count (nfc) is unchanged
if (nfc_prev == opt.nfc):
            G_curr.load_state_dict(torch.load('%s/%d/netG.pth' % (opt.out_, scale_num - 1)))
            D_curr.load_state_dict(torch.load('%s/%d/netD.pth' % (opt.out_, scale_num - 1)))
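A runnable illustration of that warm-start pattern, assuming matching architectures between scales. The `make_G` helper is purely illustrative; the real code loads the previous scale's checkpoint from disk:

```python
import torch
import torch.nn as nn

def make_G(nfc=32):
    # illustrative stand-in for one scale's generator
    return nn.Conv2d(3, nfc, 3, padding=1)

G_prev = make_G()
G_curr = make_G()

# same effect as load_state_dict(torch.load(...)) without touching disk:
# the new scale starts from the previous scale's learned weights
G_curr.load_state_dict(G_prev.state_dict())

print(torch.equal(G_curr.weight, G_prev.weight))  # True
```

This only works when the two scales share the same architecture, which is exactly what the `nfc_prev == opt.nfc` guard checks.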