kadarakos opened this issue 7 years ago
If you have a GPU with enough memory and plenty of time, you could train a net with a 1024x1024 input size and simply center your 640x480 images on that canvas, but training will take a very long time.
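A minimal Python sketch of the centering step follows. It assumes a single-channel image stored as a list of rows and pads with black (0). The function names are hypothetical, and the fill value is a choice rather than something stated in this thread. The returned offset lets you crop the generated output back to 640x480 afterwards.

```python
def center_on_canvas(image, canvas_h, canvas_w, fill=0):
    """Place a 2D image (list of rows) in the middle of a larger canvas.

    Returns the padded canvas and the (top, left) offset, which is needed
    later to crop the network's output back to the original size.
    """
    h, w = len(image), len(image[0])
    if h > canvas_h or w > canvas_w:
        raise ValueError("image is larger than the canvas")
    top = (canvas_h - h) // 2
    left = (canvas_w - w) // 2
    canvas = [[fill] * canvas_w for _ in range(canvas_h)]
    for y, row in enumerate(image):
        canvas[top + y][left:left + w] = row  # copy one row into place
    return canvas, (top, left)


def crop_from_canvas(canvas, top, left, h, w):
    """Undo center_on_canvas: cut the original h x w region back out."""
    return [row[left:left + w] for row in canvas[top:top + h]]


img = [[1] * 640 for _ in range(480)]  # 480 rows of 640 pixels
canvas, (top, left) = center_on_canvas(img, 1024, 1024)
restored = crop_from_canvas(canvas, top, left, 480, 640)
print(top, left, restored == img)  # 272 192 True
```

The same helper covers the 700x700 canvas case, where the offset is (110, 30).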
Alternatively, you can train a regular 256x256 model with enough jittering. In that case I would center the 640x480 images on a 700x700 canvas and use loadSize=700 and fineSize=256. You then process your images in 256x256 tiles with some overlap, to counteract the artifacts that usually occur at the borders, and reassemble them. I am using this approach to generate outputs of any size, for example 10000x10000 pixels, and it works quite well (depending on the content of a tile, there are cases where the overall brightness of a tile can differ from the surrounding tiles, which I try to remedy by adapting the means of neighboring tiles).
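The tile-and-reassemble step can be sketched in Python. This version makes several assumptions. It works on a single-channel image stored as a list of rows. It uses a hypothetical `fn` that stands in for running the trained generator on one tile. It blends overlapping tiles by plain averaging, which is one simple choice and not necessarily what is described above.

```python
def tile_starts(length, tile=256, overlap=32):
    """Start offsets along one axis so that tiles of size `tile` cover
    `length`, with at least `overlap` pixels shared between neighbours."""
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def process_tiled(image, fn, tile=256, overlap=32):
    """Run `fn` on overlapping tiles of a 2D image and stitch the results,
    averaging every pixel over all tiles that cover it."""
    h, w = len(image), len(image[0])
    acc = [[0.0] * w for _ in range(h)]
    weight = [[0] * w for _ in range(h)]
    for y0 in tile_starts(h, tile, overlap):
        for x0 in tile_starts(w, tile, overlap):
            patch = [row[x0:x0 + tile] for row in image[y0:y0 + tile]]
            out = fn(patch)  # assumed to return a patch of the same size
            for dy, row in enumerate(out):
                for dx, v in enumerate(row):
                    acc[y0 + dy][x0 + dx] += v
                    weight[y0 + dy][x0 + dx] += 1
    return [[a / n for a, n in zip(ra, rn)] for ra, rn in zip(acc, weight)]


print(tile_starts(700))  # [0, 224, 444]
```

Plain averaging still leaves visible seams when neighbouring tiles disagree in brightness. A feathered (distance-weighted) blend inside the overlap is a common refinement.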
Same problem here, except I am experimenting with smaller patches. I wouldn't want to upscale at test time, because the Discriminator would then be comparing a blurred real image with a blurred generated image (depending on the upsampling method). Anyway, my issue is that I can't make sense of what the Discriminator computes as its output. My intuition says that its output should be one-dimensional, essentially just a real or fake label. In the definitions of the functions defineD_pixelGAN and defineD_n_layers, however, that doesn't seem to be the case (and they also assume the input is 256x256).
```lua
function defineD_pixelGAN(...)
    -- some code here
    netD:add(nn.Sigmoid())
    -- state size : 1 x 30 x 30
    -- My comment: Why did it go to 1x30x30?
```

Same for defineD_n_layers(...):

```lua
    -- some code here
    netD:add(nn.Sigmoid())
    -- state size: 1x(N-2) x (N-2)
    -- my comment: Not sure what they mean by N here...
```
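To see where 30 x 30 comes from, the small Python sketch below follows the layer list of defineD_n_layers. It applies the convolution output-size formula floor((in + 2*pad - k) / stride) + 1, which is the formula Torch's SpatialConvolution uses. With the default n_layers = 3 and a 256x256 input, the stride-2 layers reduce the size to 32. The two stride-1 4x4 convolutions then take it to 31 and 30. So N in the comments appears to be the spatial size after the stride-2 layers. Each of the 30 x 30 outputs is a real/fake probability for one patch of the input, not a single label for the whole image.

The pixelGAN variant only uses 1x1 convolutions with stride 1 and no padding, so its output stays 1 x 256 x 256. The function names below are illustrative.

```python
def conv_out(size, k, s, p):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * p - k) // s + 1


def patchgan_output_size(size, n_layers=3):
    """Trace the spatial size through defineD_n_layers for a size x size input.

    Returns (N, final_size), where N is the size after the stride-2 layers.
    """
    size = conv_out(size, 4, 2, 1)      # first conv, stride 2
    for _ in range(1, n_layers):        # n_layers - 1 more stride-2 convs
        size = conv_out(size, 4, 2, 1)
    n = size                            # "N" in the state-size comments
    size = conv_out(size, 4, 1, 1)      # stride-1 conv: N -> N - 1
    size = conv_out(size, 4, 1, 1)      # final 1-channel conv: N - 1 -> N - 2
    return n, size


print(patchgan_output_size(256))  # (32, 30)
print(patchgan_output_size(512))  # (64, 62)
print(conv_out(256, 1, 1, 0))     # 256, so 1x1 convs keep the size
```

Because the discriminator is fully convolutional, a non-square input simply gives a non-square map. For example, a 480 x 640 input yields a 58 x 78 map of patch scores.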
Complete code of the Discriminator model functions:
```lua
require 'nn'

function defineD_basic(input_nc, output_nc, ndf)
    local n_layers = 3
    return defineD_n_layers(input_nc, output_nc, ndf, n_layers)
end

-- rf=1
function defineD_pixelGAN(input_nc, output_nc, ndf)
    local netD = nn.Sequential()
    -- input is (nc) x 256 x 256
    netD:add(nn.SpatialConvolution(input_nc+output_nc, ndf, 1, 1, 1, 1, 0, 0))
    netD:add(nn.LeakyReLU(0.2, true))
    -- state size: (ndf) x 256 x 256
    netD:add(nn.SpatialConvolution(ndf, ndf * 2, 1, 1, 1, 1, 0, 0))
    netD:add(nn.SpatialBatchNormalization(ndf * 2)):add(nn.LeakyReLU(0.2, true))
    -- state size: (ndf*2) x 256 x 256
    netD:add(nn.SpatialConvolution(ndf * 2, 1, 1, 1, 1, 1, 0, 0))
    -- state size: 1 x 256 x 256
    netD:add(nn.Sigmoid())
    -- state size: 1 x 30 x 30
    return netD
end

-- if n=0, then use pixelGAN (rf=1)
-- else rf is 16 if n=1
-- 34 if n=2
-- 70 if n=3
-- 142 if n=4
-- 286 if n=5
-- 574 if n=6
function defineD_n_layers(input_nc, output_nc, ndf, n_layers)
    if n_layers==0 then
        return defineD_pixelGAN(input_nc, output_nc, ndf)
    else
        local netD = nn.Sequential()
        -- input is (nc) x 256 x 256
        netD:add(nn.SpatialConvolution(input_nc+output_nc, ndf, 4, 4, 2, 2, 1, 1))
        netD:add(nn.LeakyReLU(0.2, true))
        local nf_mult = 1
        local nf_mult_prev = 1
        for n = 1, n_layers-1 do
            nf_mult_prev = nf_mult
            nf_mult = math.min(2^n,8)
            netD:add(nn.SpatialConvolution(ndf * nf_mult_prev, ndf * nf_mult, 4, 4, 2, 2, 1, 1))
            netD:add(nn.SpatialBatchNormalization(ndf * nf_mult)):add(nn.LeakyReLU(0.2, true))
        end
        -- state size: (ndf*M) x N x N
        -- (M = nf_mult; N = 256 / 2^n_layers for a 256x256 input, e.g. N = 32 for n_layers = 3)
        nf_mult_prev = nf_mult
        nf_mult = math.min(2^n_layers,8)
        netD:add(nn.SpatialConvolution(ndf * nf_mult_prev, ndf * nf_mult, 4, 4, 1, 1, 1, 1))
        netD:add(nn.SpatialBatchNormalization(ndf * nf_mult)):add(nn.LeakyReLU(0.2, true))
        -- state size: (ndf*M*2) x (N-1) x (N-1)
        netD:add(nn.SpatialConvolution(ndf * nf_mult, 1, 4, 4, 1, 1, 1, 1))
        -- state size: 1 x (N-2) x (N-2)
        netD:add(nn.Sigmoid())
        -- state size: 1 x (N-2) x (N-2)
        return netD
    end
end
```
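The receptive-field comments in the listing (16, 34, 70, ...) can be checked with the standard backward recursion rf = (rf - 1) * stride + kernel. It is applied from the output layer back to the input. The Python sketch below assumes the layer list is n_layers 4x4 stride-2 convolutions followed by two 4x4 stride-1 convolutions, as in defineD_n_layers. Padding does not change the receptive-field size, so it is omitted. The helper names are illustrative.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride)
    convolutions, computed by walking backwards from the output."""
    rf = 1
    for k, s in reversed(layers):
        rf = (rf - 1) * s + k
    return rf


def n_layers_convs(n_layers):
    """(kernel, stride) pairs of defineD_n_layers, in forward order."""
    return [(4, 2)] * n_layers + [(4, 1), (4, 1)]


print([receptive_field(n_layers_convs(n)) for n in range(1, 7)])
# [16, 34, 70, 142, 286, 574]
print(receptive_field([(1, 1)] * 3))  # 1, the pixelGAN case
```

With the default n_layers = 3, each of the 30 x 30 outputs therefore judges a 70 x 70 patch of the input.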
> ... it works quite well (depending on the content of a tile, there are cases where the overall brightness of a tile can differ from the surrounding tiles, which I try to remedy by adapting the means of neighboring tiles)
@Quasimondo, could you elaborate on how you 'adapted the means' of neighbouring tiles? I am also trying to use pix2pix to upscale some of my generated samples, but I cannot get rid of the edge effects caused by varying brightness, even with the overlapping technique.
What I do is a rather naive approach. I calculate the RGB mean of the input tile and the RGB mean of the output tile, then add the difference between the two means to every pixel of the output tile. It does not always work perfectly, but it's good enough for my purposes. I thought about using Poisson blending (as is done in panorama stitching) to fix some of the edge effects, but haven't gotten around to implementing it yet.
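A minimal Python sketch of that mean correction follows, assuming tiles are lists of rows of (r, g, b) tuples. The function name is hypothetical. Values are not clamped to the valid pixel range. The correction only makes sense when the input and output tiles are expected to have similar overall colour, as in upscaling.

```python
def match_tile_mean(input_tile, output_tile):
    """Shift each RGB channel of output_tile so that its mean equals the
    corresponding channel mean of input_tile."""
    def channel_means(tile):
        pixels = [px for row in tile for px in row]
        return [sum(px[c] for px in pixels) / len(pixels) for c in range(3)]

    mean_in = channel_means(input_tile)
    mean_out = channel_means(output_tile)
    delta = [mi - mo for mi, mo in zip(mean_in, mean_out)]
    # Add the per-channel difference of the means to every output pixel.
    return [[tuple(px[c] + delta[c] for c in range(3)) for px in row]
            for row in output_tile]


print(match_tile_mean([[(3, 3, 3), (3, 3, 3)]], [[(0, 0, 0), (4, 4, 4)]]))
# [[(1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]]
```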
@ArturoDeza That's confusing me too right now. Did you find an answer to that yet?
@aneesh3108, I ended up training and testing on 256x256 patches for super-resolution in my latest NIPS submission: https://arxiv.org/abs/1705.10041
Since the network is fully convolutional, it still produces high-quality results on 512x512 inputs/outputs (I'm not sure about non-square images).
I think it is tricky if you want to work with small patches!
@ArturoDeza It was a typo in the comment (the 1 x 30 x 30 one)!
My understanding is that pixelGAN assigns a real/fake score (a value between 0 and 1 from the final Sigmoid) to every single pixel, while PatchGAN assigns one to each patch under consideration.
Now that the typo is out of the way, I'm going to run the whole thing again and test it. Hopefully that solves the remaining issues.
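The per-pixel/per-patch scoring can be made concrete. The discriminator's output map is compared entry by entry with a target that is all ones (real) or all zeros (fake), and the per-entry losses are averaged. The Python sketch below assumes an element-wise binary cross-entropy with mean reduction. The function name and the `eps` clamp are illustrative choices and are not taken from the pix2pix code.

```python
import math


def patch_bce(pred_map, is_real, eps=1e-7):
    """Mean binary cross-entropy over a discriminator output map.

    pred_map holds sigmoid outputs in (0, 1): one value per pixel for
    pixelGAN, one per patch for PatchGAN. Every entry gets the same target.
    """
    target = 1.0 if is_real else 0.0
    values = [p for row in pred_map for p in row]
    total = 0.0
    for p in values:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total / len(values)


# An undecided 30 x 30 map (every patch scored 0.5) costs log(2) per patch.
print(round(patch_bce([[0.5] * 30 for _ in range(30)], True), 4))  # 0.6931
```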
Is it possible to use the U-net architecture with images of size 480 x 640? In its current implementation, the U-net seems to work only with 256x256 images, due to the receptive fields. Sizes that are not powers of 2 don't work either. Is there a workaround?
Thank you!
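One possible workaround is sketched below in Python. It assumes the 256x256 U-Net halves the resolution eight times with 4x4 stride-2 convolutions (padding 1) and doubles it back with matching transposed convolutions. The skip connections concatenate encoder and decoder feature maps, so every intermediate size must survive the halve-then-double round trip. That means each side must be divisible by 2^8 = 256. A 480 x 640 image could then be padded to 512 x 768, run through the network, and cropped back. The function names are hypothetical.

```python
def pad_to_multiple(size, num_downs=8):
    """Smallest size >= `size` that is divisible by 2**num_downs."""
    m = 2 ** num_downs
    return -(-size // m) * m  # ceiling division, then scale back up


def roundtrip_ok(size, num_downs=8):
    """True if num_downs halvings followed by num_downs doublings reproduce
    the encoder size at every level, so the skip concatenations line up."""
    sizes = [size]
    for _ in range(num_downs):
        sizes.append(sizes[-1] // 2)  # 4x4 conv, stride 2, pad 1: floor(n / 2)
    up = sizes[-1]
    for level in range(num_downs - 1, -1, -1):
        up *= 2  # 4x4 transposed conv, stride 2, pad 1: 2n
        if up != sizes[level]:
            return False
    return True


print(pad_to_multiple(480), pad_to_multiple(640))  # 512 768
print(roundtrip_ok(480), roundtrip_ok(512))        # False True
```

Padding enlarges the input and therefore the compute per image. Another option is a U-Net with fewer downsampling levels, which relaxes the divisibility requirement.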