Adjusting Receptive Field Size?

jcjohnson / fast-neural-style

Feedforward style transfer


Adjusting Receptive Field Size? #149

Open 3DTOPO opened 6 years ago

3DTOPO commented 6 years ago

I have found that, all else being equal, the lower the resolution of the image to be stylized, the more abstract the result is. For instance, an input image of 256 pixels comes out significantly more abstract than the same image stylized at 1024 pixels.

Is there a way to increase the receptive field size by say four-fold, so that at 1024 pixels, the image would be approximately the abstract level of a 256 pixel image, but at a higher resolution?
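To put numbers on the question, here is a minimal sketch of the arithmetic. It assumes the effect comes from a fixed-size receptive field covering a smaller share of a larger image. The value of 100 pixels is a hypothetical receptive field chosen only for illustration, not a measured property of any model.

```python
def coverage(receptive_field_px: int, image_size_px: int) -> float:
    """Fraction of the image side spanned by one receptive field."""
    return receptive_field_px / image_size_px


def matching_receptive_field(receptive_field_px: int, small_px: int, large_px: int) -> int:
    """Receptive field needed at large_px to span the same fraction as at small_px."""
    return receptive_field_px * large_px // small_px


RF = 100  # hypothetical receptive field in pixels, for illustration only

print(coverage(RF, 256))                        # 0.390625
print(coverage(RF, 1024))                       # 0.09765625
print(matching_receptive_field(RF, 256, 1024))  # 400
```

Under that assumption, matching the 256 pixel look at 1024 pixels means a receptive field four times as wide.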

htoyryla commented 6 years ago

With the original, iterative neural-style, something like this can be achieved by processing the image initially at a small size and then feeding the result through the process several times at increasing sizes.

I guess one could try this with fast-neural-style, too?
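The multi-pass idea can be sketched roughly as follows. This is only an outline under assumptions: `resize` and `stylize` are hypothetical callables standing in for whatever image library and style-transfer call you use, and the doubling schedule is one possible choice, not something prescribed by either project.

```python
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")


def size_schedule(start: int, target: int, factor: float = 2.0) -> List[int]:
    """Sizes from start up to target, growing by factor and ending exactly at target."""
    if start <= 0 or target < start or factor <= 1:
        raise ValueError("need 0 < start <= target and factor > 1")
    sizes = [start]
    while sizes[-1] < target:
        # Always grow by at least one pixel so the loop terminates.
        grown = max(sizes[-1] + 1, round(sizes[-1] * factor))
        sizes.append(min(target, grown))
    return sizes


def coarse_to_fine(
    image: Image,
    sizes: Sequence[int],
    resize: Callable[[Image, int], Image],  # hypothetical: scale image to a given size
    stylize: Callable[[Image], Image],      # hypothetical: one style-transfer pass
) -> Image:
    """Stylize at the smallest size, then upscale and re-stylize at each larger size."""
    result = image
    for size in sizes:
        result = stylize(resize(result, size))
    return result


# Stub callables that only record the steps taken, to show the order of operations.
steps = coarse_to_fine(
    [],
    size_schedule(256, 1024),
    resize=lambda img, size: img + [f"resize {size}"],
    stylize=lambda img: img + ["stylize"],
)
print(steps)
```

Each pass starts from the previous pass's output, so the coarse structure from the smallest size carries forward while later passes add detail.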

3DTOPO commented 6 years ago

Thanks for the suggestion. I have tried that with interesting results, but compared to the low resolution (highly abstract) image, a lot of detail is added which takes away from the more abstract qualities I would like to achieve.

htoyryla commented 6 years ago

compared to the low resolution (highly abstract) image, a lot of detail is added which takes away from the more abstract qualities I would like to achieve

I know the feeling. I, too, am interested in simpler, abstract-like images. There is a Finnish word I have in mind, "pelkistää", which means "reduce to the bare essentials".

ArtlyStyles commented 6 years ago

The first convolution layer's filter size is 5x5; you can change that to 21x21 and retrain.

3DTOPO commented 6 years ago

Thanks for the tip. Can that be done with the architecture argument, or does the source code need to be modified?

ArtlyStyles commented 6 years ago

Yes, it can be done with the architecture argument. In the code:

cmd:option('-arch', 'c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3')

So you need to use "-arch c21s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3" in the training command line.
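For a rough idea of how much this changes the receptive field, here is a small sketch that parses the architecture string and applies the standard receptive-field recurrence. It rests on assumptions about the token meanings that you should check against the model code: `cXsY-Z` is taken as a convolution with kernel X and stride Y, `dN` as a 3x3 stride-2 convolution, and `RN` as a residual block with two 3x3 convolutions. It stops at the first `u` (upsampling) token, so the result is the receptive field at the end of the residual blocks, not of the full network.

```python
import re


def encoder_receptive_field(arch: str) -> int:
    """Receptive field in input pixels just before the first upsampling layer."""
    rf, jump = 1, 1  # jump = input pixels between neighbouring units of the current layer
    for token in arch.split(","):
        conv = re.fullmatch(r"c(\d+)s(\d+)-(\d+)", token)
        if conv:
            layers = [(int(conv.group(1)), int(conv.group(2)))]
        elif re.fullmatch(r"d\d+", token):
            layers = [(3, 2)]          # assumed: 3x3 convolution, stride 2
        elif re.fullmatch(r"R\d+", token):
            layers = [(3, 1), (3, 1)]  # assumed: two 3x3 convolutions per block
        elif re.fullmatch(r"u\d+", token):
            break                      # stop at the first upsampling layer
        else:
            raise ValueError(f"unrecognised token: {token}")
        for kernel, stride in layers:
            rf += (kernel - 1) * jump
            jump *= stride
    return rf


default = "c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3"
wider = "c21s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3"

print(encoder_receptive_field(default))  # 95
print(encoder_receptive_field(wider))    # 107
```

If those assumptions hold, widening the first layer from 9 to 21 adds only 12 pixels, because that layer runs at full resolution while the later layers operate on a grid downsampled by four.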

3DTOPO commented 6 years ago

Thank you very much, I'll give that a try!

ArtlyStyles commented 6 years ago

But in my experience, you do not want very high-resolution style images. That just gives you too much "noise" in the generated image.

3DTOPO commented 6 years ago

Yeah, seems around 256 usually gives the best results for the style image.

3DTOPO commented 6 years ago

-arch c21s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3 doesn't work. Any value other than c9s1 seems to generate this error:

bin/luajit: lua/5.1/nn/Module.lua:252: Torch object, table, thread, cdata or function expected
stack traceback:
    [C]: in function 'pointer'
    lua/5.1/nn/Module.lua:252: in function 'flatten'
    lua/5.1/nn/Module.lua:326: in function 'getParameters'
    train.lua:139: in function 'main'
    train.lua:327: in main chunk
    [C]: in function 'dofile'
    .../src/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

Any ideas?