jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Layers test questions #363

Open MrZoidberg opened 7 years ago

MrZoidberg commented 7 years ago

I'm trying to get results with less image distortion while still preserving the style, and I was trying to replicate tests similar to this: http://imgur.com/a/rS5NS. Currently I'm only testing different style layers with the default relu4_2 content layer. Even this test has about 32,000 combinations and will take a couple of days to complete on my GTX 760 GPU (size 250px, all other settings at their defaults).
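
A typical run in my sweep looks like this (file names are placeholders for my own files):

th neural_style.lua -content_image face.jpg -style_image style.jpg -content_layers relu4_2 -style_layers relu1_1,relu2_1,relu3_1 -image_size 250 -gpu 0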

So here are my questions: 1) Is there any way or hint to decrease the number of combinations? 2) Should I test content layers too, keeping in mind that all I want to achieve is less distortion (e.g. preserving face details)?

Thanks!

htoyryla commented 7 years ago

I think it is misleading to test only layer combinations while keeping the style weights constant. When you change the layer selection, the balance between total style weight and total content weight changes too, so unfortunately there are far more possible test configurations.

I am not sure one needs to test all layer combinations, but I would not rely on layer combinations alone without also checking the effect of the style and content weights. For instance, when a particular layer combination distorts the content too much, check what happens when you increase the content weight; or if you don't get enough style, increase the style weight. Only then can you really start to see the effect of the layer selection as opposed to the effect of the style and content weights. So I would reduce the number of layer combinations (e.g. use only one layer per level in the combinations) and instead vary the weights too. But that's only my opinion.
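
For instance (all numbers purely illustrative; neural-style's defaults are -content_weight 5 and -style_weight 100), a reduced grid could cross one layer per level with a few weight settings:

th neural_style.lua -content_image face.jpg -style_image style.jpg -style_layers relu1_1,relu2_1,relu3_1,relu4_1 -content_weight 5 -style_weight 100
th neural_style.lua -content_image face.jpg -style_image style.jpg -style_layers relu1_1,relu2_1,relu3_1,relu4_1 -content_weight 5 -style_weight 500

and similarly for the other layer selections, so that you see the effect of the weights separately from the effect of the layers.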

I remember from the original paper on style transfer that the selection of the content layer is not too critical. The 5_x layers will give the least realistic results... so probably not what you are after.

MrZoidberg commented 7 years ago

@htoyryla Thanks for the reply! I'll try your approach. I've been trying to replicate some of the results from the reddit DeepDream community, but it looks like everyone is playing with the style layers to get better results. Lower learning rate values with ADAM optimization also seem to give less distortion, but my results still have unacceptable artifacts (like this one).
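
By a lower learning rate I mean runs along these lines (file names again placeholders; neural-style's default for ADAM is -learning_rate 10):

th neural_style.lua -content_image face.jpg -style_image style.jpg -optimizer adam -learning_rate 5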

htoyryla commented 7 years ago

I guess I am working with quite different esthetic goals, so I have difficulty seeing what the problem with your image is. But you could try leaving out the 5_x layers altogether, especially for content. They are more about what the image contains than about how it looks.

michaelhuang74 commented 7 years ago

@htoyryla I am testing Vinci and find that it is able to produce output that keeps very little of the style image's style. For example:

The style image: [transverse]

The Obama photo: [obama]

The output of Vinci: [obama_vinci_transverse]

I know that Vinci is based on texture_nets. I have tried different combinations of content layers and style layers for both texture_nets and neural-style, but could not produce output that is even close to Vinci's.

Do you have any idea or suggestion for the choice of content layers and style layers for this type of style image, to produce output like Vinci's? Thanks.

htoyryla commented 7 years ago

So you want to use a style image to produce a result that looks nothing like the style image? OK, to get something like this I would use as large a result image size as memory allows... this tends to reduce the style features and produce something more like an illustration based on a photo. Then I would experiment with the style and content weights. Maybe omit any 4_x style layers and above. Probably use original_colors to take the hue from the content photo, to get closer to your target. But even that might not achieve the large, almost white areas, especially the background, so Vinci may have some additional (more traditional) color processing going on.
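
In neural-style terms, that advice would translate into something like this (size, weights and layer cutoff purely illustrative):

th neural_style.lua -content_image obama.jpg -style_image transverse.jpg -image_size 1200 -style_layers relu1_1,relu2_1,relu3_1 -original_colors 1 -content_weight 5 -style_weight 50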

michaelhuang74 commented 7 years ago

@htoyryla Thanks for the quick reply!

"OK, to get something like this I would use as large result image size as memory allows..." I want to get some clarification regarding the above sentence. If I understand correctly, for neural style, I should increase the value for "-image_size" option. If I use texture_nets, I should increase "-image_size" while decreasing "-style_size" during training. Is my understanding correct? Thanks.

htoyryla commented 7 years ago

The trick of using a large image_size means having larger VGG model dimensions when measuring style, so that each node of VGG sees a smaller part of the style image. The model dimensions adapt to the size of the image fed into the model, so my intuition would say: do not decrease style_size. I haven't looked at the texture_nets code recently though (at how the sizes are managed).

PS. I had a look at how texture_nets manages image sizes. It appears to crop images during training, which works against what I proposed... now I remember that I have struggled with this feature in texture_nets (image_size does not work like it does in neural-style). It is possible, though, to avoid cropping by resizing the training images to image_size beforehand, e.g. as shown below.
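
A one-liner like this would do the resizing (assuming ImageMagick is installed and the directory is a placeholder; a bare width of 800 keeps the aspect ratio):

mogrify -resize 800 path/to/trainset/*.jpg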

htoyryla commented 7 years ago

Here's what I quickly trained today on texture_nets. My training set consisted of two images only, your obama and karya from the texture_nets images, both resized to a width of 800 to avoid cropping. My training command was:

th train.lua -model johnson.lua -style_image issuestyle.jpg -data TNsets/tests_fac -batch_size 1 -num_iterations 40000 -checkpoints_path ob -style_weight 5 -content_weight 1 -image_size 800 -style_size 800

The resulting image for your Obama picture, after 17000 iterations at image size 960, comes out like this:

[obtest2]

Using original colors (via a script that I have written) gives this:

[obtest2oc]

Not as clean as the Vinci result. Another difference is that the Vinci output is much more saturated; it is possible to match the color of the face skin by just increasing saturation.
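
The original-colors idea itself is simple (a minimal Torch sketch, not my actual script; file names match the images in this thread): keep the luminance of the stylized result and take the chrominance from the content photo. Neural-style's -original_colors option does essentially the same thing.

require 'image'

-- load the stylized result and the original content photo
local result  = image.load('obtest2.png', 3)
local content = image.load('obama.jpg', 3)

-- scale the content photo to the result's width and height
content = image.scale(content, result:size(3), result:size(2))

-- convert both to YUV; keep Y (luminance) from the result
-- and copy U and V (chrominance) from the content photo
local result_yuv  = image.rgb2yuv(result)
local content_yuv = image.rgb2yuv(content)
result_yuv[{{2,3}}] = content_yuv[{{2,3}}]

-- back to RGB and save
image.save('obtest2oc.png', image.yuv2rgb(result_yuv))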

I must admit that it is often difficult to understand exactly what other people want when they say "like this picture". The best approach, then, is for each of us to experiment on our own, exchanging questions, results and comments, but not expecting ready recipes.

michaelhuang74 commented 7 years ago

@htoyryla Thanks for the great help!

You are definitely right. Increasing both the image size and the style size in the training process makes the style less visible in the resulting image.

I also noticed that if I use the MSCOCO dataset for training in texture_nets and make the image size larger than 600, say 768, the training process may hang. The process does not terminate, but GPU utilization stays at 0%. I raised an issue (https://github.com/DmitryUlyanov/texture_nets/issues/63) over there.