jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Implementing features from the "Controlling Perceptual Factors in Neural Style Transfer" research paper #376

Open ProGamerGov opened 7 years ago

ProGamerGov commented 7 years ago

I have been trying to implement the features described in the "Controlling Perceptual Factors in Neural Style Transfer" research paper.

The code that was used for the research paper can be found here: https://github.com/leongatys/NeuralImageSynthesis

The code from Leon Gatys' NeuralImageSynthesis is written in Lua and operated through an iPython notebook interface.


So far, my attempts to transfer the features into Neural-Style have failed. Has anyone else had success in transferring the features?

Looking at the code, I think that:

In order to run NeuralImageSynthesis alongside your Neural-Style install, you must replace every instance of /usr/local/torch/install/bin/th with /home/ubuntu/torch/install/bin/th. You must also install hdf5 with luarocks install hdf5, matplotlib with sudo apt-get install python-matplotlib, skimage with sudo apt-get install python-skimage, and scipy with sudo pip install scipy. And of course you need to install and set up jupyter if you want to use the notebooks.

ProGamerGov commented 7 years ago

Ok, I think I have gotten the new -reflectance parameter working, though I don't know what it does: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua

It does seem to alter the output, though.

ProGamerGov commented 7 years ago

Multires without -reflectance: https://i.imgur.com/LvpXgaW.png

Multires with -reflectance: https://i.imgur.com/YIiqsOx.png

The -reflectance command increases the GPU usage.

Content image: https://i.imgur.com/sgLtFDi.png

Style image: https://i.imgur.com/PsXIJLM.jpg

htoyryla commented 7 years ago

It seems to me that your code inserts the new padding layer after the convolution layer, which has already done padding, so padding is done twice (first with zeroes in nn.SpatialConvolution and then by reflection in nn.SpatialReflectionPadding). It is like first adding an empty border and then another one which acts as a mirror. It would seem to me that the mirror then only reflects the empty border that was added first.

If you look closely at Gatys' code in https://github.com/leongatys/NeuralImageSynthesis/blob/master/ImageSynthesis.lua#L85-L94 you'll notice that the new padding layer is inserted first, and then the convolution layer without padding.

Your code also increases the size of the layer output, as padding is done twice, which might give size mismatch errors.

htoyryla commented 7 years ago

In my previous comment, I overlooked the fact that it is possible to change the layer parameters after the layer has been added to the model. Thus the lines https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L140-L141 in fact remove the padding from the already inserted convolution layer, so the double padding does not happen and the size of the output is not changed.

Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution.
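For reference, here is a minimal sketch (untested against the full script) of the intended order: the reflection padding layer is inserted first, and the convolution's own zero padding is then removed so the output size stays the same.

require 'nn'

-- Minimal sketch: pad by reflection first, then convolve with no zero padding.
local net = nn.Sequential()
local conv = nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1)  -- padW = padH = 1

net:add(nn.SpatialReflectionPadding(conv.padW, conv.padW, conv.padH, conv.padH))
conv.padW, conv.padH = 0, 0   -- the padding layer now supplies the border
net:add(conv)

print(net:forward(torch.randn(3, 32, 32)):size())   -- 64 x 32 x 32, size preserved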

ProGamerGov commented 7 years ago

@htoyryla

Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution.

So the reflectance padding works correctly, though I have placed it in the wrong location?

This code here is the convolution: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L131-L142 ?

ProGamerGov commented 7 years ago

And for implementing the masks, Gatys' implementation uses hdf5 files, though Neural-Style does not:

cmd:option('-mask_file', 'path/to/HDF5file', 'Spatial mask to constrain the gradient descent to specific region')

    -- Load mask if specified
    local mask = nil
    if params.mask_file ~= 'path/to/HDF5file' then
        local f = hdf5.open(params.mask_file, 'r')
        mask = f:all()['mask']
        f:close()
        mask = set_datatype(mask, params.gpu)
    end

I have been trying to figure out how to modify the above code for Neural-Style masks, but none of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?

htoyryla commented 7 years ago

The code you now linked looks better: now the padding is inserted (line #127) before the convolution (line #141). Most of what you have highlighted is NOT the convolution but is related to selecting between max and avg pooling. But if you follow the if logic, if the layer is a convolution it will be inserted into the model at line 141 of your present code.

I cannot guarantee that it now works but now the padding and convolution come in the correct order.

htoyryla commented 7 years ago

"I have been trying to figure out how to modify the above code for Neural-Style masks, but non of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?"

The code you cited does not implement any mask functionality; it only loads a mask from an existing hdf5 file.
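If the goal is simply to drop the hdf5 dependency for loading, the mask could come from an ordinary grayscale image via the image package that neural-style already requires. An untested sketch ('-mask_image' is a made-up option name, and this assumes it sits in the same place as the hdf5 version, so params and set_datatype are in scope):

cmd:option('-mask_image', '', 'Spatial mask image to constrain the gradient descent to a specific region')

    -- Load mask if specified (grayscale image instead of an hdf5 file)
    local mask = nil
    if params.mask_image ~= '' then
        local mask_img = image.load(params.mask_image, 1)   -- 1 x H x W, values in [0, 1]
        mask = mask_img[1]                                   -- H x W tensor
        mask = set_datatype(mask, params.gpu)
    end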

ProGamerGov commented 7 years ago

I ran a quick test with the -reflectance option. The change is not particularly obvious at first glance, but it does appear to cause a change. More testing with different parameter combinations may be needed to further understand its effect on artistic outputs.

On the left is the control test with -reflectance false, and on the right is -reflectance true:

Direct link to the comparison: https://i.imgur.com/YGCOCiu.png

False: https://i.imgur.com/0oQNsxl.png

True: https://i.imgur.com/a7fQTLb.png

Command used:

th neural_style.lua -seed 876 -reflectance -num_iterations 1500 -init image -image_size 640 -print_iter 50 -save_iter 50 -content_image examples/inputs/hoovertowernight.jpg -style_image examples/inputs/starry_night.jpg -backend cudnn -cudnn_autotune

ProGamerGov commented 7 years ago

Are Gatys' gradient-related functions different from Neural-Style's? I'm looking for where the style masks come into play. Or should I be looking at different functions for implementing these features, like masks?

ProGamerGov commented 7 years ago

From what I can see, luminance style transfer requires the LUV color space, which, unlike YUV, has no easy-to-use function in the image library.

Style masks seem to require modifying deeper levels of the Neural-Style code.


For the independent style_scale control with multiple style images, it seems like we only need a way to disable content loss:

From the research paper:

We initialise the optimisation procedure with the coarse-scale image and omit the content loss entirely, so that the fine-scale texture from the coarse-style image will be fully replaced.

And then a simple sh script, similar to multires.sh, that runs your style images through Neural-Style first should do the trick, but such a script needs a way to disable content loss.

I am thinking of adding a parameter like:

cmd:option('-content_loss', true, 'if set to false, content loss will be disabled')

if params.content_loss then
    -- content loss code goes here
end

@htoyryla Which part of the content loss code should this be implemented on to achieve the desired effect?

https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L461-L497

Or: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L109

Edit: I figured it out and now the content loss module can be disabled.

Currently testing different parameters alongside the new -content_loss parameter: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31

I edited this part of the neural_style.lua script: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31#file-neural_style-lua-L148-L151

Though I think that I need to find a way to transfer the color from the intended content image to this first Neural-Style run with the two style images. Seeing as -init image includes content as well, maybe I need to add another new parameter, or maybe using -original_colors 1 on step two will solve this problem?

Second Edit:

It seems that -content_layers relu1_1,relu2_1 and the default style layers work the best, though the research paper only specified layers relu1_1 and relu2_1, not whether you should use those values for the content or the style layers.

ProGamerGov commented 7 years ago

I must be missing something when trying to replicate the "Naive scale combination" from here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ScaleControl.ipynb

Following the steps in the research paper should result in something like this output, which I made by running Gatys' iPython code: https://i.imgur.com/boz8PhW.jpg

And the styled style image from his code: https://i.imgur.com/6xEumk0.jpg


But instead I get this:

The styled style image: https://i.imgur.com/30HUeOH.png

And here is the final output: https://i.imgur.com/SWhzMn0.png

I tried this code to create the styled style image: https://gist.github.com/ProGamerGov/53979447d09fe6098d4b00fc8e924109

And then ran:

th neural_style_c.lua -original_colors 1 -output_image out.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out7.png -image_size 640 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune


The final content image: https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_content.jpg

The two style images:

https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_style3.jpg

https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_style2.jpg


What am I doing wrong here?

ProGamerGov commented 7 years ago

Ok, so analyzing the styled style image from Gatys' code:

The outputs have the parameters and values used in their names:

[scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_norm_pad_ptw_1.0E+05]

I think this one was used to make: https://i.imgur.com/6xEumk0.jpg


From another experiment using his code:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_Amazing-Nature_3840x2160.jpg_simg_raime.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg

The enlarged version (I think 1 step multires?):

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_raime.jpg_simg_Amazing-Nature_3840x2160.jpg_pt_layer_relu2_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg.filepart


And:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_raime.jpg_simg_Amazing-Nature_3840x2160.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg


The layers used are: relu2_1 and relu4_1

Style weight is: sw_2.0E+08

Content weight is: cw_1.0E+05

The Normalized VGG-19 model is used: model_norm

Not sure what this is: ptw_1.0E+05

Naive Scale mix is the best version, and also the styled style image: naive_scalemix.jpg

Not sure if pt_layer refers to both style_layers and content_layers, or just one of them?

ProGamerGov commented 7 years ago

On the subject of Gram Matrices (Leon Gatys said this would be important for transferring features to Neural-Style):

Neural-Style is normalising the Gram Matrices differently, as it additionally divides by the number of features, when compared with Gatys' code. This means that the style loss weights for the different layers in Neural-Style and Gatys' code are a little different:

In a layer l with n_l = 64 features, a style loss weight of 1 in Neural-Style, is a style loss weight of 1/64^2 in Gatys' code.

htoyryla commented 7 years ago

"Neural-Style is normalising the Gram Matrices differently, as it additionally divides by the number of features, when compared with Gatys' code. This means that the style loss weights for the different layers in Neural-Style and Gatys' code are a little different:

In a layer l with n_l = 64 features, a style loss weight of 1 in Neural-Style, is a style loss weight of 1/64^2 in Gatys' code."

I am not familiar with Gatys's code, but what you wrote is confusing. First you say that Neural_style divides the Gram matrix by the number of features, but in your example you don't do this division.

If Gatys' normalizes by 1/C^2 where C is the number of features, it makes sense to me as the size of the Gram matrix is CxC.

In neural_style, the Gram matrix is normalized for the style loss in this line: https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534. Here, input:nElement() is not C but CxHxW, where C, H, W are the dimensions of the layer output from which the Gram matrix is computed, so in practice neural-style ends up with a smaller value for the normalized style loss than 1/C^2.

Dividing instead by self.G:nElement() would implement division by C^2, so if that's what you want, try it.

I don't know if this use of input:nElement() instead of self.G:nElement() here is intentional or an accident. @jcjohnson ?

There has been an earlier discussion about this division but there was nothing on this in particular: https://github.com/jcjohnson/neural-style/issues/90

PS. I checked the corresponding code in fast-neural-style https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/GramMatrix.lua#L46-L49 which also normalizes the Gram matrix by 1/(CHW), so I guess this is done on purpose. After all, normalizing by 1/C^2 would favor the lower layers too much.
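To make the difference concrete, a toy comparison (just a sketch, not the actual StyleLoss code):

require 'torch'

-- Toy comparison of the two normalizations for a conv1-sized layer.
local C, H, W = 64, 32, 32
local feat = torch.randn(C, H, W)
local F = feat:view(C, H * W)
local G = torch.mm(F, F:t())          -- C x C Gram matrix

local G_chw = G / feat:nElement()     -- neural-style: divide by C*H*W
local G_cc  = G / (C * C)             -- divide by C^2 (= G:nElement())

print(G_chw:mean(), G_cc:mean())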

htoyryla commented 7 years ago

I ran a quick test with the -reflectance option. The change is not particularly obvious at first glance, but it does appear to cause a change.

As padding only means adding a few pixels around the image I wouldn't expect large changes. Mostly this should be visible close to the edges, and indeed there appears to be a difference along the left hand side.

htoyryla commented 7 years ago

Changing line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534 to divide by self.G:nElement(), I ran neural-style with defaults and got this.

outcxc

whereas with the original the resulting image was

outchw

Now, they are obviously different, but as the style weight has been effectively increased, we should not read too much into this difference. Anyway, this is worth more testing, and the idea of normalizing this way makes intuitive sense to me.

htoyryla commented 7 years ago

Concerning YUV... I was under the impression that Y is the luminance.

When you want to disable content_loss, why not simply set content_weight to 0?

htoyryla commented 7 years ago

It looks like the 1/C^2 style normalization favors the lowest layers, which have a smaller C (64 for conv1 as opposed to 512 for conv5). The original neural-style behavior, 1/(CxHxW), penalizes the higher layers less because H and W decrease when going to higher layers.
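Some rough numbers behind that, assuming a 224x224 input to VGG-19 (illustration only):

-- Compare the two normalization factors for a low and a high layer.
local layers = {
  {name = 'relu1_1', C = 64,  H = 224, W = 224},
  {name = 'relu5_1', C = 512, H = 14,  W = 14},
}
for _, l in ipairs(layers) do
  print(string.format('%s: 1/(CHW) = %.2e, 1/C^2 = %.2e',
        l.name, 1 / (l.C * l.H * l.W), 1 / (l.C * l.C)))
end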

ProGamerGov commented 7 years ago

When you want to disable content_loss, why not simply set content_weight to 0?

I will try that as well later today. I think my settings from before were too different from Gatys' settings.

The other issue is that I think transferring the color from a third image might be needed, as I would imagine that Gatys would have used something similar to -original_colors 1 if it were the better solution.

ProGamerGov commented 7 years ago

I think I figured out the style combination:

The styled style image: https://i.imgur.com/G1eZerW.png

This was used to produce the final image:

th neural_style.lua -original_colors 1 -style_weight 10000 -output_image out3.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out1_200.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

And this was used to produce the styled style image:

th neural_style_c.lua -content_weight 0 -style_weight 10000 -output_image out1.png -num_iterations 200 -content_image fig4_style3.jpg -style_image fig4_style1.jpg -image_size 2800 -content_layers relu2_1 -style_layers relu2_1 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune


I wonder if something similar could be accomplished by being able to control the layers each style image uses?


I am unable to produce a larger version like Gatys was able to do. Any larger images seem to be blurry, and the shapes begin to fade. The darkness of Seated Nude seems to make this harder, as the dark areas seem to take over areas of the new style image in my experiments.

htoyryla commented 7 years ago

A note on 1/C^2 gram matrix normalization: this line also needs to be changed https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L553 so that the backward pass too will use the normalized matrix.

This will require quite different weights, like content_weight 1e3 and style_weight 1, and it can take some 300 iterations before the image really starts to develop, but to me the results look good. I am talking about plain neural_style with modified Gram matrix normalization. I haven't really looked deeper into the Gatys project.
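A toy illustration of why both passes have to divide by the same constant (this is not the actual StyleLoss module, just the chain rule through a 1/k factor using nn.MSECriterion):

require 'nn'

local C = 4
local G = torch.randn(C, C)           -- stand-in for a Gram matrix
local target = torch.randn(C, C)
local crit = nn.MSECriterion()

local k = C * C                       -- the proposed 1/C^2 normalization
local loss = crit:forward(G / k, target)
-- the gradient w.r.t. the unnormalized G picks up the same 1/k factor
local dG = crit:backward(G / k, target):clone():div(k)
print(loss, dG:norm())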

VaKonS commented 7 years ago

ProGamerGov, just a little suggestion: since GPU handling is already implemented in "function setup_gpu(params)" (line 324), maybe it's possible to use that function instead of new "set_datatype(data, gpu)"?

It could make the code more maintainable – in case of any changes someone will have to modify only one function instead of two.

For example: pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype) (see how nn.SpatialAveragePooling(kW, kH, dW, dH):type(dtype) is added in line 136).

Currently I can not test it on GPU, but I can confirm that it does work on CPU.

ProGamerGov commented 7 years ago

@VaKonS

I'll take a look. I originally pasted in Gatys' GPU handling code at the time because I couldn't get the reflection function to work with this line of code:

pad_layer = set_datatype(pad_layer, params.gpu)

As I couldn't figure out how to use function setup_gpu with the code.

Are you saying to change this line:

https://github.com/ProGamerGov/neural-style/blob/6814479c8ebcc11498b7c123ee2ba7ef9f0fe09f/neural_style.lua#L125

to this:

local pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)

And then delete this line:

pad_layer = set_datatype(pad_layer, params.gpu)

?

VaKonS commented 7 years ago

@ProGamerGov, yes. And to delete function set_datatype(data, gpu) at line 611, as it will not be needed anymore.

ProGamerGov commented 7 years ago

@VaKonS , I made a version that contains other padding types: https://gist.github.com/ProGamerGov/0e7523e221935442a6a899bdfee033a8

When using -padding, you can try 5 different types of padding: default, reflect, zero, replication, or pad. In my testing, the pad option seems to leave untouched edges on either side of the image.

Edit: Modified version with htoyryla's suggestions: https://gist.github.com/ProGamerGov/5b9c9f133cfb14cf926ca7b580ea3cc8

The modified version only has three options: default, reflect, or replicate.

htoyryla commented 7 years ago

Types 'reflect' and 'replication' make sense, although with the typical padding width = 1 as in VGG19 the result is identical.

Type 'zero' is superfluous as the convolution layer already pads with zeroes.

Type 'pad' only pads in one dimension so it hardly makes sense.

You should read nn documentation when using the nn layers. The nn.Spatial.... layers are meant to work with two-dimensional data like images. nn.Padding provides a lower level access for padding of tensors, you need to specify which dimension, which side, which value, and if one wants to use it to pad an image one needs to apply it several times with different settings.

But frankly, with the 1-pixel padding in VGG there are not so many ways to pad. We should also remember that the main reason for padding in the convolution layers is to get the correct output size. Without padding convolution tends to shrink the size.

htoyryla commented 7 years ago

The code could also be structured like this (to avoid duplicating code and making the same checks several times). Here I used 'reflect' and 'replicate' as they are shorter; you may prefer 'replication' and 'reflection' as in the layer names. But having one as a verb and the other as a noun is maybe not a good idea.

local is_convolution = (layer_type == 'cudnn.SpatialConvolution' or layer_type == 'nn.SpatialConvolution')
if is_convolution and params.padding ~= 'default' then
    local pad_layer
    local padW, padH = layer.padW, layer.padH
    if params.padding == 'reflect' then
        pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)
    elseif params.padding == 'replicate' then
        pad_layer = nn.SpatialReplicationPadding(padW, padW, padH, padH):type(dtype)
    else
        error('Unknown padding type')
    end
    net:add(pad_layer)
    layer.padW = 0
    layer.padH = 0
end
VaKonS commented 7 years ago

@htoyryla, reflective padding probably takes pixels starting from 1 pixel distance: [ x-2, x-1, x ] [ x-1, x-2 ]. And replication duplicates the edge: [ x-2, x-1, x ] [ x, x ].

htoyryla commented 7 years ago

Yes, I just realized that when I did a small test. That explains why it made a difference also with padding of one row/column. The documentation is a bit unclear so I believed reflection would result in [ x-2, x-1, x ] [ x, x-1 ] when it only says 'reflection of the input boundary'. But obviously this is more useful.
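For reference, a quick check of the two behaviours:

require 'nn'

local x = torch.Tensor{{{1, 2, 3}}}   -- a 1 x 1 x 3 "image"
print(nn.SpatialReflectionPadding(2, 2, 0, 0):forward(x))    -- 3 2 1 2 3 2 1
print(nn.SpatialReplicationPadding(2, 2, 0, 0):forward(x))   -- 1 1 1 2 3 3 3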

ProGamerGov commented 7 years ago

I have been trying to get this python script to work for the linear color feature found in Gatys' code here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ScaleControl.ipynb

https://gist.github.com/ProGamerGov/5fc5ef9035edc9a026e41925f733a45c

The idea is that making this feature into a simple python script will be easier and less messy than implementing it in neural_style.lua. But I can't figure out the python parameters so that the image is fed into the function properly.

Edit:

Trying to reverse engineer the code that feeds into the function:

https://gist.github.com/ProGamerGov/32b7d68a098f8b0655d71a08eb3ba050

So far it doesn't output the converted images.

htoyryla commented 7 years ago

About your first script https://gist.github.com/ProGamerGov/5fc5ef9035edc9a026e41925f733a45c

To make it process the images and save the result, you need something like this. You did not pass the images to your function, and you did not use the resulting image returned by the function. Remember that the function parameters target_img and source_img are totally separate from the variables with the same names; usually it is good practice to avoid using the same names for both.

The numpy imports were needed; on the other hand, I had to use skimage.io instead of PIL for reading and saving the image, as they probably use a different format for the image inside python. Anyway, Gatys used imread() and not Image.open().

This works in principle but the resulting image is probably not what one would expect. It could be that some kind of pre/deprocessing is needed which was not obvious to me (not being familiar with the process you are trying to duplicate).

PS. imread returns an image where the data is between 0 and 255 as integers, while match_color expects 0..1 floats. That's why the result is not good yet.

import scipy
import h5py
import skimage
import os
from skimage import io,transform,img_as_float
from skimage.io import imread,imsave
from collections import OrderedDict
#from PIL import Image, ImageFilter
import numpy as np
from numpy import eye 
import decimal
#import click

target_img = imread('to.png')
source_img = imread('from.png')

def match_color(target_img, source_img, mode='pca', eps=1e-5):
    ....
    return matched_img

output_img = match_color(target_img, source_img)
imsave('result.png', output_img)
htoyryla commented 7 years ago

OK, by still changing the two imread lines to

target_img = imread('to.png').astype(float)/256
source_img = imread('from.png').astype(float)/256

From these two images ("from.png" and "to.png"):

I get this (don't know if this is what is expected but it looks ok)

result

htoyryla commented 7 years ago

Just noticed that there was already an import for img_as_float so these work as well

target_img = img_as_float(imread('to.png'))
source_img = img_as_float(imread('from.png'))

But anyway, I hope this illustrates that one cannot simply cut and paste code but needs also to examine it and make sure the pieces fit together.

ProGamerGov commented 7 years ago

The script now seems to produce outputs like the ones Gatys' code produced in the iPython interface:

The source image:

The target images:

The images I used can be found in Gatys' repository here, and in my Imgur album here: https://imgur.com/a/PrKtg.

Before Gatys' Scale Control code tried to transfer the brush strokes onto the circular pattern image, it created images like these with the linear color transfer function. So I guess the next step is to test how well these modified style images work.

The working script: https://gist.github.com/ProGamerGov/73e6c242abc00777e4e8cf05cf39dc70

This code here:

target_img = img_as_float(imread('to.png'))
source_img = img_as_float(imread('from.png'))

Did not seem to work for me, though that could be a VirtualBox-related issue like the ones some ImageMagick scripts can cause.

htoyryla commented 7 years ago

If img_as_float does not work, check that you have

from skimage import io,transform,img_as_float

(Just noticed that you have it. Don't know what is going on there if you have skimage installed in your python and can import it.)

And by the way, assuming you want to try all options, you can change the match_color mode and eps like this:

output_img = match_color(target_img, source_img, mode='chol', eps=1e-4)
htoyryla commented 7 years ago

Python interpreter is useful for testing small things (just like th in lua):

Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import skimage
>>> from skimage import io,transform,img_as_float
>>> from skimage.io import imread,imsave
>>> img_as_float
<function img_as_float at 0x7f3bc9cad230>
>>> img = imread('to.png')
>>> img
array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [133, 119, 112],
        [101,  84,  85],
        [ 54,  45,  44]],
>>> img_as_float(img)
array([[[ 1.        ,  1.        ,  1.        ],
        [ 1.        ,  1.        ,  1.        ],
        [ 1.        ,  1.        ,  1.        ],
        ...,
        [ 0.52156863,  0.46666667,  0.43921569],
        [ 0.39607843,  0.32941176,  0.33333333],
        [ 0.21176471,  0.17647059,  0.17254902]],
ProGamerGov commented 7 years ago

I got the script to accept user specified parameters: https://gist.github.com/ProGamerGov/d0917848a728bceb4131272734f61e8b

Only the target and source image are required, but you can also control the eps value and the transfer mode. Though the --eps parameter currently only accepts values in scientific notation.

I also cleaned up the unused lines of code.

I am currently testing different parameters for scale control.

htoyryla commented 7 years ago

It seems you don't understand how functions work. When one defines a function like match_color one specifies the parameters that are input to the function when it is called.

When one calls the function one gives the actual values for those parameters. One can then call the function as many times as needed with different values.

What you are doing now is defining a function so that the default values of transfer_mode and eps are defined from user input. It works when you only run the function once but it is confusing. That is not the way to pass values into a function.

You should change the def line as it was and add the actual values of transfer_mode and eps to the line where the function is called (like I already suggested).

output_img = match_color(target_img, source_img, mode=transfer_mode, eps=int(float(eps_value)))

BTW, I don't understand the int() for eps... first we give something like 1e-5, then float it and finally int which gives 0. So you limit eps to integer values only? Why the int? Float(eps_value) should be enough to convert the input string into a number.

ProGamerGov commented 7 years ago

It seems you don't understand how functions work.

It works when you only run the function once but it is confusing. That is not the way to pass values into a function.

I went for making the code work, without putting a lot of focus on how. Which is a terrible way to go about coding.

You should change the def line as it was and add the actual values of transfer_mode and eps to the line where the function is called (like I already suggested).

Yea, I see that now. Not sure what I was thinking at the time when I made such an embarrassing and obvious mistake.

BTW, I don't understand the int() for eps... first we give something like 1e-5, then float it and finally int which gives 0. So you limit eps to integer values only? Why the int?

It was the first thing that worked, which I now think was because I fixed a bracket placement error. I have removed the integer limitation.

Thanks for helping me correct the issues!

ProGamerGov commented 7 years ago

I think I am getting close to the research paper's results:

Layers relu2_1,relu4_2:

Direct link to full image: https://i.imgur.com/Vo9p96O.png

I know the research paper talks about only using layers relu1_1 and relu2_1, but the fine brush strokes from the paint style image seem to work best with relu2_1 and relu4_2, or just relu4_2, at least with this coarse style image. I'm not sure if I am missing something, or if this is due to a difference between Gatys' and jcjohnson's code?

This was my content image: https://i.imgur.com/eoX7f3I.jpg

Control test without scale control:

Screenshot from the research paper:


I used this command to create my "stylemix" image:

 th neural_style.lua -tv_weight 0 -content_weight 0 -style_weight 10000 -output_image out5.png -num_iterations 550 -content_image result.png -style_image result_3.png -image_size 1536 -content_layers relu2_1,relu4_2 -style_layers relu2_1,relu4_2 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

Then I used this two step set of commands to create the final output:

th neural_style.lua -style_weight 10000 -output_image out_final.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out5_pca.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

th neural_style.lua -style_weight 10000 -output_image out_final_hr.png -num_iterations 550 -content_image fig4_content.jpg -init_image out_final.png -style_image out5_pca.png -image_size 1536 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

I used the default linear-color-transfer.py script on my stylemix image before using it to create my final output, so the colors are more vivid than Gatys' version in the research paper. The default linear-color-transfer.py script was also used on both style images before I added the fine style to the coarse style. Both times, the final content image with the city lights was used as the source image.

htoyryla commented 7 years ago

Can you give the commands for running the whole process? I would like to test it.

ProGamerGov commented 7 years ago

@htoyryla


Images used:

fig4_content.jpg: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_content.jpg

Fine style: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_style1.jpg

Coarse style: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_style2.jpg


Step 1:

python linear-color-transfer.py --target_image coarse_style.png --source_image fig4_content.jpg --output_image coarse_pca.png

python linear-color-transfer.py --target_image fine_style.png --source_image fig4_content.jpg --output_image fine_pca.png

Step 2 (Gatys called the output from this step "stylemix", but I used a generic name from the list of experiments I was running):

th neural_style.lua -tv_weight 0 -content_weight 0 -style_weight 10000 -output_image out5.png -num_iterations 550 -content_image coarse_pca.png -style_image fine_pca.png -image_size 1536 -content_layers relu2_1,relu4_2 -style_layers relu2_1,relu4_2 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

Step 2.5 (I don't think Gatys' code does this, but I thought it would make the colors look better):

python linear-color-transfer.py --target_image out5.png --source_image fig4_content.jpg --output_image out5_pca.png

Step 3:

Then I tried to mimic Gatys' two-step process, where the first image is generated at 512px:

th neural_style.lua -style_weight 10000 -output_image out_final.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out5_pca.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

th neural_style.lua -style_weight 10000 -output_image out_final_hr.png -num_iterations 550 -content_image fig4_content.jpg -init_image out_final.png -style_image out5_pca.png -image_size 1536 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

Those commands in that order should give you the exact same output as I got.

VaKonS commented 7 years ago

After making more tests with different models, I was wrong: the noise is not added by padding. It's a property of some models: vgg19 from crowsonkb's repository makes clean images with or without padding, while images made with Illustration2Vec, for example, have noisy borders even with default padding.

noise

ProGamerGov commented 7 years ago

Examining the outputs produced by ScaleControl.ipynb:


Gatys' Scale Control code produces 3 different outputs, each of which follows a two-step multires process. I am not sure if these are 3 different ways of doing Scale Control, or if 1 or 2 of them are meant to showcase ways that don't work?

For each of the 3 options in the iPython script, I ran the code and generated the images. Each produced a low resolution 648x405 image and then a 1296x810 "hr" resolution image. Though the image names say that the first image has a resolution of 512px and the second image has a resolution of 1024px, which means there may be something else going on here (maybe downsampling?). I have included both images for each example, and they can be viewed in full in the Imgur link below each example.

Gatys' iPython code names the images with the parameters used to create them, and as such I have included the image file names.

"Stylemix images" are what Gatys calls the resulting combination style image made of both the coarse and fine style images.


Combine 2 images with fine and coarse scale:

low res and hr res: https://imgur.com/a/D7AcK

low res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_org_pad_sw_1.0E+03_cw_1.0E+00.jpg

hr:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_hrpt_layer_relu4_1_sz_512_hrsz_1024_model_org_pad_sw_1.0E+03_cw_1.0E+00.jpg

iPython terminal output: https://gist.github.com/ProGamerGov/a613c42514b9059ebc8230d2c1cd0fd1

norm net:

low res and hr res: https://imgur.com/a/oTB1k

low res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05.jpg

hr res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_hrpt_layer_relu4_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05.jpg

iPython terminal output: https://gist.github.com/ProGamerGov/3d8f8ffdbde5f8ec69c46f3076fa3f2d

Naive scale combination:

low res and hr res: https://imgur.com/a/LbqJQ

low res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg

hr res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg

iPython terminal output: https://gist.github.com/ProGamerGov/71eda3b16793835bbe142d902c480fe7


The code, in addition to creating the two images for each example, also created 4 stylemix images:

Stylemix image: https://i.imgur.com/m7nRgKP.jpg

Name:

scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_norm_pad_ptw_1.0E+05.jpg

Stylemix image: https://i.imgur.com/XPt6N52.jpg

Name:

scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_ptw_1.0E+05

Stylemix image: https://i.imgur.com/Vf3mg2n.jpg

Name:

spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_org_pad_ptw_1.0E+03.jpg

Stylemix image: https://i.imgur.com/c1ZNDoZ.jpg

Name:

spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_org_pad_ptw_1.0E+03.jpg

I am not sure why there are 3 examples of Scale Control and 4 stylemix images, but I assume one of the examples must use 2 stylemix images?

ProGamerGov commented 7 years ago

Ok, so trying both models from Gatys' repository, which are the normalized VGG-19 and the VGG-19 conv model, I can't seem to get the parameters right. Up until now I was using the default VGG-19 model.

wget -c --no-check-certificate https://bethgelab.org/media/uploads/deeptextures/vgg_normalised.caffemodel
wget -c --no-check-certificate https://bethgelab.org/media/uploads/stylecontrol/VGG_ILSVRC_19_layers_conv.caffemodel

I assume the default Neural-Style VGG-19 prototxt may not work with these models?

wget -c https://gist.githubusercontent.com/ksimonyan/3785162f95cd2d5fee77/raw/bb2b4fe0a9bb0669211cf3d0bc949dfdda173e9e/VGG_ILSVRC_19_layers_deploy.prototxt

Edit: It seems that the models are special versions created by Leon Gatys: https://github.com/jcjohnson/neural-style/issues/7

I don't know why, but I can't seem to get either model to work.

ProGamerGov commented 7 years ago

Using Gatys' weights for Scale Control in Neural-Style seems to work pretty well:

Also, the sym option on the match_color function is for luminance style transfer.

VaKonS commented 7 years ago

@ProGamerGov, by the way, the code can probably be reimplemented in Torch and maybe even will not be too different:

| Python | Torch |
| --- | --- |
| ndarray.shape | size |
| numpy.mean | torch.mean |
| numpy.linalg.cholesky | torch.potrf |
| numpy.linalg.eigh | torch.symeig |
| numpy.eye | torch.eye |
| numpy.dot | torch.dot |
| numpy.transpose | transpose / permute |
| .T | t() |
| numpy.reshape | torch.reshape |
| numpy.linalg.inv | torch.inverse |
| numpy.diag | torch.diag |
| numpy.sqrt | torch.sqrt |
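For example, a rough, untested Torch sketch of the 'pca' branch of match_color, based on reading the numpy version (images as 3 x H x W tensors with values in [0, 1]):

require 'torch'

local function match_color_pca(target, source, eps)
  eps = eps or 1e-5
  -- per-channel mean, centered pixels and regularized covariance
  local function stats(img)
    local x = img:view(3, -1)                       -- 3 x (H*W)
    local mu = x:mean(2)                            -- 3 x 1 channel means
    local xc = x - mu:expandAs(x)
    local C = torch.mm(xc, xc:t()) / xc:size(2) + torch.eye(3) * eps
    return mu, xc, C
  end
  local mu_t, tc, Ct = stats(target)
  local mu_s, _,  Cs = stats(source)

  -- symmetric matrix square root via eigendecomposition (numpy.linalg.eigh -> torch.symeig)
  local function sqrtm(C)
    local e, V = torch.symeig(C, 'V')
    return V * torch.diag(torch.sqrt(e)) * V:t()
  end
  local Qt, Qs = sqrtm(Ct), sqrtm(Cs)

  local matched = Qs * torch.inverse(Qt) * tc       -- recolor the centered target
  matched = matched + mu_s:expandAs(matched)
  return matched:viewAs(target):clamp(0, 1)
end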
ProGamerGov commented 7 years ago

@VaKonS Thanks, I'll take a look at that.


@htoyryla I have started trying to extract the python code responsible for luminance style transfer: https://gist.github.com/ProGamerGov/08c5d25bb867e4313821a45b2e3b2978

As I understand it, the research paper basically describes converting your content/style images to LUV or YIQ before running them through the style transfer network. In his python code, Gatys appears to use LUV, so I'll start with that.

Testing those 3 functions:

rgb2luv creates this:

luv2rgb creates this:

lum_transform results in this error whenever I try to use it:

ubuntu@ip-Address:~/neural-style$ python lum2.py --input_image fig4_content.jpg
Traceback (most recent call last):
  File "lum2.py", line 47, in <module>
    output_img = lum_transform(input_img)
  File "lum2.py", line 32, in lum_transform
    img = tile(lum[None,:],(3,1)).reshape((3,image.shape[0],image.shape[1]))
NameError: global name 'tile' is not defined
ubuntu@ip-Address:~/neural-style$

I don't know what "tile" is from and I can't figure out whether it belongs to a package, or is related to another specific variable.

Edit: "tile" is part of numpy, and just like "eye" from linear-color-transfer.py, it needs a "np." appended to it's front.

lum_transform creates this:

These functions come from Gatys' Color Control code here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ColourControl.ipynb

I am not sure if lum_transform is needed to perform the LUV conversion back and forth for style transfer.
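For reference, a rough Lua sketch of the general idea using the YUV helpers that the image package does provide (an approximation, not Gatys' exact LUV pipeline; file names are placeholders and both images are assumed to have the same resolution):

require 'image'

local content_yuv  = image.rgb2yuv(image.load('content.jpg', 3))
local stylized_yuv = image.rgb2yuv(image.load('stylized_output.png', 3))

-- 1) Before style transfer: a luminance-only version of an image, replicated
--    to 3 channels, similar in spirit to lum_transform.
image.save('content_lum.png', torch.repeatTensor(content_yuv[1], 3, 1, 1))

-- 2) After style transfer: keep the stylized luminance and restore the
--    content's chrominance (close to what -original_colors 1 already does).
local combined = content_yuv:clone()
combined[1] = stylized_yuv[1]
image.save('luminance_only_result.png', image.yuv2rgb(combined))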

htoyryla commented 7 years ago

Edit: "tile" is part of numpy, and just like "eye" from linear-color-transfer.py, it needs a "np." appended to it's front.

Good you found it. I just woke up and was about to comment :)