ProGamerGov opened this issue 7 years ago
OK, I think I have gotten the new -reflectance parameter working, though I don't fully understand what it does: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua
It does seem to alter the output.
Multires without -reflectance: https://i.imgur.com/LvpXgaW.png
Multires with -reflectance: https://i.imgur.com/YIiqsOx.png
The -reflectance command increases GPU usage.
Content image: https://i.imgur.com/sgLtFDi.png
Style image: https://i.imgur.com/PsXIJLM.jpg
It seems to me that your code inserts the new padding layer after the convolution layer, which has already done padding, so that padding is done twice (first with zeroes in nn.SpatialConvolution and then by reflection in nn.SpatialReflectionPadding). It is like first adding an empty border and then another one which acts as a mirror. It would seem to me that the mirror then only reflects the empty border that was added first.
If you look closely at Gatys' code in https://github.com/leongatys/NeuralImageSynthesis/blob/master/ImageSynthesis.lua#L85-L94 you'll notice that the new padding layer is inserted first, and then the convolution layer without padding.
Your code also increases the size of the layer output, as padding is done twice, which might give size mismatch errors.
In my previous comment, I overlooked the fact that it is possible to change the layer parameters after the layer has been added to the model. Thus the lines https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L140-L141 in fact remove the padding from the already inserted convolution layer, so the double padding does not happen and the size of the output is not changed.
Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution.
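Schematically, the correct order would be something like this (just a sketch, assuming the usual 1-pixel VGG padding):
net:add(nn.SpatialReflectionPadding(1, 1, 1, 1)) -- pad by reflection first
net:add(conv_layer) -- then the convolution, with conv_layer.padW and conv_layer.padH set to 0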
@htoyryla
Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution.
So the reflectance padding works correctly, though I have placed it in the wrong location?
This code here is the convolution: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L131-L142 ?
And for implementing the masks, Gatys' implementation uses hdf5 files, though Neural-Style does not:
cmd:option('-mask_file', 'path/to/HDF5file', 'Spatial mask to constrain the gradient descent to specific region')
-- Load mask if specified
local mask = nil
if params.mask_file ~= 'path/to/HDF5file' then
  local f = hdf5.open(params.mask_file, 'r')
  mask = f:all()['mask']
  f:close()
  mask = set_datatype(mask, params.gpu)
end
I have been trying to figure out how to modify the above code for Neural-Style masks, but none of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?
The code you now linked looks better, now the padding is inserted (line #127) before the convolution (line #141). Most of what you have highlighted is NOT the convolution but related to selecting between max and avg pooling. But if you follow the if logic, if the layer is convolution it will be inserted to the model in line 141 of your present code.
I cannot guarantee that it now works but now the padding and convolution come in the correct order.
"I have been trying to figure out how to modify the above code for Neural-Style masks, but non of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?"
The code you cited does not implement any mask functionality, it only loads a mask from an existing hdf5 file.
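If the goal is only to drop the hdf5 dependency, the mask could be loaded as an ordinary grayscale image with the image package instead, roughly like this (a sketch, untested; the option name -mask_image is made up here):
-- load a grayscale mask image instead of an hdf5 tensor
local mask = nil
if params.mask_image ~= '' then
  mask = image.load(params.mask_image, 1, 'float'):squeeze() -- values in 0..1, shape HxW
  mask = set_datatype(mask, params.gpu)
end
Applying the mask to the losses or gradients would of course still have to be implemented separately.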
I ran a quick test with the -reflectance option. The change is not particularly obvious at first glance, but it does appear to cause a change. More testing and different parameter combinations could be needed to further understand its effect on artistic outputs.
On the left is the control test with -reflectance false, and on the right is -reflectance true:
Direct link to the comparison: https://i.imgur.com/YGCOCiu.png
False: https://i.imgur.com/0oQNsxl.png
True: https://i.imgur.com/a7fQTLb.png
Command used:
th neural_style.lua -seed 876 -reflectance -num_iterations 1500 -init image -image_size 640 -print_iter 50 -save_iter 50 -content_image examples/inputs/hoovertowernight.jpg -style_image examples/inputs/starry_night.jpg -backend cudnn -cudnn_autotune
Are Gatys' gradient-related functions different from Neural-Style's? I'm looking for where the style masks come into play. Or should I be looking at different functions for implementing these features, like masks?
From what I can see, luminance style transfer requires the LUV color space, which, unlike YUV, has no easy-to-use conversion function in the image library.
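For comparison, the YUV round trip that the image package does provide looks roughly like this (a sketch; neural_style.lua already uses these two functions for -original_colors):
require 'image'
local img_yuv = image.rgb2yuv(img) -- img is a 3xHxW RGB tensor with values in 0..1
-- img_yuv[1] is the Y (luminance) channel; U and V are in img_yuv[2] and img_yuv[3]
local img_rgb = image.yuv2rgb(img_yuv)
An LUV conversion would have to be written by hand or done outside of Torch.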
Style masks seem to require modifying deeper levels of the Neural-Style code.
For the independent style_scale control with multiple style images, it seems like we only need a way to disable content loss.
From the research paper:
A simple sh script, similar to multires.sh, that first runs your style images through Neural-Style should then do the trick, but such a script needs a way to disable content loss.
I am thinking of a parameter like:
cmd:option('-content_loss', true, 'if set to false, content loss will be disabled')
if params.content_loss then
  -- content loss code
end
@htoyryla Which part of the content loss code should this be implemented on to achieve the desired effect?
https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L461-L497
Or: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L109
Edit: I figured it out and now the content loss module can be disabled.
Currently testing different parameters alongside the new -content_loss parameter: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31
I edited this part of the neural_style.lua script: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31#file-neural_style-lua-L148-L151
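Roughly, the change amounts to wrapping the block that creates and adds the ContentLoss module in a check on the new option, something like this (a sketch only; the variable names are assumed to match neural_style.lua, and the exact lines in my gist may differ):
if name == content_layers[next_content_idx] then
  if params.content_loss then
    -- the existing lines that build the ContentLoss module and add it to the net go here, unchanged
  end
  next_content_idx = next_content_idx + 1 -- keep the layer bookkeeping even when the module is skipped
end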
Though I think that I need to find a way to transfer the color from the intended content image to this first Neural-Style run with the two style images. Seeing as -init image includes the content as well, maybe I need to add another new parameter, or maybe using -original_colors 1 on step two will solve this problem?
Second Edit:
It seems that -content_layers relu1_1,relu2_1 and the default style layers work the best, though the research paper only specified layers relu1_1 and relu2_1, not whether you should use those values for content or style layers.
I must be missing something when trying to replicate the "Naive scale combination" from here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ScaleControl.ipynb
Following the steps on the research paper:
Should result in something like this output that I made running Gatys' iPython code: https://i.imgur.com/boz8PhW.jpg
And the styled style image from his code: https://i.imgur.com/6xEumk0.jpg
But instead I get this:
The styled style image: https://i.imgur.com/30HUeOH.png
And here is the final output: https://i.imgur.com/SWhzMn0.png
I tried this code to create the styled style image: https://gist.github.com/ProGamerGov/53979447d09fe6098d4b00fc8e924109
And then ran:
th neural_style_c.lua -original_colors 1 -output_image out.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out7.png -image_size 640 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
The final content image: https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_content.jpg
The two style images:
What am I doing wrong here?
Ok, so analyzing the styled style image from Gatys' code:
The outputs have the parameters and values used in their names:
[scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_norm_pad_ptw_1.0E+05]
I think this one was used to make this: https://i.imgur.com/6xEumk0.jpg
From another experiment using his code:
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_Amazing-Nature_3840x2160.jpg_simg_raime.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg
The enlarged version (I think 1 step multires?):
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_raime.jpg_simg_Amazing-Nature_3840x2160.jpg_pt_layer_relu2_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg.filepart
And:
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_raime.jpg_simg_Amazing-Nature_3840x2160.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg
The layers used are: relu2_1 and relu4_1
Style weight is: sw_2.0E+08
Content weight is: cw_1.0E+05
The normalized VGG-19 model is used: model_norm
Not sure what this is: ptw_1.0E+05
Naive Scale mix is the best version, and also the styled style image: naive_scalemix.jpg
Not sure if pt_layer refers to both style_layers and content_layers, or just one of them?
On the subject of Gram Matrices (Leon Gatys said this would be important for transferring features to Neural-Style):
Neural-Style is normalising the Gram Matrices differently, as it additionally divides by the number of features, when compared with Gatys' code. This means that the style loss weights for the different layers in Neural-Style and Gatys' code are a little different:
In a layer l with n_l = 64 features, a style loss weight of 1 in Neural-Style, is a style loss weight of 1/64^2 in Gatys' code.
"Neural-Style is normalising the Gram Matrices differently, as it additionally divides by the number of features, when compared with Gatys' code. This means that the style loss weights for the different layers in Neural-Style and Gatys' code are a little different:
In a layer l with n_l = 64 features, a style loss weight of 1 in Neural-Style, is a style loss weight of 1/64^2 in Gatys' code."
I am not familiar with Gatys's code, but what you wrote is confusing. First you say that Neural_style divides the Gram matrix by the number of features, but in your example you don't do this division.
If Gatys' normalizes by 1/C^2 where C is the number of features, it makes sense to me as the size of the Gram matrix is CxC.
In neural_style, the gram matrix is normalized for style loss in this line: https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534
Here, input:nElement() is not C but CxHxW, where C, H, W are the dimensions of the layer to which the Gram matrix is added, so that in practice neural-style ends up with a smaller value for the normalized style loss than 1/C^2.
Dividing instead by self.G:nElement() would implement division by C^2, so if that's what you want, try it.
I don't know if this use of input:nElement() instead of self.G:nElement() here is intentional or an accident. @jcjohnson ?
There has been an earlier discussion about this division but there was nothing on this in particular: https://github.com/jcjohnson/neural-style/issues/90
PS. I checked the corresponding code in fast-neural-style https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/GramMatrix.lua#L46-L49 which also normalizes the Gram matrix by 1/(CHW), so I guess this is done on purpose. After all, normalizing by 1/C^2 would favor the lower layers too much.
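For reference, the 1/C^2 variant would only change the normalization line in StyleLoss:updateOutput, roughly like this (a sketch against the current neural_style.lua; line numbers may have drifted):
-- StyleLoss:updateOutput, around line 534
self.G = self.gram:forward(input)
-- original: self.G:div(input:nElement())  -- divides by C*H*W
self.G:div(self.G:nElement())              -- divides by C*C instead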
I ran a quick test with the -reflectance option. The change is not particularly obvious at first glance, but it does appear to cause a change.
As padding only means adding a few pixels around the image I wouldn't expect large changes. Mostly this should be visible close to the edges, and indeed there appears to be a difference along the left hand side.
Changing line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534 to divide by self.G:nElement(), I ran neural-style with defaults and got this.
whereas with the original the resulting image was
Now, they are obviously different, but as the style weight has been effectively increased, we should not read too much into this difference. Anyway, this is worth more testing, and the idea of normalizing this way makes intuitive sense to me.
Concerning YUV... I was under the impression that Y is the luminance.
When you want to disable content_loss, why not simply set content_weight to 0?
It looks like the 1/C^2 style normalization favors the lowest layers, which have smaller C (64 for conv1 as opposed to 512 for conv5). The original neural-style behavior, 1/(CxHxW), penalizes the higher layers less because H and W decrease when going to higher layers.
When you want to disable content_loss, why not simply set content_weight to 0?
I will try that as well later today. I think my settings from before were too different from Gatys' settings.
The other issue is that I think transferring the color from a third image might be needed, as I would imagine that Gatys would have used something similar to -original_colors 1 if it were the better solution.
I think I figured out the style combination:
The styled style image: https://i.imgur.com/G1eZerW.png
This was used to produce the final image:
th neural_style.lua -original_colors 1 -style_weight 10000 -output_image out3.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out1_200.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
And this was used to produce the styled style image:
th neural_style_c.lua -content_weight 0 -style_weight 10000 -output_image out1.png -num_iterations 200 -content_image fig4_style3.jpg -style_image fig4_style1.jpg -image_size 2800 -content_layers relu2_1 -style_layers relu2_1 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
I wonder if something similar could be accomplished by being able to control the layers each style image uses?
I am unable to produce a larger version like Gatys was able to do. Any larger images seem to be blurry, and the shapes begin to fade. The darkness of Seated Nude seems to make this harder, as the dark areas seem to take over areas on the new style image in my experiments.
A note on 1/C^2 gram matrix normalization: this line also needs to be changed https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L553 so that the backward pass too will use the normalized matrix.
This will require quite different weights, like content_weight 1e3 and style_weight 1, and it can take some 300 iterations before the image really starts to develop, but to me the results look good. I am talking about plain neural_style with modified Gram matrix normalization. Haven't really looked deeper into the Gatys project.
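For completeness, the corresponding backward-pass change would look roughly like this (again a sketch; the surrounding lines are assumed to match neural_style.lua):
-- StyleLoss:updateGradInput, around line 553
local dG = self.crit:backward(self.G, self.target)
-- original: dG:div(input:nElement())
dG:div(self.G:nElement())
self.gradInput = self.gram:backward(input, dG)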
ProGamerGov, just a little suggestion: since GPU handling is already implemented in "function setup_gpu(params)" (line 324), maybe it's possible to use that function instead of the new "set_datatype(data, gpu)"?
It could make the code more maintainable – in case of any changes someone will have to modify only one function instead of two.
For example: pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)
(see how nn.SpatialAveragePooling(kW, kH, dW, dH):type(dtype) is added in line 136).
Currently I can not test it on GPU, but I can confirm that it does work on CPU.
@VaKonS
I'll take a look. I originally pasted in Gatys' GPU handling code because I couldn't get the reflection function to work with this line of code:
pad_layer = set_datatype(pad_layer, params.gpu)
and I couldn't figure out how to use function setup_gpu with that code.
Are you saying to change this line:
to this:
local pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)
And then delete this line?
pad_layer = set_datatype(pad_layer, params.gpu)
@ProGamerGov, yes.
And to delete function set_datatype(data, gpu) at line 611, as it will not be needed anymore.
@VaKonS, I made a version that contains other padding types: https://gist.github.com/ProGamerGov/0e7523e221935442a6a899bdfee033a8
When using -padding, you can try 5 different types of padding: default, reflect, zero, replication, or pad. In my testing, the pad option seems to leave untouched edges on either side of the image.
Edit: Modified version with htoyryla's suggestions: https://gist.github.com/ProGamerGov/5b9c9f133cfb14cf926ca7b580ea3cc8
The modified version only has 3 options: default, reflect, or replicate.
Types 'reflect' and 'replication' make sense, although with the typical padding width = 1 as in VGG19 the result is identical.
Type 'zero' is superfluous as the convolution layer already pads with zeroes.
Type 'pad' only pads in one dimension so it hardly makes sense.
You should read nn documentation when using the nn layers. The nn.Spatial.... layers are meant to work with two-dimensional data like images. nn.Padding provides a lower level access for padding of tensors, you need to specify which dimension, which side, which value, and if one wants to use it to pad an image one needs to apply it several times with different settings.
But frankly, with the 1-pixel padding in VGG there are not so many ways to pad. We should also remember that the main reason for padding in the convolution layers is to get the correct output size. Without padding convolution tends to shrink the size.
The code could also be structured like this (to avoid duplicating code and making the same checks several times). Here I used 'reflect' and 'replicate' as they are shorter, you may prefer 'replication' and 'reflection' as in the layer names. But having one as a verb and the other as a noun is maybe not a good idea.
local is_convolution = (layer_type == 'cudnn.SpatialConvolution' or layer_type == 'nn.SpatialConvolution')
if is_convolution and params.padding ~= 'default' then
  local padW, padH = layer.padW, layer.padH
  local pad_layer -- declared outside the inner if so it is still in scope for net:add below
  if params.padding == 'reflect' then
    pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)
  elseif params.padding == 'replicate' then
    pad_layer = nn.SpatialReplicationPadding(padW, padW, padH, padH):type(dtype)
  else
    error('Unknown padding type')
  end
  net:add(pad_layer)
  -- the convolution layer's own zero padding is no longer needed
  layer.padW = 0
  layer.padH = 0
end
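For comparison, padding a 3xHxW image by one pixel of zeros with the lower-level nn.Padding really does take several modules, roughly like this (a sketch based on the nn documentation, untested here):
require 'nn'
-- nn.Padding(dim, pad, nInputDim, value): a negative pad adds at the start of the dimension
local zero_pad = nn.Sequential()
zero_pad:add(nn.Padding(2, -1, 3, 0)) -- one row of zeros at the top
zero_pad:add(nn.Padding(2,  1, 3, 0)) -- one row at the bottom
zero_pad:add(nn.Padding(3, -1, 3, 0)) -- one column on the left
zero_pad:add(nn.Padding(3,  1, 3, 0)) -- one column on the right
which is why the dedicated nn.SpatialReflectionPadding / nn.SpatialReplicationPadding layers are more convenient here.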
@htoyryla, reflective padding probably takes pixels starting from 1 pixel distance: [ x-2, x-1, x ] [ x-1, x-2 ]. And replication duplicates the edge: [ x-2, x-1, x ] [ x, x ].
Yes, I just realized that when I did a small test. That explains why it made a difference also with padding of one row/column. The documentation is a bit unclear so I believed reflection would result in [ x-2, x-1, x ] [ x, x-1 ] when it only says 'reflection of the input boundary'. But obviously this is more useful.
I have been trying to get this python script to work for the linear color feature found in Gatys' code here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ScaleControl.ipynb
https://gist.github.com/ProGamerGov/5fc5ef9035edc9a026e41925f733a45c
The idea is that making this feature into a simple python script will be easier and less messy than implementing into neural_style.lua. But I can't figure out the python parameters so that the image is fed into the function properly.
Edit:
Trying to reverse engineer the code that feeds into the function:
https://gist.github.com/ProGamerGov/32b7d68a098f8b0655d71a08eb3ba050
So far it doesn't output the converted images.
About your first script https://gist.github.com/ProGamerGov/5fc5ef9035edc9a026e41925f733a45c
To make it process the images and save the result you need something like this. You did not pass the images to your function, and you did not use the resulting image returned by the function. Remember that the function parameters target_img and source_img are totally separate from the variables with the same names; usually it is good practice to avoid using the same names for both.
The numpy imports were needed; on the other hand, I had to use skimage.io instead of PIL for reading and saving the image, probably because they use a different format for the image inside python. Anyway, Gatys used imread() and not Image.open().
This works in principle but the resulting image is probably not what one would expect. It could be that some kind of pre/deprocessing is needed which was not obvious to me (not being familiar with the process you are trying to duplicate).
PS. imread returns an image where the data is between 0 and 255 as integers, while match_color expects 0..1 floats. That's why the result is not good yet.
import scipy
import h5py
import skimage
import os
from skimage import io,transform,img_as_float
from skimage.io import imread,imsave
from collections import OrderedDict
#from PIL import Image, ImageFilter
import numpy as np
from numpy import eye
import decimal
#import click

# read the images with skimage.io's imread (as in Gatys' code), not PIL
target_img = imread('to.png')
source_img = imread('from.png')

def match_color(target_img, source_img, mode='pca', eps=1e-5):
    ....  # body of match_color unchanged from the gist
    return matched_img

# pass the images into the function and save the returned result
output_img = match_color(target_img, source_img)
imsave('result.png', output_img)
OK, by still changing the two imread lines to
target_img = imread('to.png').astype(float)/256
source_img = imread('from.png').astype(float)/256
from these two images
I get this (don't know if this is what is expected but it looks ok)
Just noticed that there was already an import for img_as_float so these work as well
target_img = img_as_float(imread('to.png'))
source_img = img_as_float(imread('from.png'))
But anyway, I hope this illustrates that one cannot simply cut and paste code but needs also to examine it and make sure the pieces fit together.
The script now seems to produce outputs like the ones Gatys' code produced in the iPython interface:
The source image:
The target images:
The images I used can be found in Gatys' repository, and in my Imgur album here: https://imgur.com/a/PrKtg.
Before Gatys' Scale Control code tried to transfer the brush strokes onto the circular pattern image, it created images like these with the linear color transfer function. So I guess the next step is to test how well these modified style images work.
The working script: https://gist.github.com/ProGamerGov/73e6c242abc00777e4e8cf05cf39dc70
This code here:
target_img = img_as_float(imread('to.png'))
source_img = img_as_float(imread('from.png'))
did not seem to work for me, though that could be a VirtualBox-related issue, like the ones some ImageMagick scripts can cause.
If img_as_float does not work, check that you have
from skimage import io,transform,img_as_float
(Just noticed that you have it. Don't know what is going on there if you have skimage installed in your python and can import it.)
And by the way, assuming you want to try all options, you can change the match_color mode and eps like this:
output_img = match_color(target_img, source_img, mode='chol', eps=1e-4)
Python interpreter is useful for testing small things (just like th in lua):
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import skimage
>>> from skimage import io,transform,img_as_float
>>> from skimage.io import imread,imsave
>>> img_as_float
<function img_as_float at 0x7f3bc9cad230>
>>> img = imread('to.png')
>>> img
array([[[255, 255, 255],
[255, 255, 255],
[255, 255, 255],
...,
[133, 119, 112],
[101, 84, 85],
[ 54, 45, 44]],
>>> img_as_float(img)
array([[[ 1. , 1. , 1. ],
[ 1. , 1. , 1. ],
[ 1. , 1. , 1. ],
...,
[ 0.52156863, 0.46666667, 0.43921569],
[ 0.39607843, 0.32941176, 0.33333333],
[ 0.21176471, 0.17647059, 0.17254902]],
I got the script to accept user-specified parameters: https://gist.github.com/ProGamerGov/d0917848a728bceb4131272734f61e8b
Only the target and source image are required, but you can also control the eps value and the transfer mode. Though the --eps parameter currently only accepts values in scientific notation.
I also cleaned up the unused lines of code.
I am currently testing different parameters for scale control.
It seems you don't understand how functions work. When one defines a function like match_color one specifies the parameters that are input to the function when it is called.
When one calls the function one gives the actual values for those parameters. One can then call the function as many times as needed with different values.
What you are doing now is defining a function so that the default values of transfer_mode and eps are defined from user input. It works when you only run the function once but it is confusing. That is not the way to pass values into a function.
You should change the def line as it was and add the actual values of transfer_mode and eps to the line where the function is called (like I already suggested).
output_img = match_color(target_img, source_img, mode=transfer_mode, eps=int(float(eps_value)))
BTW, I don't understand the int() for eps... first we give something like 1e-5, then float it and finally int, which gives 0. So you limit eps to integer values only? Why the int? float(eps_value) should be enough to convert the input string into a number.
It seems you don't understand how functions work.
It works when you only run the function once but it is confusing. That is not the way to pass values into a function.
I went for making the code work, without putting a lot of focus on how. Which is a terrible way to go about coding.
You should change the def line as it was and add the actual values of transfer_mode and eps to the line where the function is called (like I already suggested).
Yea, I see that now. Not sure what I was thinking at the time when I made such an embarrassing and obvious mistake.
BTW, I don't understand the int() for eps... first we give something like 1e-5, then float it and finally int which gives 0. So you limit eps to integer values only? Why the int?
It was the first thing that worked, which I now think was because I fixed a bracket placement error. I have removed the integer limitation.
Thanks for helping me correct the issues!
I think I am getting close to the research paper's results:
Layers relu2_1,relu4_2:
Direct link to full image: https://i.imgur.com/Vo9p96O.png
I know the research paper talks about only using layers relu1_1 and relu2_1, but the fine brush strokes from the paint style image seem to work best with relu2_1 and relu4_2, or just relu4_2, at least with this coarse style image. I'm not sure if I am missing something, or if this is due to a difference between Gatys' and jcjohnson's code?
This was my content image: https://i.imgur.com/eoX7f3I.jpg
Control test without scale control:
Screenshot from the research paper:
I used this command to create my "stylemix" image:
th neural_style.lua -tv_weight 0 -content_weight 0 -style_weight 10000 -output_image out5.png -num_iterations 550 -content_image result.png -style_image result_3.png -image_size 1536 -content_layers relu2_1,relu4_2 -style_layers relu2_1,relu4_2 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
Then I used this two step set of commands to create the final output:
th neural_style.lua -style_weight 10000 -output_image out_final.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out5_pca.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
th neural_style.lua -style_weight 10000 -output_image out_final_hr.png -num_iterations 550 -content_image fig4_content.jpg -init_image out_final.png -style_image out5_pca.png -image_size 1536 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
I used the default linear-color-transfer.py script on my stylemix image before using it to create my final output, so the colors are more vivid than Gatys' version in the research paper. The default linear-color-transfer.py script was also used on both style images before I added the fine style to the coarse style. Both times I used the final content image with the city lights as the source image.
Can you give the commands for running the whole process? I would like to test it.
@htoyryla
Images used:
fig4_content.jpg: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_content.jpg
Fine style: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_style1.jpg
Coarse style: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_style2.jpg
Step 1:
python linear-color-transfer.py --target_image coarse_style.png --source_image fig4_content.jpg --output_image coarse_pca.png
python linear-color-transfer.py --target_image fine_style.png --source_image fig4_content.jpg --output_image fine_pca.png
Step 2 (Gatys called the output from this step "stylemix", but I used a generic name from the list of experiments I was running):
th neural_style.lua -tv_weight 0 -content_weight 0 -style_weight 10000 -output_image out5.png -num_iterations 550 -content_image coarse_pca.png -style_image fine_pca.png -image_size 1536 -content_layers relu2_1,relu4_2 -style_layers relu2_1,relu4_2 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
Step 2.5 (I don't think Gatys' code does this, but I thought it would make the colors look better):
python linear-color-transfer.py --target_image out5.png --source_image fig4_content.jpg --output_image out5_pca.png
Step 3:
Then I tried to mimic Gatys' two-step process, where the first image is generated at 512px:
th neural_style.lua -style_weight 10000 -output_image out_final.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out5_pca.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
th neural_style.lua -style_weight 10000 -output_image out_final_hr.png -num_iterations 550 -content_image fig4_content.jpg -init_image out_final.png -style_image out5_pca.png -image_size 1536 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
Those commands in that order should give you the exact same output as I got.
After making more tests with different models, I was wrong: the noise is not added by padding. It's a property of some models: vgg19 from crowsonkb's repository makes clean images with or without padding, while images made with Illustration2Vec, for example, have noisy borders even with default padding.
Examining the outputs produced by ScaleControl.ipynb:
Gatys' Scale Control code produces 3 different outputs, each of which follows a two-step Multires process. I am not sure if these are 3 different ways of doing Scale Control, or if 1 or 2 of them are meant to showcase ways that don't work?
For each of the 3 options in the iPython script, I ran the code and generated the images. Each produced a low resolution 648x405 image and then a 1296x810 "hr" resolution image. Though the image names say that the first image has a resolution of 512px and the second image has a resolution of 1024px, which means there may be something else going on here (maybe downsampling?). I have included both images for each example, and they can be viewed in full in the Imgur link below each example.
Gatys' iPython code names the images with the parameters used to create them, and as such I have included the image file names.
"Stylemix images" are what Gatys calls the resulting combination style image made of both the coarse and fine style images.
low res and hr res: https://imgur.com/a/D7AcK
low res:
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_org_pad_sw_1.0E+03_cw_1.0E+00.jpg
hr:
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_hrpt_layer_relu4_1_sz_512_hrsz_1024_model_org_pad_sw_1.0E+03_cw_1.0E+00.jpg
iPython terminal output: https://gist.github.com/ProGamerGov/a613c42514b9059ebc8230d2c1cd0fd1
low res and hr res: https://imgur.com/a/oTB1k
low res:
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05.jpg
hr res:
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_hrpt_layer_relu4_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05.jpg
iPython terminal output: https://gist.github.com/ProGamerGov/3d8f8ffdbde5f8ec69c46f3076fa3f2d
low res and hr res: https://imgur.com/a/LbqJQ
low res:
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg
hr res:
cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg
iPython terminal output: https://gist.github.com/ProGamerGov/71eda3b16793835bbe142d902c480fe7
Name:
scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_norm_pad_ptw_1.0E+05.jpg
Name:
scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_ptw_1.0E+05
Name:
spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_org_pad_ptw_1.0E+03.jpg
Name:
spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_org_pad_ptw_1.0E+03.jpg
I am not sure why there are 3 examples of Scale Control and 4 stylemix images, but I assume one of the examples must use 2 stylemix images?
OK, so trying both models from Gatys' repository, the normalized VGG-19 and the VGG-19 conv model, I can't seem to get the parameters right. Up until now I was using the default VGG-19 model.
wget -c --no-check-certificate https://bethgelab.org/media/uploads/deeptextures/vgg_normalised.caffemodel
wget -c --no-check-certificate https://bethgelab.org/media/uploads/stylecontrol/VGG_ILSVRC_19_layers_conv.caffemodel
I assume the default Neural-Style VGG-19 prototxt may not work with these models?
wget -c https://gist.githubusercontent.com/ksimonyan/3785162f95cd2d5fee77/raw/bb2b4fe0a9bb0669211cf3d0bc949dfdda173e9e/VGG_ILSVRC_19_layers_deploy.prototxt
Edit: It seems that the models are special versions created by Leon Gatys: https://github.com/jcjohnson/neural-style/issues/7
I don't know why, but I can't seem to get either model to work.
Using Gatys' weights for Scale Control in Neural-Style seems to work pretty well:
Also, the sym option on the match_color function is for luminance style transfer.
@ProGamerGov, by the way, the code can probably be reimplemented in Torch and maybe even will not be too different:
@VaKonS Thanks, I'll take a look at that.
@htoyryla I have started trying to extract the python code responsible for luminance style transfer: https://gist.github.com/ProGamerGov/08c5d25bb867e4313821a45b2e3b2978
As I understand it, the research paper basically describes converting your content/style images to LUV or YIQ, before running them through the style transfer network. In his python code, Gatys appears to use LUV, so I'll start with that.
Testing those 3 functions:
rgb2luv creates this:
luv2rgb creates this:
lum_transform results in this error whenever I try to use it:
ubuntu@ip-Address:~/neural-style$ python lum2.py --input_image fig4_content.jpg
Traceback (most recent call last):
File "lum2.py", line 47, in <module>
output_img = lum_transform(input_img)
File "lum2.py", line 32, in lum_transform
img = tile(lum[None,:],(3,1)).reshape((3,image.shape[0],image.shape[1]))
NameError: global name 'tile' is not defined
ubuntu@ip-Address:~/neural-style$
I don't know what "tile" is from and I can't figure out whether it belongs to a package, or is related to another specific variable.
Edit: "tile" is part of numpy, and just like "eye" from linear-color-transfer.py, it needs a "np." appended to it's front.
lum_transform creates this:
These functions come from Gatys' Color Control code here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ColourControl.ipynb
I am not sure if lum_transform is needed to perform the conversion to and from LUV for style transfer.
Edit: "tile" is part of numpy, and just like "eye" from linear-color-transfer.py, it needs a "np." appended to it's front.
Good you found it. I just woke up and was about to comment :)
I have been trying to implement the features described in the "Controlling Perceptual Factors in Neural Style Transfer" research paper.
The code that was used for the research paper can be found here: https://github.com/leongatys/NeuralImageSynthesis
The code from Leon Gatys' NeuralImageSynthesis is written in Lua and operated through an iPython notebook interface.
So far, my attempts to transfer the features into Neural-Style have failed. Has anyone else had success in transferring the features?
Looking at the code, I think that:
ImageSynthesis.lua is responsible for the luminance style transfer.
ComputeActivations.lua and ImageSynthesis.lua are responsible for scale control.
ComputeActivations.lua and ImageSynthesis.lua are responsible for spatial control.
In order to make NeuralImageSynthesis work alongside your Neural-Style install, you must replace every instance of /usr/local/torch/install/bin/th with /home/ubuntu/torch/install/bin/th. You must also install hdf5 with luarocks install hdf5, matplotlib with sudo apt-get install python-matplotlib, skimage with sudo apt-get install python-skimage, and scipy with sudo pip install scipy. And of course you need to install and set up jupyter if you want to use the notebooks.