ProGamerGov opened this issue 7 years ago
How do the Gatys models fail when you try them?
I can load the conv model using the prototxt, both from the links you gave. Also runs fine in neural_style.
th> require "loadcaffe"
{
load : function: 0x416b6098
C : userdata: 0x41383dc8
}
th> model_file = "VGG_ILSVRC_19_layers_conv.caffemodel"
[0.0001s]
th> proto_file = "VGG_ILSVRC_19_layers_deploy.prototxt"
[0.0001s]
th> cnn = loadcaffe.load(proto_file, model_file, "nn")
Successfully loaded VGG_ILSVRC_19_layers_conv.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
[0.2703s]
th> cnn
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
(1): nn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
(2): nn.ReLU
(3): nn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
(4): nn.ReLU
(5): nn.SpatialMaxPooling(2x2, 2,2)
(6): nn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
(7): nn.ReLU
(8): nn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
(9): nn.ReLU
(10): nn.SpatialMaxPooling(2x2, 2,2)
(11): nn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
(12): nn.ReLU
(13): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
(14): nn.ReLU
(15): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
(16): nn.ReLU
(17): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
(18): nn.ReLU
(19): nn.SpatialMaxPooling(2x2, 2,2)
(20): nn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
(21): nn.ReLU
(22): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
(23): nn.ReLU
(24): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
(25): nn.ReLU
(26): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
(27): nn.ReLU
(28): nn.SpatialMaxPooling(2x2, 2,2)
(29): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
(30): nn.ReLU
(31): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
(32): nn.ReLU
(33): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
(34): nn.ReLU
(35): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
(36): nn.ReLU
(37): nn.SpatialMaxPooling(2x2, 2,2)
}
@htoyryla
How do the Gatys models fail when you try them?
The style loss does not work for me with either model (the values basically stay the same), using any variation of this command:
th neural_style.lua -content_weight 0 -style_weight 10000 -image_size 1024 -output_image out_norm_hr.png -num_iterations 500 -content_image result_2.png -style_image result.png -content_layers relu2_1,relu4_1 -style_layers relu2_1,relu4_1 -model_file models/vgg_normalised.caffemodel -proto_file models/VGG_ILSVRC_19_layers_deploy.prototxt -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
For me this
th neural_style.lua -model_file ../test/pg/VGG_ILSVRC_19_layers_conv.caffemodel -proto_file ../test/pg/VGG_ILSVRC_19_layers_deploy.prototxt
works on a clean copy of neural-style and produces this image at 400 iterations.
I see you tried the normalized model... I have almost never used the normalized models; they probably require different weight values.
I now tried using -init image -content_weight 0 and it works too.
The normalized model works too, but the losses are very small (which I think is typical for a normalized model), and it does not produce the expected result image -- probably one would need a much higher style weight.
Iteration 50 / 1000
Content 1 loss: 0.000000
Style 1 loss: 4.361065
Style 2 loss: 0.464805
Style 3 loss: 0.069460
Style 4 loss: 0.017796
Style 5 loss: 0.015553
Total loss: 4.928679
Iteration 100 / 1000
Content 1 loss: 0.000000
Style 1 loss: 3.661895
Style 2 loss: 0.397874
Style 3 loss: 0.057522
Style 4 loss: 0.017153
Style 5 loss: 0.015032
Total loss: 4.149477
Looking into where the functions are located in Gatys' code here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ColourControl.ipynb
lum_transform
seems to come before the style transfer process:
if cp_mode == 'lum':
    org_content = imgs['content'].copy()
    for cond in conditions:
        imgs[cond] = lum_transform(imgs[cond])
    imgs['style'] -= imgs['style'].mean(0).mean(0)
    imgs['style'] += imgs['content'].mean(0).mean(0)
    for cond in conditions:
        imgs[cond][imgs[cond]<0] = 0
        imgs[cond][imgs[cond]>1] = 1
And then rgb2luv and luv2rgb are used:
#execute script
!{script_name}
output = deprocess(get_torch_output(output_file_name))
if cp_mode == 'lum':
    org_content = rgb2luv(org_content)
    org_content[:,:,0] = output.mean(2)
    output = luv2rgb(org_content)
    output[output<0] = 0
    output[output>1] = 1
imshow(output);gcf().set_size_inches(8,14);show()
imsave(result_dir + result_image_name, output)
But I am not sure exactly what the code is doing. I think it's doing something with the style transfer output file and the input file, though I haven't been able to get the code that uses rgb2luv and luv2rgb to work properly yet.
@htoyryla If they are working for you, then it could have been something weird with the instance I was running, or a corrupted/incorrect prototxt file with the same name.
I downloaded the models and the prototxt from the links you gave, otherwise a fresh copy of neural-style. Maybe you have some modifications in your neural-style? The conv model worked just as usual, the normalized one didn't produce a good image with the default content and style weights, but I've seen that before too.
Difficult to say what a code snippet is doing without being familiar with the whole of it.
But from the paper the color control looks basically simple.
"The modification is simple. The luminance channels LS and LC are first extracted from the style and content images. Then the Neural Style Transfer algorithm is applied to these images to produce an output luminance image Lˆ. Using the YIQ colour space, the colour information of the content image is represented by the I and Q channels; these are combined with Lˆ to produce the final colour output image (Fig. 3(d))."
In other words, one makes luminance-only versions of the content and style images, runs them through neural-style and finally applies the color information from the content image to the result. I am not too familiar with these color spaces, but it would be simple to try how this works using YUV and neural-style.
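Here is roughly that idea sketched in Python, before touching neural-style itself (my own illustration, assuming scikit-image's rgb2yuv/yuv2rgb, hypothetical file names, and that the stylized output has the same size as the content image):

import numpy as np
from skimage import io
from skimage.color import rgb2yuv, yuv2rgb

# 1) Make luminance-only versions of the content and style images,
#    with Y replicated into three channels so neural-style still sees RGB.
for name in ('content', 'style'):
    img = io.imread(name + '.jpg').astype(np.float64) / 255.0
    y = rgb2yuv(img)[:, :, 0]
    io.imsave(name + '_lum.png', np.clip(np.dstack([y, y, y]), 0, 1))

# 2) After running neural-style on the *_lum.png images, combine the
#    stylized luminance with the U and V channels of the content image.
content = io.imread('content.jpg').astype(np.float64) / 255.0
out = io.imread('out.png').astype(np.float64) / 255.0
content_yuv = rgb2yuv(content)
content_yuv[:, :, 0] = rgb2yuv(out)[:, :, 0]
io.imsave('out_color.png', np.clip(yuv2rgb(content_yuv), 0, 1))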
Gatys then goes deeper into using histogram matching for the cases where the histograms of the two images are quite different.
Speaking of "vgg_normalised.caffemodel": together with much higher style weight, it probably requires "normalize_gradients". In above examples with this model, the content weight was 100, style weight 300000, and gradients were "semi-normalized" with scale 0.7.
Related to the discussion above, I made a quick attempt to run neural-style on luminance (Y) only when -original_colors == 1, then add color (UV) from the content image: https://gist.github.com/htoyryla/38a4d6b2280ed5b4e47fc8d67b304f9f
Using my modified code, Gatys VGG19 conv model and neural-style defaults, style transfer with luminance only, followed by transferring color from the content image (Gatys' basic color control)
For comparison, original_colors == 0 (style transfer with color):
For comparison, unmodified neural_style with original_colors == 1 (style transfer with color, followed by transferring color from the content image)
Looks like luminance-only style transfer makes a visible difference in the sky.
In the above examples with this model, the content weight was 100, the style weight 300000, and gradients were "semi-normalized" with scale 0.7.
I can't see which examples you are referring to, but never mind. I was only suggesting that probably there is nothing wrong with the models.
I further tried to add the first histogram adjustment by Gatys (formula 10 in the paper), adjusting the luminance-only style image (before preprocessing) so that its mean and variance match the luminance-only content image:
local cmean = torch.mean(content_image)
local cvar = torch.var(content_image)
for _, img_path in ipairs(style_image_list) do
  local img = image.load(img_path, 3)
  img = image.scale(img, style_size, 'bilinear')
  if params.original_colors == 1 then
    -- use luminance only
    img = image.rgb2yuv(img)[{{1, 1}}]
    -- match histogram
    local smean = torch.mean(img)
    local svar = torch.var(img)
    img = img:add(-smean):mul(cvar/svar):add(cmean)
  end
  local img_caffe = preprocess(img):float()
  table.insert(style_images_caffe, img_caffe)
end
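For reference, my reading of formula 10 in the paper is a plain mean and standard deviation match of the style luminance to the content luminance:

L_S' = (σ_C / σ_S) * (L_S - μ_S) + μ_C

where μ and σ are the mean and standard deviation of the respective luminance channels. Note that the snippet above multiplies by cvar/svar, the ratio of variances rather than of standard deviations, so the scaling is stronger than in my reading of the formula; that may be part of why the result looks off.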
I guess something went wrong, but somehow I like how this looks:
In this version I separated style transfer (color or luminance), histogram matching (none, whole tensor, channel-wise) and original_colors, to allow trying different approaches (for instance doing style transfer with color, histogram matching per channel and finally restoring the original colors). Histogram matching is by mean and var only.
Note that I have not really looked at Gatys' code other than some snippets posted here, so this is simply based on reading a page of Gatys' paper. Do not assume that my params work like Gatys'.
-- quick hack to make neural-style do
-- 1 either luminance-only or color based style transfer (--transfer lum|color)
-- 2 match style histogram to content (--histogram no|all|rgb)
--    no: no histogram matching
--    all: match whole tensor (use this with --transfer lum)
--    rgb: match each channel separately
-- 3 restore original colors, can be combined with the above
-- NOT ALL COMBINATIONS ARE GUARANTEED TO WORK
https://gist.github.com/htoyryla/af9de7a712d74d12f5d3acc7725e6229
PS. I am quite happy with this now. Made this picture from my own portrait and Picasso's Seated nude.
So using this terrible and messy code: https://gist.github.com/ProGamerGov/ba9a9d54bae53e84ebf0116262df6758
I think I have achieved luminance style transfer without editing neural_style.lua:
Images used:
https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig3_style1.jpg
https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig3_content.jpg
First you transfer the color from your content image to your style image like so:
python linear-color-transfer.py --target_image fig3_style1.jpg --source_image fig3_content.jpg --output_image style_colored_pca.png
Then you run that through the lum_transfer.py script like this:
python lum_transfer.py --content_image fig3_content.jpg --style_image style_colored_pca.png --cp_mode lum --output_style_image output_lum_style_pca.png --output_content_image output_lum_style_pca.png
Now you run your content image through lum_transfer.py like this:
python lum_transfer.py --content_image fig3_content.jpg --style_image fig3_content.jpg --cp_mode lum --output_content_image out_lum_transfer.png
Now you run both newly created images through Neural-Style like this:
th neural_style.lua -original_colors 0 -image_size 1000 -content_weight 1 -style_weight 1e3 -output_image out_lum6_test.png -content_image out_lum_transfer.png -style_image output_lum_style_pca.png -num_iterations 1500 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
Once you've finished with Neural-Style, then you use the lum_transfer.py script again like this:
python lum_transfer.py --output_lum2 out_lum6_test_400_r.png --content_image out_lum6_test_400.png --style_image output_style_colored.png --cp_mode lum2 --output_style_image output_style_2.png --output_content_image output_content_2.png --output_image out_combined.png --org_content fig3_content.jpg
Now you can leave your image as is, or you can try to change the colors slightly like this:
python linear-color-transfer.py --target_image out_combined.png --source_image fig3_content.jpg --output_image out_combined_lct_pca.png
python linear-color-transfer.py --mode sym --target_image out_combined.png --source_image fig3_content.jpg --output_image out_combined_lct_sym.png
python linear-color-transfer.py --mode chol --target_image out_combined.png --source_image fig3_content.jpg --output_image out_combined_lct_chol.png
And here are my outputs generated by this process:
The final unmodified output:
--mode pca:
--mode sym:
--mode chol:
Album of the full resolution final images: https://imgur.com/a/AW6qU
Screenshot from the research paper:
Control output using -original_colors 1 and the unmodified style/content images:
The control output with all 3 modes of linear color transfer using linear-color-transfer.py, for comparison: https://imgur.com/a/82xsu
The unmodified content image:
Looking at the research paper's examples and my examples, it seems really difficult to actually see the difference between luminance style transfer and normal style transfer.
@htoyryla Your modifications to neural_style.lua created this (though I used -image_size 649 for this output instead of the -image_size 1000 that I used for the above outputs):
And using linear-color-transfer.py has little to no effect on the output with all 3 modes: https://imgur.com/a/wmNyO
My outputs look closer to the research paper's outputs, but yours seems to have more vivid "light spots".
There are more examples from the research paper, and Gatys' Github code here: https://github.com/ProGamerGov/Neural-Tools/wiki/NeuralImageSynthesis-Color-Control-Examples
The -original_colors 1 parameter in Neural-Style uses YUV, where the "Y" part deals with luminance. So this parameter's effect on our results should be noted. Though I am unclear on what part, if any, linear-color-transfer.py plays with luminance.
The -original_colors 1 parameter in Neural-Style uses YUV, where the "Y" part deals with luminance. So this parameter's effect on our results should be noted.
-original_colors 1 does not affect the style transfer at all; it only adjusts the output image so that luminance comes from the output image and color from the content image.
One of Gatys' proposals is to make style transfer using luminance only and only afterwards add color just like with original-color 1. Note that this is different from doing the style transfer in color and then copying color from the content image.
The modification is simple. The luminance channels LS and LC are first extracted from the style and content images. Then the Neural Style Transfer algorithm is applied to these images to produce an output luminance image L̂. Using the YIQ colour space, the colour information of the content image is represented by the I and Q channels; these are combined with L̂ to produce the final colour output image (Fig. 3(d)).
In my latter version this is done with -transfer lum -original_colors 1.
Gatys then goes on:
If there is a substantial mismatch between the luminance histogram of the style and the content image, it can be helpful to match the histogram of the style luminance channel LS to that of the content image LC before transferring the style. For that we simply match mean and variance of the content luminance.
In my latter version, this is done (although in YUV space) by -transfer lum -histogram all -original_colors 1
I have not implemented the color matching method described in section 5.2 of the paper. However, when he later writes
In comparison, when using full style transfer and colour matching, the output image really consists of strokes which are blotches of paint, not just variations of light and dark.
I wanted to try something like this, although with simple histogram matching, setting -transfer color -histogram rgb -original_colors 1
Note that when doing histogram matching, the change in the levels of the style image may require different style weight settings.
I hope this clarifies the logic behind my code.
@htoyryla, I was referring to those examples. Just a few tests with different styles and models; my mistake was that I didn't test other models at first and made wrong assumptions.
@ProGamerGov, looks awesome!
Just to be sure: is line 101 supposed to be "content_img = lum_transform(style_img)"?
In the original code there is "imgs[cond] = lum_transform(imgs[cond])", which probably means that content_img should be converted from content_img, not from style_img.
Anyway, results look great!
Second, I think I've found a way to get rid of noise near the borders. It's very simple: add a 1-pixel gray border around the image at the very beginning, right after "local net = nn.Sequential()" and before the "nn.TVLoss" layer. Something like:
if params.padding ~= 'default' then
  print('Padding image with zeroes')
  net:add(nn.SpatialZeroPadding(1, 1, 1, 1):type(dtype))
end
Results (above are regular versions, below are with gray border, variants with "default-reflected-replicated" paddings):
And it doesn't seem to add noise to previously clean images, as far as I can tell:
Disadvantage - gray border appears around the edge with some models, so it should be optional:
It looks like there are many variant procedures possible based on Gatys' paper. For example, these lines in @ProGamerGov's code https://gist.github.com/ProGamerGov/ba9a9d54bae53e84ebf0116262df6758#file-lum_transfer-py-L98-L103 correspond to luminance-only style transfer (assuming the correction by @VaKonS above), but unlike my implementation based on a paragraph in the paper quoted above, here the mean of the style image is additionally adjusted to match the content image.
My -histogram option then does that but also adjusts the variance according to formula 10 in the paper.
@VaKonS isn't padding with zeroes actually padding with black pixels? It seems straightforward though to modify SpatialZeroPadding to pad with a given value, like 0.5 for gray.
@htoyryla, nn.SpatialZeroPadding pads with zeroes, but images are mean-centered first, therefore zeroes in the processing layers are actually mean values in the images, which should be gray pixels in most cases. At least I think so.
upd. Images' mean values in Neural-style are not just "should be gray", but exactly RGB = 123.68, 116.779, 103.939, if I'm not mistaken.
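For reference, a rough numpy sketch of the Caffe-style preprocessing that neural-style applies before the image reaches any padding layer (my own illustration of the idea only; the actual code is the preprocess function in neural_style.lua, and the exact scale factor there may differ):

import numpy as np

VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68])   # mean pixel, B/G/R order

def caffe_preprocess(img_rgb01):
    # img_rgb01: float RGB image in [0, 1], shape (H, W, 3)
    img = img_rgb01[:, :, ::-1] * 255.0    # RGB -> BGR, scale up to ~[0, 255]
    return img - VGG_MEAN_BGR              # mean-centre: a mean-coloured pixel becomes 0

def caffe_deprocess(img_bgr):
    img = img_bgr + VGG_MEAN_BGR
    return np.clip(img[:, :, ::-1] / 255.0, 0.0, 1.0)

So a zero inserted by nn.SpatialZeroPadding inside the network corresponds to the mean pixel, i.e. roughly gray, in image terms.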
Oops... my mistake. Of course when the padding is done inside the model the values are as you say. I was thinking in terms of Torch images where the values are between 0 and 1. I've made the same mistake today working on a NoiseMask module which masks part of an image and replaces it with noise (but in that context it is not so critical... actually there is nothing wrong with the module, as the mean and std are parameters).
Experimenting with my code, I have been thinking of what Gatys et al wrote: "If there is a substantial mismatch between the luminance histogram of the style and the content image, it can be helpful to match the histogram of the style luminance channel LS to that of the content image LC before transferring the style."
I think here is an example. This is a style image which was itself created by neural-style from my own materials.
Using it as a style image without histogram matching can result in something like this
while with histogram matching one can get something like this (not really like the style but still looks better)
But it is not so obvious when histogram matching works and when not. From the same two images, with different settings, I get this with histogram matching (far too dark)
and now I get quite close to the original style without histogram matching
In these examples I did not use luminance transfer nor original_colors, just histogram on and off. My feeling based on this is that histogram matching does not necessarily help getting the exact look of the original style, but it can help getting a good balance between style and content in some problematic cases. And it can also be used in creating pictures with a strong interesting style which is noticeably different from the original style.
@htoyryla, I thought that the histogram matching tricks were to make the stylized image use the original content colors while keeping some elements from the style whose colors were not present in the content image.
p. s. It's my mistake with image padding – I probably shouldn't have used "image padding", because it's "layer padding", as you have correctly noticed.
Looking at @ProGamerGov's lum_transfer.py, it is, if I understand correctly, used to produce an image file for subsequent input into neural_style. In this process it seems to me that the image is stored as an image file (png?), so the question is: is it safe to put, say, LUV image data into a png file and recover the correct data when it is read into neural-style later? I don't know about png, but it seems to me that it might be for greyscale or RGB only.
On the other hand... I cannot say that the whole process using python scripts together with neural-style is clear to me. Neural-style too expects RGB unless it has been modified. One could, I think, do luminance-only transfer by copying L into the R, G and B channels before saving the image file for neural-style (which is what my code does in effect inside neural-style, because the model expects RGB channels).
I thought that the histogram matching tricks were to make the stylized image use the original content colors while keeping some elements from the style whose colors were not present in the content image.
Reading carefully, yes, that's what the paper says. My first example, I think, works exactly that way. In the second example, histogram matching gave too extreme results but leaving it off gave a very good (in my eyes) result preserving even the sense of depth, even if the original coloring is not preserved.
Personally, I lean towards original artistic application of these techniques, and don't look for a single perfect tool for everything, but rather a versatile toolbox. For an example of an interesting result using histogram matching see my cubistic portrait from yesterday. Funny... that too was made from the same photo as these examples today.
So I have now made a refined version of my lum_transfer.py script: https://gist.github.com/ProGamerGov/2e7a0fe7a5ef6e117dc0be81df243331
Now the process only takes 4 commands (including neural_style.lua) to complete:
python linear-color-transfer.py --target_image fig3_style1.jpg --source_image fig3_content.jpg --output_image style_colored_pca.png
python lum_transfer.py --content_image fig3_content.jpg --style_image style_colored_pca.png --cp_mode lum --output_style_image output_lum_style_pca.png --output_content_image output_lum_content_pca.png --org_content fig3_content.jpg
th neural_style.lua commands here...
python lum_transfer.py --output_lum2 out_lum6_test_400_r.png --cp_mode lum2 --output_image out_combined.png --org_content fig3_content.jpg
For lum_transfer.py, the outputs and required inputs are now dependent on the --cp_mode you choose. Though for --cp_mode lum2 and --output_lum2 there is an issue: one must resize either the Neural-Style output to match the original content image's size, or vice versa. Though I would imagine that in order to preserve the quality of your Neural-Style output, you would want to resize your content image.
Does anyone know how I can resize the content image in the script, to match the Neural-Style output image, in the Python code?
Edit: I think Gatys solves this issue like this:
hr_init = img_as_float(scipy.misc.imresize(lr_output, imgs['content'].shape))
Or this code has something to do with it:
for cond in conditions:
    imgs[cond] = img_as_float(imread(img_dirs[cond] + img_names[cond]))
    if imgs[cond].ndim == 2:
        imgs[cond] = tile(imgs[cond][:,:,None],(1,1,3))
    elif imgs[cond].shape[2] == 4:
        imgs[cond] = imgs[cond][:,:,:3]
    try:
        imgs[cond] = transform.pyramid_reduce(imgs[cond], sqrt(float(imgs[cond][:,:,0].size) / img_size**2))
    except:
        print('no downsampling: ' + img_names[cond])
    imshow(imgs[cond]);show()
Edit, this does not work:
import scipy
org_content = scipy.misc.imresize(output, org_content.shape)
@ProGamerGov, here, for example, images are transformed like this:
from skimage import io, transform
im = io.imread(input_name)
im = transform.resize(im, (im.shape[0]*scale, im.shape[1]*scale), order=3)
io.imsave(output_name, im)
upd. This also works for me (note that "transform" must be imported from skimage, it seems, or scipy will not find its "misc" part):
import scipy
from skimage import io, transform
im = io.imread(input_name)
im = scipy.misc.imresize(im, (im.shape[0]*2, im.shape[1]*2))
io.imsave(output_name, im)
@VaKonS This is the specific part of the script in which the resizing needs to take place:
elif cp_mode == 'lum2':
    output = args.output_lum2
    org_content = args.org_content
    org_content = imread(org_content).astype(float)/256
    output = imread(output).astype(float)/256
    # Resize "org_content" to match the size of "output".
    org_content = rgb2luv(org_content)
    org_content[:,:,0] = output.mean(2)
    output = luv2rgb(org_content)
    output[output<0] = 0
    output[output>1] = 1
    imsave(output_a_name, output)
Do I need to read both images before they are converted to the required float value?
This is the control image with no modifications:
This the control image run through the last lum_transfer.py step:
This is the output I made following the steps I have outlined above:
This one is the same as the above image, except that I swapped the content and style images for the first step, so linear-color-transfer.py transferred the color from the style image to the content image:
I am not sure if it's the content and style images that I am using, but the luminance changes seem very subtle to me.
The full images can be viewed here: https://imgur.com/a/1k6nk
Edit:
This is what the output looks like when you skip the first linear-color-transfer.py step:
Maybe it is just the style image's luminance?
Following the luminance transfer steps on this new style image, results in this:
Control Test with the last lum_transfer step and -original_colors 1:
Looking at the images side by side (control on the right):
The difference is much more apparent with this style image. For example, the lighting on the grass is different between the two images
Full versions of these images can be found here: https://imgur.com/a/5daEM
Here's the comparison with linear-color-transfer.py --mode pca:
@ProGamerGov, scipy.misc.imresize seems to automatically convert the output to an array of [0...255] integers, therefore if you need floats, you'll have to resize the image first, and then convert it with ".astype(float)/256".
On the contrary, it looks like skimage.transform.resize accepts and returns an array of floats [0...1], so with it you'll need to convert to float/256 first, and after that resize.
p. s. Shouldn't image arrays be divided by 255, by the way?
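A minimal example of the difference (my own illustration; scipy.misc.imresize is the old, since-deprecated API):

import numpy as np
import scipy.misc
from skimage import transform

img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)

# scipy.misc.imresize returns uint8 in [0, 255]: convert to float AFTER resizing.
a = scipy.misc.imresize(img, (128, 128)).astype(float) / 255.0

# skimage.transform.resize works with floats in [0, 1]: convert BEFORE resizing.
b = transform.resize(img.astype(float) / 255.0, (128, 128, 3), order=3)

# (And yes, dividing by 255 rather than 256 maps 8-bit white exactly to 1.0.)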
These are the results of my luminance tests for iteration count and -image_size:
Full image here (with labels): https://i.imgur.com/xLLukch.png Full image here (without labels): https://i.imgur.com/DY4dvt9.png From this album of comparison images: https://imgur.com/a/eoYMf
In this set of comparison images, the effects of luminance transfer/color control are more visible. I used a variation of this neural_style.lua command (-original_colors 1 for the control tests):
th neural_style.lua -original_colors 0 -image_size 1000 -content_weight 1 -style_weight 1e3 -output_image out_lum8_test.png -content_image output_lum_content_pca.png -style_image output_lum_style_pca.png -num_iterations 1500 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune
@VaKonS Thanks for the help, using skimage.transform.resize did the trick:
org_content = skimage.transform.resize(org_content, output.shape)
https://gist.github.com/ProGamerGov/2e7a0fe7a5ef6e117dc0be81df243331
The script now doesn't require the user to do any manual resizing.
I repeat my question about @ProGamerGov's process, as it seems to me that in this process, which mixes python scripts with neural-style, in order to do luminance only transfer, a python script will produce the luminance-only copies of the images, which are then supposed to undergo style transfer in plain neural-style. There are some problems in this process as I see it, and I am not sure if they have been addressed properly.
How do we get these luminance-only images into neural-style?
How do we get neural-style to understand that the input is luminance-only and not RGB, and further, how does neural-style process luminance-only data when the model expects three channels (RGB)?
If we just save the output of luminance conversion into a png file, then an unmodified neural-style will read it assuming RGB and the result will be something quite different from luminance-only transfer.
The simple solution would be, I think, to save the luminance-only images as RGB so that the L channel is copied, possibly scaled, into the R, G and B channels. This is what I do in my modified neural-style.
PS. As to my earlier note wondering whether png can be used to contain yuv data instead of rgb (assuming that the receiving end knows that the contents are yuv instead of rgb), I made a test converting to yuv, saving into png, reading back, converting into rgb and saving again. There was an easily noticeable color shift.
@htoyryla
python scripts with neural-style, in order to do luminance only transfer
Looking at the naming of the feature in Gatys' code, he calls it "Color Control", which leads me to believe it's not just about the luminance transfer. His code does not produce the same outputs as those in the research paper, which makes me wonder whether the changes are intentional, as the code does seem to produce the same outputs for the other features in the research paper.
For instance, when comparing the results of your neural_style.lua modifications to his results, it's like the only difference is that he replaced the color white with the color yellow:
Results with your code:
From the research paper:
As to my earlier note wondering whether png can be used to contain yuv data instead of rgb (assuming that the receiving end knows that the contents are yuv instead of rgb), I made a test converting to yuv, saving into png, reading back, converting into rgb and saving again. There was an easily noticeable color shift.
So would these results indicate that png can be used to contain yuv data instead of rgb?
So would these results indicate that png can be used to contain yuv data instead of rgb?
My result shows that using png to contain YUV changes the color content. The color nuances in the original image have been replaced by shades of pink when the image has passed from rgb -> yuv -> png -> read as yuv -> rgb.
But using png to carry YUV content is not the main issue; there is also the more serious issue that neural-style and VGG are designed to work with RGB, not YUV (to which my solution is converting Y to RGB by setting R,G,B = Y).
You seem to suggest that Gatys' code implements some other scheme than described in the paper. It can well be, but I am not convinced that your approach duplicates his thinking, if you cannot even explain how your process handles the problems I mentioned.
After all, I was just raising some valid questions about the process, and to me nice-looking results are not proof that the process is correctly implemented. Making correctly working software requires an understanding of both the process and all details of the implementation. If the implementation is divided between separate programs, one needs to understand how to make them work correctly together, passing information correctly between them. For instance, in this particular case one needs to be aware of which format the image is in at any point in the process.
It seems to me that I am wasting my time now, so I'll most likely leave this thread.
@htoyryla, @ProGamerGov, if I'm not mistaken, the code actually does not save YUV values in PNGs. Line 126, "output = luv2rgb(org_content)", converts from the LUV colorspace back to RGB before saving.
I think that at least two functions from "Controlling Perceptual Factors in Neural Style Transfer" can already be used in Neural-style, improving the quality of results and not overcomplicating it:
Though both need to be polished, tested and reimplemented in Torch to use in Neural-style.
@VaKonS Yes, the code converts the custom grayscale images back to RGB after working with them in the LUV colorspace. I am currently finalizing my analysis of how the scripts work in relation to Gatys' examples, and Neural-Style's code.
You seem to suggest that Gatys' code implements some other scheme than described in the paper. It can well be, but I am not convinced that your approach duplicates his thinking, if you cannot even explain how your process handles the problems I mentioned.
@htoyryla Sorry, I should have dug into the inner workings of the how, and why of the script far sooner. My bad.
The following assumes that I have correctly implemented Gatys' code in both scripts. I used ImageMagick to both analyze the inputs before they were run through the scripts, and the outputs after they were run through the scripts:
The linear-color-transfer.py script makes the RGB channel means, standard deviations, and overall mean/standard deviation extremely close to those of your chosen source image. Min, max, and gamma remain unaffected. With this in mind, it does not seem like this script is crucial to the function of the luminance transfer process. Though this script does seem to make the style image's colors work better for style transfer, as seen above in the experiments with Picasso's Seated Nude artwork.
As for the lum-transfer.py script, things are more complicated than they first appear:
There are many different methods and meanings of the word "grayscale", but I think that Gatys is using a custom method that focuses on luminance. Grayscale images only contain shades of grey, and no color. Each pixel has a "luminance" value; other names for this "luminance" value are "intensity" and "brightness".
This means that a grayscale image in RGB form represents the luminance of the image, so one can run images representing luminance through Neural-Style. The scripts therefore always output in the RGB color space, even though they work with both a grayscale representation and the LUV color space.
I can break the lum-transfer.py and linear-color-transfer.py code by giving either script a grayscale image made with every grayscale algorithm I have tried. This is because Gatys is not using the rec709luma algorithm, or any other algorithm known to ImageMagick. ImageMagick lists the script's grayscale output in RGB form with its intensity value as "Undefined", while the intensity value for my control test is listed as "rec709luma".
The lum-transfer.py script's lum mode first converts your input image from the RGB color space to a grayscale format, using that grayscale format as a way to preserve and focus on the luminance of the input image.
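For what it's worth, the grayscale conversion behind the lum mode appears to be a plain weighted projection onto a luminance axis, roughly like the following (the Rec. 601 weights below are my assumption from reading Gatys' lum_transform, not something I have verified byte for byte):

import numpy as np

REC601 = np.array([0.299, 0.587, 0.114])   # assumed luma weights

def lum_project(image):
    # image: (H, W, 3) float RGB in [0, 1]; returns a 3-channel grayscale image
    lum = image.reshape(-1, 3).dot(REC601).reshape(image.shape[:2])
    return np.dstack([lum, lum, lum])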
The lum-transfer.py script's lum2 mode takes the RGB output from Neural-Style, composed of the two grayscale images you ran through Neural-Style. It also takes your original RGB content image. The original content image is converted from the RGB color space to the LUV color space; this RGB to LUV conversion is what the rgb2luv function does. After this, the script takes the original content image in LUV form, replaces its luminance channel with the luminance of the grayscale output image, and converts the result back to RGB; this is what the luv2rgb step does. In effect, the colors of the content image are applied onto the grayscale luminance output image.
In summary, the Python scripts I have made, and Gatys' code, are using a gray scale color space to represent luminance, in the RGB color space to bypass the limitations of the pre-trained caffemodel. In order to add the color back to the gray scale luminance output, the LUV color space is used along with the original content image.
Now, the question is: are the differences between the Python script aided outputs and Gatys' outputs because of the neural_style.lua parameters I used, or am I missing a step in the luminance transfer process?
I am still learning how histograms work, so I don't know exactly where, if at all, does histogram matching play a role in what I described above. I also may have missed some things, so please feel free to let me know where I messed up.
Comparing what I have learned/figured out, to what the research paper says, I think the linear-color-transfer.py script is for histogram matching:
If there is a substantial mismatch between the luminance histogram of the style and the content image, it can be helpful to match the histogram of the style luminance channel LS to that of the content image LC before transferring the style. For that we simply match mean and variance of the content luminance.
I think that the lum mode of the lum-transfer.py Python script is responsible for this luminance histogram matching described by the research paper.
For the linear-color-transfer.py Python script, this is the relevant part of the research paper:
The one choice to be made is the colour transfer procedure. There are many colour transformation algorithms to choose from; see [5] for a survey. Here we use linear methods, which are simple and effective for colour style transfer. Given the style image, each RGB pixel is transformed as p_S' = A p_S + b, where A is a 3 × 3 matrix and b is a 3-vector. This transformation is chosen so that the mean and covariance of the RGB values in the new style image match those of the content image (Appendix B). In general, we find that the colour matching method works reasonably well with Neural Style Transfer (Fig. 3(e)),
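As an illustration, such a transform can be built from the channel means and covariances of the two images; here is a numpy sketch of the Cholesky variant (my own sketch of the idea; I have not verified that linear-color-transfer.py's chol mode is implemented exactly this way):

import numpy as np

def match_color_chol(style, content, eps=1e-5):
    # style, content: float RGB images in [0, 1], shape (H, W, 3)
    s = style.reshape(-1, 3)
    c = content.reshape(-1, 3)
    mu_s, mu_c = s.mean(0), c.mean(0)
    cov_s = np.cov(s, rowvar=False) + eps * np.eye(3)
    cov_c = np.cov(c, rowvar=False) + eps * np.eye(3)
    Ls, Lc = np.linalg.cholesky(cov_s), np.linalg.cholesky(cov_c)
    A = Lc.dot(np.linalg.inv(Ls))        # chosen so that A cov_s A^T = cov_c
    b = mu_c - A.dot(mu_s)
    out = s.dot(A.T) + b                 # new style pixels: A p + b
    return np.clip(out, 0, 1).reshape(style.shape)

As far as I can tell, the pca and sym modes differ only in how the matrix A is factored; all of them match the mean and covariance of the style image's RGB values to those of the content image.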
Then from the comparison between the two methods:
The colour-matching method is naturally limited by how well the colour transfer from the content image onto the style image works. The colour distribution often cannot be matched perfectly, leading to a mismatch between the colours of the output image and that of the content image.
In contrast, the luminance-only transfer method preserves the colours of the content image perfectly. However, dependencies between the luminance and the colour channels are lost in the output image. While we found that this is usually very difficult to spot, it can be a problem for styles with prominent brushstrokes since a single brushstroke can change colour in an unnatural way. In comparison, when using full style transfer and colour matching, the output image really consists of strokes which are blotches of paint, not just variations of light and dark. For a more detailed discussion of colour preservation in Neural Style Transfer we refer the reader to a technical report in the Supplementary Material, section 2.1.
I am not sure where to find the supplementary material?
Edit: http://bethgelab.org/media/uploads/stylecontrol/supplement/
Second Edit: The supplementary material contains all the raw images that were used in the research paper, in addition to a lot more examples and details.
There also appears to be a tool made to examine what the different layers of a style image look like. This appears to be what the -aesth_input command in his code is for.
@VaKonS I was mainly looking at lum_transform when cp_mode = lum, which I understood to correspond to luminance-only transfer (with the addition of adjusting the mean). But I must say I have not had time to look deeply enough. I just raised a concern about an additional complexity when using this kind of mixed approach.
Finally looked more deeply... lum_transform produces a monochrome image, so there is no problem: neural-style will internally make a 3xHxW tensor in which all channels are copies of the monochrome image. It is just that neural_style will not be able to restore original_colors because it only gets monochrome images, but of course that can be solved by adding another script to the pipeline.
I apologize for commenting... I was just following my instincts from my former R&D leader role when this started growing more and more complex. I'll ignore this thread from now on... I haven't got the time to follow closely enough, and the structure of the solution isn't making it any easier.
My concern was thus limited to program implementation, not to theoretical issues concerning different color spaces. This is because in my view the program needs to be solidly implemented before there is any point starting exploring further improvements on more theoretical level.
I think this all comes down to my preference for neural-style, torch and lua, as well as simplicity and clarity, as a basis for development.
Pulling images directly from Gatys' code (the iPython notebook example), the content images before the style transfer process are exactly the same as the ones made by --cp_mode lum:
These are the outputs from the lum-transfer.py script for comparison (ignore the size difference mistake):
After style transfer, you get pretty much the same output as with Neural-Style:
For adding the color back to the output image, lum-transfer.py and Gatys' code both do exactly the same thing (script on the left, Gatys' iPython notebook example on the right):
Full image size album here: https://imgur.com/a/HSdcj
Seeing these outputs, I do believe I have replicated Gatys' luminance transfer code in Neural-Style by using the lum-transfer.py script. The differences between his outputs and mine arise both from differences in the style transfer parameters and from the fact that he uses a two-step process.
@htoyryla I think I have, to the best of my knowledge, explained how the process works, and shown that there are no missing steps in the process. So I think it is safe to say that I have gotten luminance transfer working in Neural-Style, though I'll have to play around with your luminance modifications for neural_style.lua some more.
My problem is that with the limited time I have, I cannot follow your process as it is evolving. Like when you now say "after style transfer, you get pretty much the same output as with Neural-Style", I am confused because I believed your process used neural-style for style-transfer proper (and which was the basis for my recent concerns).
Therefore, no point in my continuing to participate. The early part of the thread gave me important impulses but now I think my participation is counterproductive for all of us.
PS.
are using a gray scale color space to represent luminance, in the RGB color space to bypass the limitations of the pre-trained caffemodel. In order to add the color back to the gray scale luminance output, the LUV color space is used
That is pretty much what my neural-style version does, too, with the steps inserted into the appropriate places in neural-style. Perhaps it is only that the way you have arranged everything does not work well with my intuition, like running the same script with cryptic option names to do different things in different parts of the process. That I am not so familiar with numpy does not help either.
The part "limitations of the pre-trained caffemodel" is really not fair. Any trained model has to be based on some constraints on the format of the data, RGB being the most natural choice in my view, and monochrome (greyscale) image can easily be represented in RGB just like we both have done. It was only that my not being fluent in reading numpy made me miss that you too had done so.
@htoyryla
Like when you now say "after style transfer, you get pretty much the same output as with Neural-Style", I am confused because I believed your process used neural-style for style-transfer proper.
Sorry, my bad. I was referring to running Gatys' code from the iPython notebook example and how it compared with doing the same in Neural-Style + the Python scripts.
Therefore, no point in my continuing to participate. The early part of the thread gave me important impulses but now I think my participation is counterproductive for all of us.
The luminance stuff is pretty much done, but the next feature (and the final feature, I think?) is called Spatial Control, and it looks as though I need to modify the deeper levels of neural_style.lua, which you know far better than I do due to your own experimentation with the code. I previously thought I could use some simple Python code for masks, but upon examining the supplementary material for the research paper, it appears I cannot do it purely in Python.
Using masks in torch-based style processing is of interest to me, and it would be quite simple to add a mask before the gram matrix. But full spatial control, as I understand it, would require the use of multiple gram matrices, each with its own mask. I faintly recall that Gatys put the multiple gram matrices in a tensor, adding just one more dimension, which sounds simple enough in principle. That also feels interesting for my own purposes but is not something to do in one hour, so it'll have to wait.
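For the record, the masked Gram matrix I have in mind would be roughly the following (a numpy sketch of the idea only; Gatys' actual guided Gram matrices and their normalisation may differ):

import numpy as np

def masked_gram(features, mask):
    # features: (C, H, W) activations from one layer
    # mask: (H, W) spatial guidance mask in [0, 1], resized to this layer's resolution
    C, H, W = features.shape
    F = (features * mask[None, :, :]).reshape(C, H * W)
    return F.dot(F.T) / (H * W)   # one Gram matrix per mask; normalisation is a choice

Full spatial control would then keep one such Gram matrix per mask (per region) at every style layer, both for the style targets and for the image being optimised.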
Using masks for spatial control is also not so easy if you have to make the masks manually. I know, because I have used neural-doodle for semantic style transfer. To be really useful, one would need an automatic solution for creating the masks (belonging to the general category of image segmentation). I had a look at one promising solution, but the code was in matlab; the model was caffe but used custom layers not recognized by loadcaffe, so using it would have required a custom build of caffe to run the model for making the masks, and I didn't go further with it.
@ProGamerGov, @htoyryla, I think I've made "match_color" in Torch, in case someone needs it: https://github.com/VaKonS/neural-style/blob/7682e2e24b650206afeddc576a6ca0778d11260c/match_colors.lua
Results seem to be byte-identical to ProGamerGov's script.
Does anybody know how to remove commits from a pull request with the browser interface?
This might be of interest https://arxiv.org/pdf/1701.08893v1.pdf
Assuming that match_color is meant to adjust the style image before style transfer (which will then be in color), I added it to my code: https://gist.github.com/htoyryla/9ee49c5ff38dda7d0907b6878c171974
Use the parameter -histogram matchcolor (-transfer should be left at color and -original_colors, I believe, should be off).
This expects to find match_color in a file match_colors.lua in the same directory; use @VaKonS's code but remove everything other than the function.
This should also work with multiple style images but I have not tested it.
Here's a sample output using neural-style default and Gatys' VGG_ILSVRC_19_layers_conv.caffemodel
Note to myself: if both -histogram matchcolor and -transfer lum are set, the images are first converted to greyscale and then color matched, which probably fails because of a tensor size mismatch. It might be more interesting to change this order, so that color matching is done first and then -transfer lum would perform a luminance-only style transfer. Maybe one could also use match_color as an optional method for -original_colors.
@ProGamerGov, I saw in email your attempt to add the loss module that Gatys used. It looks like Gatys' code is based on the original neural-style, where the loss modules get the targets as input. The new neural-style code (updated in December 2016, I think) works differently: the loss module must implement two modes, one to capture the targets and one to compute the loss during optimization.
So what you were trying to do might, with good luck, work in the pre-12/2016 code.
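Roughly, the idea of the newer loss modules is something like this (a Python-flavoured sketch of the two-mode pattern only, not the actual Torch code):

import numpy as np

class StyleLoss:
    def __init__(self, strength):
        self.strength = strength
        self.mode = 'none'      # 'capture': record the target, 'loss': compute the loss
        self.target = None
        self.loss = 0.0

    def forward(self, features):
        # features: (C, N) layer activations flattened over spatial positions
        gram = features.dot(features.T) / features.shape[1]
        if self.mode == 'capture':
            self.target = gram                      # run once with the style image
        elif self.mode == 'loss':
            diff = gram - self.target
            self.loss = self.strength * np.mean(diff ** 2)
        return features                              # pass activations through unchanged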
I have been trying to implement the features described in the "Controlling Perceptual Factors in Neural Style Transfer" research paper.
The code that used for the research paper can be found here: https://github.com/leongatys/NeuralImageSynthesis
The code from Leon Gatys' NeuralImageSynthesis is written in Lua, and operated with an iPython notebook interface.
So far, my attempts to transfer the features into Neural-Style have failed. Has anyone else had success in transferring the features?
Looking at the code, I think that:
ImageSynthesis.lua is responsible for the luminance style transfer.
ComputeActivations.lua and ImageSynthesis.lua are responsible for scale control.
ComputeActivations.lua and ImageSynthesis.lua are responsible for spatial control.
In order to make NeuralImageSynthesis work alongside your Neural-Style install, you must replace every instance of /usr/local/torch/install/bin/th with /home/ubuntu/torch/install/bin/th. You must also install hdf5 with luarocks install hdf5, matplotlib with sudo apt-get install python-matplotlib, skimage with sudo apt-get install python-skimage, and scipy with sudo pip install scipy. And of course you need to install and set up jupyter if you want to use the notebooks.