The style target is the Gram matrix calculated from a layer's activations when the style image is forwarded through the model. The style of the output image is derived in the same way. But whereas the layer consists of C feature maps of size H x W, the Gram matrix is C x C, which means the style representation has lost the spatial structure of the image. So although it can be visualized, it is not directly comparable to the images.
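For concreteness, here is a minimal sketch (my own illustration, not code from either repo) of how such a Gram matrix is computed from a C x H x W feature map:

```lua
require 'torch'

-- Minimal sketch: the style representation of one layer as a C x C Gram matrix.
-- `features` is assumed to be a C x H x W tensor of activations from that layer.
local function gramMatrix(features)
  local C, H, W = features:size(1), features:size(2), features:size(3)
  local f = features:view(C, H * W)   -- flatten the spatial dimensions
  local G = torch.mm(f, f:t())        -- C x C: inner products between feature maps
  return G:div(C * H * W)             -- normalization factor is just one common choice
end
```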
Edit: I was writing based on how the original neural-style works. Now I realize that you mean fig. 4 in the Perceptual Losses... paper, where the images ŷ have the same shape as the input images but contain only the stylistic features. This is interesting.
Tried quickly to store the style target as an image, both from the original neural_style and from fast_neural_style. They look like this, as I expected for a Gram matrix, not at all like the ŷ images in the paper. And now, reading more carefully: "we use optimization to find an image ŷ that minimizes the style reconstruction loss ℓ_style^{φ,j}(ŷ, y) for several layers j from the pretrained VGG-16 loss network φ. The images ŷ preserve stylistic features but not spatial structure."
So it still seems to me that the style target itself is not visually interesting.
I guess one could try to set content_weight to zero to get an image of pure style. In fact I have tried that with the original neural-style. One problem, though, was that the iterations easily terminated early when the loss stopped changing.
Yes, it works.
th slow_neural_style.lua -style_image /home/hannu/work/Pictures/cy.jpg -content_image /home/hannu/Pictures/hannu0516.png -style_weights 5.0 -content_weights 0 -gpu 0 -output_image cy-purestle.png
produces this. It is not the style_target itself, though, but an image resulting from an optimization that minimizes the loss between style_representation(output) and style_target = style_representation(style_image).
Here's the same for udnie.
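To make that objective concrete, here is a hypothetical sketch of the pure-style loss at one layer (my own illustration, not the repo's StyleLoss module): with the content term at zero, only the distance between the Gram matrices is minimized.

```lua
require 'nn'

-- outFeat and styleFeat are C x H x W activations of the output image and the
-- style image taken from the same layer.
local function gram(feat)
  local C, H, W = feat:size(1), feat:size(2), feat:size(3)
  local f = feat:view(C, H * W)
  return torch.mm(f, f:t()):div(C * H * W)
end

local function pureStyleLoss(outFeat, styleFeat)
  local G_target = gram(styleFeat)  -- style_target = style_representation(style_image)
  local G_out = gram(outFeat)       -- style_representation(output)
  return nn.MSECriterion():forward(G_out, G_target)
end
```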
Thanks so much for the comments and insight, @htoyryla! Those images are fascinating.
BTW, what is the name in the code of the tensor for the Gram matrix you presented in your first comment? (So that I could produce similar matrices myself.) I would be interested in studying those a little too.
I think the original neural-style is an easier basis for experimentation as all code is in a single file.
The style_target is target_i on the line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L205.
In fast-neural-style I got the style_target deep down here https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/StyleLoss.lua#L42 . I am not so familiar with this code yet.
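If you want to dump the Gram matrix for viewing, something along these lines should work (a rough sketch, assuming target_i is the C x C Gram tensor at that point in neural_style.lua; not tested against the current code):

```lua
local image = require 'image'

-- Rescale the Gram matrix values into [0, 1] and save it as a greyscale image.
local G = image.minmax{tensor = target_i:double()}
image.save('style_target.png', G:view(1, G:size(1), G:size(2)))
```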
And BTW, thank you for the idea... I think this could be a good tool when looking for good styles.
One more thing... I have often mentioned a problem with the scale of style features... with fast-neural-style they often appear much smaller than one would expect. I think these pure style images illustrate well what happens.
When the output_image size is small, the style features are close in size to the original.
In a larger image, the features are not scaled up but stay the same size in pixels, so the look of the image changes: the style features are repeated rather than enlarged to keep the same look.
This should explain why fast-neural-style appears to produce smaller style features compared to the original... the default image size was 512px and is now larger.
The same phenomenon is known from the original neural-style, too. One might expect that scaling the style image would help, but my experience is that it really doesn't. Perhaps the phenomenon derives from the fact that the underlying model (VGG-16 or whatever) was trained on 224x224 images.
At the moment it seems that the only way to keep style features from shrinking in a large image is to generate a small image and use super-resolution to scale it up.
Thanks again for the advice! That is interesting regarding the style size. I don't really understand why that happens, since (at least in the case of original neural style) the output image presumably gets scaled down before being passed through VGG-16, so you would think the style features get scaled down too and therefore the style feature sizes would match...
"(at least in the case of original neural style) the output image presumably gets scaled down before being passed through VGG-16"
As far as I know, it is the convolutional layers that adapt to whatever input size they are given, rather than the image being scaled down. I am pretty sure of that, because when I experimented with also using FC layers in neural-style (see http://liipetti.net/erratic/2016/03/28/controlling-image-content-with-fc-layers/) I got errors unless the image_size was 224, because the FC layers do not scale (so there was a size mismatch between the highest conv layer and the lowest FC layer).
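A quick way to see this with plain nn (not code from either repo): a convolutional layer accepts any input resolution, while a fully connected layer is fixed to the size it was built for.

```lua
require 'nn'

-- Convolution: 3 -> 16 maps, 3x3 kernel, stride 1, padding 1.
-- The same module works for any input resolution.
local conv = nn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1)
print(conv:forward(torch.randn(3, 224, 224)):size())  -- 16 x 224 x 224
print(conv:forward(torch.randn(3, 512, 512)):size())  -- 16 x 512 x 512

-- Fully connected: built for a fixed input length, e.g. flattened 3 x 224 x 224.
local fc = nn.Linear(3 * 224 * 224, 10)
print(fc:forward(torch.randn(3 * 224 * 224)):size())  -- 10
-- fc:forward(torch.randn(3 * 512 * 512))             -- errors: size mismatch
```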
I was trying to develop a method to visualise the 'style' or 'content' of input images (like they do in the paper this project is based on). Is there a method I could use to return the 'style target' of an image? slow_neural_style returns a loss between the style target and the style of the output image, so I'm guessing this is stored somewhere.
Thanks for any help! Arthur