jcjohnson / neural-style

Torch implementation of neural style algorithm

Implementing features from the "Controlling Perceptual Factors in Neural Style Transfer" research paper #376

Open ProGamerGov opened 7 years ago

ProGamerGov commented 7 years ago

I have been trying to implement the features described in the "Controlling Perceptual Factors in Neural Style Transfer" research paper.

The code that was used for the research paper can be found here: https://github.com/leongatys/NeuralImageSynthesis

The code from Leon Gatys' NeuralImageSynthesis is written in Lua and operated through an iPython notebook interface.


So far, my attempts to transfer the features into Neural-Style have failed. Has anyone else had success in transferring the features?

Looking at the code, I think that:

In order to make NeuralImageSynthesis work alongside your Neural-Style install, you must replace every instance of /usr/local/torch/install/bin/th with /home/ubuntu/torch/install/bin/th. You must also install hdf5 with luarocks install hdf5, matplotlib with sudo apt-get install python-matplotlib, skimage with sudo apt-get install python-skimage, and scipy with sudo pip install scipy. And of course you need to install and set up jupyter if you want to use the notebooks.

ProGamerGov commented 7 years ago

@htoyryla Yea, I removed the comment because I doubted my modifications were even functioning correctly. But thanks for the tip. I'll take a look at older versions of Neural-Style, and then hopefully use that knowledge to port the features to the newer version.

htoyryla commented 7 years ago

@ProGamerGov, if you want to make your loss modules work in the new neural-style, you need to compare the old and new loss modules, especially the parts in the new modules guarded by if self.mode == 'capture' and if self.mode == 'loss'. The old modules only implement the "loss" part, the targets having been set when the loss modules were created (see the PS on this). Setting the target of a loss module basically means forwarding the appropriate image through the model and storing the output of the loss module as the target.

PS. The old way of setting the targets required that as the model was being assembled, after each content or style layer had been added, the appropriate content or style image was forwarded through the model to capture its output as the target for this layer. Using the new way, one only needs to forward each image through the complete model once in capture mode https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L164-L175, and each loss module captures and stores its own targets. But this makes the loss modules more complex.
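For reference, the capture/loss branch in the newer StyleLoss looks roughly like this (paraphrased; see the linked neural_style.lua for the exact code):

    function StyleLoss:updateOutput(input)
      self.G = self.gram:forward(input)
      self.G:div(input:nElement())
      if self.mode == 'capture' then
        self.target:resizeAs(self.G):copy(self.G)   -- store this image's Gram matrix as the target
      elseif self.mode == 'loss' then
        self.loss = self.strength * self.crit:forward(self.G, self.target)
      end
      self.output = input
      return self.output
    end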

When I talk about loss modules, I refer to a complete implementation for handling of losses, whereas a loss function for me means the method of calculating loss.

htoyryla commented 7 years ago

Changed my code https://gist.github.com/htoyryla/9ee49c5ff38dda7d0907b6878c171974

Sample output using -histogram matchcolor -transfer lum -original_colors 1 (here with match_color.lua modified to swap dimensions 2 and 3, see the following comment):

hannu5b-mc-lum-oc1-whswap

htoyryla commented 7 years ago

@VaKonS , one question as I am using your lua implementation of match_color. The comments say that CxWxH is expected, while torch.image uses CxHxW. I don't know if color matching is sensitive to the order of the dimensions. I can of course try swapping them.

VaKonS commented 7 years ago

@htoyryla, no need to swap them, it's simply me who didn't notice that dimensions are swapped in Torch. I only checked that intermediate results match arrays in Python. Probably this is why I had to use "transpose" on "torch.potrf(Cx)", while "np.linalg.cholesky(Cx)" is not transposed in Python. Still, resulting images in all 3 modes were exactly the same as Python output during tests.
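(Another possible explanation for the transpose: torch.potrf returns the upper-triangular factor by default, while np.linalg.cholesky returns the lower-triangular one. A quick, illustrative check in th:)

    require 'torch'
    local A = torch.Tensor{{4, 2}, {2, 3}}
    local U = torch.potrf(A)        -- upper-triangular factor (the default)
    local L = torch.potrf(A, 'L')   -- lower-triangular, what np.linalg.cholesky returns
    print(U:t():dist(L))            -- ~0, i.e. U transposed equals L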

htoyryla commented 7 years ago

@VaKonS, strangely enough I got much better results with my own test material after I swapped the dimensions, but there may be some other reason for that. And thanks for porting the function into lua.

VaKonS commented 7 years ago

@htoyryla, the image is presented as a [3 channels × n pixels] array in line 46, and all processing is done on those 3 "lines of bytes", until in line 79 they are converted back to the "3 × y × x" format. So, if I'm not mistaken, the function shouldn't depend on the image dimensions. But I only tried to reproduce the code; maybe changing some internal parameters can improve the results.

htoyryla commented 7 years ago

I see... went through it manually in interactive th. It is possible that there was some other difference that I overlooked; I could make a better test, maybe. I don't, however, fully understand the more complex operations (such as the PCA-specific part), or whether it makes a difference if the images are of different sizes, because in practice they almost always are, unless one deliberately deforms either of them (which is not currently done in neural-style). I'll have a look tomorrow.

ProGamerGov commented 7 years ago

An interesting issue when trying to replicate Gatys' color matching is the appearance of "bright spots".

My Neural-Style example is on the left, and Gatys' example is on the right:

Link to the full image: https://i.imgur.com/Cvj2pSm.png

I am not sure what is causing them, or how to limit them. Though Gatys seems to have figured that out.

ProGamerGov commented 7 years ago

I have also discovered that the match_color function does not work with every single style/content image. Though I don't know the exact reason/circumstances behind this apparent malfunction.

linear-color-transfer.py:

linear-color-transfer.py versus lum-transfer.py:

Full images: https://i.imgur.com/w4sTTNa.jpg, https://i.imgur.com/wDjExW2.jpg

You can see that some really dark or really bright parts of the style image create issues with both the match_color function (linear-color-transfer.py) and luminance transfer (lum-transfer.py). Though when the two are used together, the issue is better disguised.

Oddly enough, the first style image and content image were part of my attempt to replicate Gatys' images:

Color Matching:

Luminance-Only:

Full images, and the content/style image here: https://imgur.com/a/vjPjj

htoyryla commented 7 years ago

Areas of black pixels? Maybe when the levels are adjusted (like subtracting the mean) one gets some pixels with negative values which are then clamped to 0 = black?

ProGamerGov commented 7 years ago

@htoyryla That could be the case. I'll try adding some print statements to the code, so that I can see what the levels are, and check for negative values?

htoyryla commented 7 years ago

In principle yes; in torch I would print the min() of the image tensors at selected places in the code, looking for negative values. In Python you (at least sometimes) have values in the range 0 to 255, but the same should hold there too.

ProGamerGov commented 7 years ago

@htoyryla , It appears there are negative values: https://gist.github.com/ProGamerGov/66cd6e662a5eb9fd9e88aa9810b60361#file-lum-transfer-log-L203-L252

It seems that lum-transfer.py is to blame, at least for this example, and not linear-color-transfer.py (and thus not the match_color function).

The line of code responsible for the negative values:

style_img -= style_img.mean(0).mean(0)

The line of code is found here on the unmodified lum-transfer.py script.

I used this command on the modified script:

python lum-transfer.py --content_image newyork_construction.jpg --style_image style_colored_pca.png --cp_mode lum --output_style_image output_lum_style_pca.png --output_content_image output_lum_content_pca.png --org_content newyork_construction.jpg 2>&1 | tee /media/ubuntu/Lexar/python.log

This was the modified script I used to output the values: https://gist.github.com/ProGamerGov/0ad73e5b235487806a35c6ef2fc86f54


The equivalent code in Gatys' iPython Notebook example:

if cp_mode == 'lum':
        org_content = imgs['content'].copy()
        for cond in conditions:
            imgs[cond] = lum_transform(imgs[cond])
        imgs['style'] -= imgs['style'].mean(0).mean(0)
        imgs['style'] += imgs['content'].mean(0).mean(0)
        for cond in conditions:
            imgs[cond][imgs[cond]<0] = 0
            imgs[cond][imgs[cond]>1] = 1

ProGamerGov commented 7 years ago

@htoyryla Adding this to my code, seems to have fixed the problem:

style_img [style_img < 0 ] = 0
style_img [style_img > 1 ] = 1

content_img [content_img < 0 ] = 0
content_img [content_img > 1 ] = 1

It was a stupid mistake on my part, because I had previously omitted the last step in Gatys' code. But that step was what fixed the issue I was having.

This issue does not seem to be related to the color matching bright spots however.

ProGamerGov commented 7 years ago

@htoyryla @VaKonS I ran a quick match_colors test in Python with:

source_img = skimage.transform.resize(source_img, target_img.shape)

There appears to be no relation between the image dimensions and the output.

htoyryla commented 7 years ago

@ProGamerGov it is natural that

style_img -= style_img.mean(0).mean(0)

causes negative values, what matters is after the following line:

style_img += content_img.mean(0).mean(0)

If the mean of the style image is greater than the mean of the content image (i.e. the style image is lighter), then there may still be negative values (one subtracts more than one adds). Setting negative values to zero does not necessarily help either (0 means black in luminance).
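A minimal sketch of that shift in torch (assuming the luminance images are tensors in the range [0, 1]; not the actual script):

    local style_mean   = style_img:mean()
    local content_mean = content_img:mean()
    style_img:add(-style_mean):add(content_mean)
    -- if style_mean > content_mean, some pixels are now negative;
    -- clamping only turns them into 0 (black), it does not recover the lost detail
    style_img:clamp(0, 1)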

htoyryla commented 7 years ago

Tested whether swapping h and w makes a difference in the torch version of match_color():

require 'image'
require 'match_colors'

timg = image.load('helsinki000.png',3) ;
simg = image.load('/home/hannu/hannu512.png',3)
oimg = match_color(timg, simg)
oimg2 = match_color(timg:transpose(2,3), simg:transpose(2,3))
oimg2 = oimg2:transpose(2,3)
print(oimg:dist(oimg2))

Yes, there is a difference, but no more than 7.6392223546888e-13 with pca; chol gives 16.130728925385, and sym 9.1518068579506. Visually not much difference though.

@ProGamerGov the question about dimensions was related to the torch code where the H and W dimensions have been swapped, nothing to do with the python code.

ProGamerGov commented 7 years ago

I have identified the difference between Gatys' style and content loss functions and Neural-Style's.

Gatys' MSE code does this:

self.loss = self.loss + self.weights[t] * self.crit:forward(input, self.targets[t])

Which translated to Neural-Style, should be this:

self.loss = self.loss + self.strength * self.crit:forward(self.G, self.target):mul(self.strength)

While default Neural-Style does this:

self.loss = self.strength * self.crit:forward(self.G, self.target)

From this line of code: https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L544

Though when I try to change it, I get this error:

neural_style_gatys.lua:546: attempt to index a number value

Full error: https://gist.github.com/ProGamerGov/5b4644494ea682bbf85250d84da07a09

Edit: I think I translated self.weights into being self.strength, which may be incorrect? Or a different format?

There are a few other very similar differences, so I want to know whether I am messing up the format or something.

VaKonS commented 7 years ago

@ProGamerGov, this error probably means that Torch is trying array operations on some part of expression, which is a number. Try to replace the line with: self.loss = self.loss + self.strength * self.crit:forward(self.G, self.target) * self.strength

Because :mul() is meant to work "in-place" on a Tensor, if I'm not mistaken. Maybe the result of "self.crit:forward(self.G, self.target)" cannot be changed directly, or is not a tensor at all.

By the way, note that Gatys multiplies by "self.weights[t]" once, and you do it twice.
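One more thing worth knowing here: nn criteria such as nn.MSECriterion return a plain Lua number from :forward, which is easy to check in th (illustrative only):

    require 'nn'
    local crit = nn.MSECriterion()
    local loss = crit:forward(torch.rand(3, 4), torch.rand(3, 4))
    print(type(loss))   -- "number", so chaining :mul() onto it fails;
                        -- multiplying with * as suggested above avoids that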

VaKonS commented 7 years ago

@htoyryla, speaking of "Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses" (https://github.com/jcjohnson/neural-style/issues/376#issuecomment-281581370) – I didn't quite understand what those "histogram losses" are, but results look very promising on textures, to say the least.

htoyryla commented 7 years ago

It seems to me that Gatys has self.loss = self.loss + ... because his code is adding up losses using multiple targets within the loss module itself: https://github.com/leongatys/NeuralImageSynthesis/blob/master/LossLayers.lua#L74-L76

Like here https://github.com/leongatys/NeuralImageSynthesis/blob/master/LossLayers.lua#L30-L34 it appears that self.targets is 4-dimensional, each target being 3-dimensional instead of 2 like with neural-style gram matrices. Maybe this is related to using multiple gram matrices with masks.

In other words, Gatys uses multiple targets per loss module, neural-style only one. In addition, the target itself is different.

PS. I see that @ProGamerGov has already modified the loss module to a single target. If one now modifies the line as

self.loss = self.loss + 

then self.loss will be the accumulated loss from all iterations so far. I don't think that makes sense, but it is probably not the reason for the error message.

htoyryla commented 7 years ago

@VaKonS I only posted the link as the paper looked relevant; I haven't read it carefully yet. But already "histogram loss" sounds interesting. I guess histograms could be used (in addition to gram matrices) to evaluate losses (the distance between the current histogram and a target histogram), although a color image exists only at the input; elsewhere we have the feature map activations, so the meaning of a histogram there is maybe a little vague.

htoyryla commented 7 years ago

From the error

/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:67:
In 4 module of nn.Sequential:
neural_style_gatys.lua:546: attempt to index a number value

it looks like the problem happens inside self.crit when forward is called with the given input (note that the error happens within nn.Sequential). This could mean that there is something wrong with the inputs.

ProGamerGov commented 7 years ago

I have been experimenting with modifying the content and style loss functions in order to gain a better understanding of how to modify them appropriately.

This version of neural_style.lua contains the L2Penalty module from lines 498-539 in Gatys' code: https://gist.github.com/ProGamerGov/ea432324a09822a19af916fe1bfcfc01

To use the L2Penalty module, you specify a weight value with the new -l2_weight parameter. I believe I have implemented the module correctly, by mirroring jcjohnson's total variation (TVLoss) code.

Here's an example of the new feature with -l2_weight 0.001 on the right, and -l2_weight 0 on the left:

Full image here: https://i.imgur.com/Sh4U5pD.png

ProGamerGov commented 7 years ago

@htoyryla The padding option I had implemented here on lines 121-136 in my pull request version of neural_style.lua did not seem to work correctly. I tried to fix it, but I am unsure if my modifications solved the problem correctly.

htoyryla commented 7 years ago

@ProGamerGov I can't see any problem immediately; can you tell in which way it is not working?

The code structure is not as clean as one would hope, but L123-136 add the extra padding layer and the convolution layer gets added on L147, with zero padding as set on L134-135. If padding is set to default then L123-136 is not run and the conv layer gets added on L147 with padding intact.

htoyryla commented 7 years ago

Looking at the code of L2Penalty, I can't actually see that it is doing much. It says that it is "adding the [gradOutput] to the gradient of the L2 loss" but I cannot see the gradient of L2 loss being calculated anywhere inside this class. Instead it is adding gradOutput to a weighted copy of the input.

L2 loss itself is calculated and stored in self.loss which is not used anywhere. I get the impression that the class expects something of the code inside which it is to be used. I couldn't find Gatys using L2Penalty anywhere. The same code is found here too https://github.com/aleju/mario-ai/blob/master/layers/L2Penalty.lua .
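The gradient path in question boils down to roughly this (a simplified sketch of that L2Penalty module, not the exact file):

    function L2Penalty:updateGradInput(input, gradOutput)
      local m = self.l2weight
      -- a weighted copy of the input (the gradient of (m/2) * ||input||^2) ...
      self.gradInput:resizeAs(input):copy(input):mul(m)
      -- ... with the incoming gradient simply added on top
      self.gradInput:add(gradOutput)
      return self.gradInput
    end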

PS. Looking at how L2 weight decay is usually used, to decay the layer weights towards zero, the code starts to make sense as the gradient of 1/2w^2 is w which kind of explains the "adding gradOutput to a weighted copy of the input", although I wonder why we are adding this to the gradient instead of subtracting if this is meant to decay the values.

However, in neural-style we are not training a model, but creating an image, the pixels of which now take the place of the weights in a layer, and I am not at all sure whether it makes sense to decay the pixel values towards zero.

Just for fun, I changed L2Penalty like this

    self.gradInput:resizeAs(input):copy(input):mul(-m)

and inserted an L2Penalty layer below each StyleLoss layer (instead of between TVLoss and the lowest conv layer) and got an image like this.

l2style1e-5minus_900

and with a slightly higher L2 weight:

l2style5e-5minus_500

ProGamerGov commented 7 years ago

@htoyryla For the padding code issue, the older version with the padding option looked like this: https://github.com/ProGamerGov/neural-style/blob/4bcf625b6e478d35b00901f1fa0f8b80f94a90cd/neural_style.lua#L121-L135

And resulted in a module error when I tried to run it. So I modified it to look like this in order to correct the issue: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L121-L136

htoyryla commented 7 years ago

@ProGamerGov I see now... it was a scope problem again. Defining padlayer as a local variable inside the if-then-else block leaves padlayer undefined if the net:add(padlayer) is outside the block. Another solution would have been to declare padlayer before the if-then-else block.
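A minimal illustration of the scoping issue (hypothetical names, not the actual neural_style.lua code):

    local padlayer                      -- declared before the if-block, visible after it
    if params.padding == 'reflect' then
      padlayer = nn.SpatialReflectionPadding(1, 1, 1, 1)
    else
      padlayer = nn.SpatialZeroPadding(1, 1, 1, 1)
    end
    net:add(padlayer)                   -- fine: padlayer is still in scope here
    -- with "local padlayer = ..." declared inside the if-block instead,
    -- padlayer would be nil here and net:add(padlayer) would fail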

htoyryla commented 7 years ago

Looked at @VaKonS 's new assortment of prepadding options. I have no problem with those... it is only that in my mind padding in convlayers has been a method to get the correct output size. Especially when the model contains one or more FC layers, there is really not any freedom to change padding size, because that will change the output size from a layer and lead to a size mismatch somewhere. Here this is not a problem, as we have only convlayers which automatically adjust to the size of the input. Still, it is good to remember this.

VaKonS commented 7 years ago

@htoyryla, I'm just saving what I'm currently experimenting with, after closing pull request. :) Opened request didn't allow me to change the file without adding more irrelevant commits to original project.

ProGamerGov commented 7 years ago

@htoyryla This is a bit off topic from the thread, but say I want to use the image package (image.scale, image.translate, etc.) to modify the current image between or during iterations, so that my changes stick, like how the Lua implementations here and here work with regard to DeepDream.

Using the image package inside an updateOutput or updateGradInput function does not seem to work.

In the feval(x) function, I am not sure what if any variables inside the function to use the image library functions on. Though I did try a variety of values and they did not work.

As I understand things, the feval(x) function is where neural_style.lua computes the gradient with respect to the loss, and thus it is run every iteration? And the feval(x) function is the function that runs the loss functions? Is my understanding correct, or incorrect?

Where should I be trying to perform these operations?

Edit: This DeepDream.lua code appears to be based off of neural_style.lua's code, including an early version of neural_style.lua's loadcaffe wrapper code.

htoyryla commented 7 years ago

It seems to me that combining the approach of these deepdream scripts with neural-style is quite problematic. Neural-style sets up the model including the loss modules and then delegates the task of optimizing the image to the optimizer, l-bfgs or adam. In each iteration, the optimizer calls feval and expects it to return the loss and the correct gradients. Based on these (and the history it stores internally), the optimizer then modifies the image. Feval calculates the loss and the gradients by forwarding the image through the model, backpropagating to get the gradients, and reading the losses from the loss modules (which sit inside the model).
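For reference, feval in neural_style.lua is structured roughly like this (condensed from memory; see the actual file for details):

    local function feval(x)
      num_calls = num_calls + 1
      net:forward(x)                           -- forward the current image through the model
      local grad = net:updateGradInput(x, dy)  -- backprop; the loss modules inject their gradients
      local loss = 0
      for _, mod in ipairs(content_losses) do loss = loss + mod.loss end
      for _, mod in ipairs(style_losses) do loss = loss + mod.loss end
      maybe_print(num_calls, loss)
      maybe_save(num_calls)
      collectgarbage()
      -- the optimizer (l-bfgs or adam) gets back the loss and a flattened gradient
      return loss, grad:view(grad:nElement())
    end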

It should be technically possible to change the image in feval, but doing so will interfere with the optimizer as the pixels will no longer be in the same place as the gradients calculated from them. I guess this is why the deepdream code examples handle the iterations directly, not through an optimizer.

PS. As to changing img within feval, I tried it and it did not work, because img is a plain tensor, not a Torch image object (img.image was nil). However when I create a plain random tensor exactly like img in th interpreter, it works as a torch image, so there is something going on that I don't immediately grasp.

VaKonS commented 7 years ago

@htoyryla, saving the x input as an image inside the feval(x) function seems to work, at least like this: if x ~= nil then image.save("feval_x.png", deprocess(x:double())) end

(Note that "deprocess" takes a DoubleTensor, and processing within the model is done on floats, if I'm not mistaken, therefore :double() is needed.)

htoyryla commented 7 years ago

I was aware of the possibility of saving as an image and loading it back (maybe_save() does save the intermediate images), but it feels terribly inefficient. There must be an easier way. Also, I don't think de/preprocessing is needed if we only want to scale or transform the tensor which contains the image.

Furthermore, I am not sure if it is wise to modify x directly in the middle of an iteration. But it all depends on how the optimizer works internally. Maybe x is simply a pointer to img. Like I said, changing the image outside the optimizer is a bad idea anyway; I just wanted to test how bad. Like, do we get a size mismatch if we change the size of the image? I think we will.

htoyryla commented 7 years ago

Back at my main computer...

This works

    image.save("f.png", img)

this does not

  img = image.scale(img, 511, 511, 'bilinear')

and to me it appears that it is because img, being a plain tensor and not an image created by loading from an image file, lacks the .image attribute (which is supposed to contain pointers to functions like scale). The funny thing is that in the th interpreter, a plain tensor can be manipulated by image.scale etc. and the image attribute does contain those functions.

Anyway, this question is more of general interest to me (as I don't think manipulating the image both directly and by the optimizer makes sense). Moreover, having now seen @ProGamerGov's earlier post about implementing the octaves method inside the DeepDreamLoss class, I am not at all sure whether the image which needs to be scaled is the image img which is evolving (which is problematic in neural-style), or some other image which would be used internally by the DeepDreamLoss class (which could make sense... I must admit I have not looked deeper into how this DeepDream-through-scaling is meant to work).

VaKonS commented 7 years ago

@htoyryla, strange that "image.scale" doesn't work – this, for example, works for me (inserted between lines 281 and 282):

    if x ~= nil then
      image.save("feval_x.png", deprocess(x:double()))
      local ti = image.scale(x, 511, 511, 'bilinear')
      image.save("feval_x_scaled.png", deprocess(ti:double()))
    end

Both "feval_x.png" and scaled "feval_x_scaled.png" are saved on every iteration.

VaKonS commented 7 years ago

@htoyryla, by the way, maybe that is because I'm running on CPU. If you are using CUDA, then, I suppose, it should be: local ti = image.scale(x:double(), 511, 511, 'bilinear')

ProGamerGov commented 7 years ago

@htoyryla Can the image library be substituted for modifying the tensor directly, like jcjohnson does?

On lines L109-L112 of this DeepDream lua code, clipping is performed like this:

if clip then
      bias = Normalization.mean/Normalization.std
      img:clamp(-bias,1/Normalization.std-bias)
   end

In Fast-Neural-Style's DeepDreamLoss:updateGradInput function, clamping appears to be done like this:

self.clipped:resizeAs(input):clamp(input, -self.max_grad, self.max_grad)


I am not sure what operations I should be doing in DeepDreamLoss:__init, DeepDreamLoss:updateOutput, and DeepDreamLoss:updateGradInput, but I tried to figure it out below, using this modified neural_style.lua: https://gist.github.com/ProGamerGov/ea432324a09822a19af916fe1bfcfc01

I tried to copy DeepDream's octave related operations from here:

  local octaves = {}
   octaves[octave_n] = torch.add(base_img, -Normalization.mean):div(Normalization.std)
   local _,h,w = unpack(base_img:size():totable())

   for i=octave_n-1,1,-1 do
      octaves[i] = image.scale(octaves[i+1], math.ceil((1/octave_scale)*w), math.ceil((1/octave_scale)*h),'simple')
   end

With this in the DeepDreamLoss:updateGradInput:

     local octaves = {}
     octaves[self.octave_n] = self.gradInput:add(self.clipped, -self.max_grad):div(self.max_grad) 
     for i=self.octave_n-1,1,-1 do
        octaves[i] = octaves[i+1]
        local c,h,w = unpack(self.base_img:size():totable())
        local octave_w = math.ceil((1/self.octave_scale)*w), octaves[i+1] 
        local octave_h = math.ceil((1/self.octave_scale)*h)
        self.gradInput = self.gradInput:resize(1, c, 2, octave_h, 3, octave_w)
     end

Though I have yet to figure out the code below for octave, octave_base in pairs(octaves) do: https://github.com/bamos/dream-art/blob/master/deepdream.lua#L135-L156, and the function it uses here: https://github.com/bamos/dream-art/blob/master/deepdream.lua#L87-L114

htoyryla commented 7 years ago

@VaKonS cuda vs. double was the reason, it also explains why it worked in th. The error message however was misleading:

/home/hannu/torch/install/share/lua/5.1/image/init.lua:716: attempt to index field 'image' (a nil value)

and looking at the source code of image/init.lua I found that it expects to find an attribute named image which was missing.

Using code like this

   x = image.scale(x:double(), x:size(2)-1, x:size(3)-1, 'bilinear'):cuda()
   print("x", x:size())

in different places it looks like

htoyryla commented 7 years ago

@ProGamerGov of course the tensor can be modified directly, but the example you give only modifies the values in the tensor. Scaling an image is quite a different thing, though it can be done of course. Anyway, we know now that the tensor can be scaled, but we cannot make permanent changes because l-bfgs doesn't expose the master copy of the image to us. Also, changing its size, if we could do it, would crash l-bfgs anyway.

PS. resizeAs is not the same as scale. It only changes the size of the tensor, not the values. If t is

 1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16

then resizing it to 6x6 gives

  1.0000e+00   2.0000e+00   3.0000e+00   4.0000e+00   5.0000e+00   6.0000e+00
  7.0000e+00   8.0000e+00   9.0000e+00   1.0000e+01   1.1000e+01   1.2000e+01
  1.3000e+01   1.4000e+01   1.5000e+01   1.6000e+01  7.1145e-322  4.9012e+252
 1.8819e+262  1.0258e+200  6.5257e-308  6.7411e+199  1.8698e+262  1.4878e+195
 6.9972e-308  5.7588e+160  4.3529e-114  1.1712e+166  1.4352e+166  1.2633e-118
 1.3581e+243  1.4365e+166  1.2633e-118  3.0195e+169  5.2241e+257  1.3872e-118

so it has simply filled the larger tensor with what was in the memory (1 to 16) and the rest is not initialized at all.

Scaling of an image requires recreating the image in a new size while still retaining the visual appearance as much as possible, by approximating the pixel values e.g. through interpolation.

As to your example

self.clipped:resizeAs(input):clamp(input, -self.max_grad, self.max_grad)

What this does is actually not scaling at all, but making self.clipped the same size as input and filling it with the values from input, clamped between -self.max_grad and self.max_grad (i.e. lower and higher values have been replaced by the -max and max values respectively). In other words, clamping is used here to put a limit to absolute values of gradients.

This use of resizeAs is typical in torch.nn modules, where the variables need to be resized according to the input size.
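A tiny illustration of that pattern (illustrative, not from either codebase):

    local input   = torch.Tensor{-3, -0.5, 0.2, 4}
    local clipped = torch.Tensor()
    clipped:resizeAs(input):clamp(input, -1, 1)
    print(clipped)   -- -1, -0.5, 0.2, 1: same size as input, values limited to [-1, 1]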

htoyryla commented 7 years ago

@VaKonS this demonstrates that a cuda tensor lacks image capabilities. The first print statement prints out the image capabilities, the second prints nil. I guess this is a sound development choice, but good to know.

require 'image'
require 'cunn'

img = torch.randn(3,64,64):float():mul(0.001)
print(img.image)
img2 = img:cuda()
print(img2.image)

ProGamerGov commented 7 years ago

@htoyryla So self.clipped:resizeAs(input) in the DeepDreamLoss:updateGradInput function takes the empty tensor created by self.clipped = torch.Tensor() in the DeepDreamLoss:__init function, and resizes it to match the value of input which I assume is the input image in tensor form?

Then :clamp(input, -self.max_grad, self.max_grad) first fills the resized but still empty tensor with the input image, and then -self.max_grad and self.max_grad is used to normalize higher and lower values.

Finally, self.gradInput:add(-self.strength, self.clipped) adds -self.strength to the self.clipped tensor.

Anyway, we know now that the tensor can be scaled, but we cannot make permanent changes because l-bfgs doesn't expose the master copy of the image to us. Also changing its size, if we could do it, would crash l-bfgs anyway.

So performing changes to the image in the feval(x) does not work. Could one use code like you have shown above, to convert the input in either the DeepDreamLoss:updateGradInput or the DeepDreamLoss:updateOutput function, to a non tensor form so that image.scale, and image.translate can be used for the DeepDream operations, and then the resulting output can be reconverted to the tensor form?

Ex: dd_image = image.scale(input:double(), input:size(2)-1, input:size(3)-1, 'bilinear'):cuda()

And then dd_image would be fed into the two DeepDream functions?

ProGamerGov commented 7 years ago

On the subject of spatial control, I have made progress in my understanding of the process.

In the style loss functions:

The spatially limited tensor is defined like this at the start: local input_chan = input:size()[1]

Then the guidance is added to the input:

   if self.guidance then
        input = torch.cat(input, self.guidance, 1)
    end

The spatial targets are given with this in the python code:

                if guide.ndim==2:
                    guide = guide[:,:,None]
                else:
                    guide = guide[:,:,:1]

Where :,:,None means no guides/spatial targets are used, and :,:,:1 means 1 spatial target is used.

I still don't understand content vs style guides in his code, and how the code knows which part of the guide image to use.

The 3 spatial targets are used here in the style loss function:

    if self.guidance then
        input = input[{{1,input_chan},{},{}}]
    end

htoyryla commented 7 years ago

@ProGamerGov I'll just answer a few points now.

resizes it to match the value of input which I assume is the input image in tensor form?

The input is the input to the module and depends on where in the model it has been placed. Most nn classes are designed to be general purpose, so that one can stack conv layers, relus etc as long as one obeys certain principles. Also the StyleLoss and ContentLoss modules work this way. For them, the input is the output of the layer below, usually the ReLU of a convlayer.

TVLoss in neural-style is different in that it is designed to be placed directly above the image at the input to the whole model. I guess your DeepDream module is intended to be used in the same way, so that the input really is the image. But remember that you cannot change the image directly, only through the gradients you pass to the optimizer, and this can be tricky.

then -self.max_grad and self.max_grad is used to normalize higher and lower values.

This is not really normalization because there is no division involved, one is simply limiting the range.

Finally, self.gradInput:add(-self.strength, self.clipped) adds -self.strength to the self.clipped tensor.

Nope, I think it adds self.clipped (which is a tensor) multiplied by -self.strength (which is a scalar) to self.gradInput.
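That form of add() is easy to check in th (illustrative only):

    local gradInput = torch.zeros(3)
    local clipped   = torch.ones(3)
    gradInput:add(-2, clipped)   -- gradInput = gradInput + (-2) * clipped
    print(gradInput)             -- -2 -2 -2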

Could one use code like you have shown above, to convert the input in either the DeepDreamLoss:updateGradInput or the DeepDreamLoss:updateOutput function, to a non tensor form so that image.scale, and image.translate can be used for the DeepDream operations, and then the resulting output can be reconverted to the tensor form?

Ex: dd_image = image.scale(input:double(), input:size(2)-1, input:size(3)-1, 'bilinear'):cuda()

Your question raises many problems, really, but assuming your DeepDream layer is at the bottom of the model, then you can use the input (or a clone of it) as you would an image. There is no conversion into a non-tensor, the input is a tensor, and you can apply image operations to it if you first convert it to float or double. Afterwards, change it back into dtype. My test was using cuda() but you probably want your code to work on CPU as well.

And then dd_image would be fed into the two DeepDream functions?

Which two functions? If you are using input, you are already inside either updateOutput or updateGradInput. If you want to influence the image, your code probably should be in updateGradInput.

OK, now I have commented on your post, but I still cannot guarantee that anything will work. For me, these layers that sit directly on top of the image and are influenced through the gradient appear anything but clear. Good luck.

htoyryla commented 7 years ago

As to the spatial control, isn't the idea that masks are used to define parts of the image, so that each area can be processed using a different style? To me it looks like the user is responsible for preparing the masks and assigning the styles to them. The idea seems quite simple to me: use multiple gram matrices, each behind its own mask. The idea of a mask is basically very simple: a b&w image consisting of black and white pixels can be used to block or pass parts of the input into the gram matrix. (I haven't checked Gatys' code; this is simply how I understood the idea when reading the paper... at least the basic principle.)
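A minimal sketch of that masked-gram idea (illustrative only, not Gatys' code; assumes features is a CxHxW activation tensor and mask is an HxW tensor of 0s and 1s resized to the same spatial size as the feature map):

    local C, H, W = features:size(1), features:size(2), features:size(3)
    local flat    = features:contiguous():view(C, H * W)
    local masked  = torch.cmul(flat, mask:contiguous():view(1, H * W):expandAs(flat))
    local gram    = torch.mm(masked, masked:t())   -- C x C gram matrix of the masked region only
    gram:div(mask:sum() + 1e-8)                    -- normalize by the masked area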

ProGamerGov commented 7 years ago

@htoyryla I have made great progress, but I have run into a few issues that I don't know how to fix. This is what I have so far: https://gist.github.com/ProGamerGov/7b04fe82d44c3e50b5de195200d1bc0f

There are two errors I am currently facing.

I don't know the cause of this one: https://gist.github.com/ProGamerGov/dd5a703603a9211495652e8f14514d72

This is related to Cuda vs cpu: https://gist.github.com/ProGamerGov/35be22fb4ac452ef303e09590dde487a

This line (L709), when un-commented, should work, but it does not. When it's un-commented, I get this error. It is meant to be this line of code from the DeepDream.lua code.

This line (L695) also does not work. It is meant to do what this line of code does in the DeepDream.lua code.

htoyryla commented 7 years ago

@ProGamerGov I thought we were discussing doing some image manipulation inside updateGradInput of a DeepDreamLoss layer, assuming that the layer is placed so it gets the image as an input. Instead I see you have copied a complete DeepDream implementation with its own model, iteration process and everything. Sorry to say, but I see no way this could work; it makes no sense to me. You are mixing apples with orchards.

Think of it: the DeepDreamLoss module we were discussing is one layer inside a neural network model. The updateGradInput function is run once per iteration. Inside it you are now loading another full neural network model and trying to run a full cycle of iterations.

This line (L709), when un-commented, should work, but it does not. When it's un-commented, I get this error. It is meant to be this line of code from the DeepDream.lua code.

Do you know what a return statement does?

ProGamerGov commented 7 years ago

@htoyryla From my current understanding, I have isolated the code that creates the "DeepDream" hallucinations, and added it into my DeepDream loss functions:

--The DeepDream Magic
      for i=1,iter_n do
        local forward = self.crit:forward(input, self.target)
        -- Set the output gradients at the outermost layer to be equal to the outputs (So they keep getting amplified)
        local forward_grad = forward
        local final = self.crit:updateGradInput(input, forward_grad)
         -- Gradient ascent
        input = input:add(final:mul(step_size/torch.abs(final):mean()))
      end

The code can be found in my modified script here: https://gist.github.com/ProGamerGov/ccc9c41375845f08c4b0f902f251b612#file-neural_style_dd-lua-L642-L649, and the source of this variation of the code can be found here, in another Lua implementation of DeepDream.

The problem is however I get the following error:

MSECriterion.lua:27: attempt to index local 'target' (a number value)

Full error message: https://gist.github.com/ProGamerGov/aa08b5cdbb7d064f207013014ddebf93

I do not understand the cause of this, as both self.gradInput = self.crit:backward(input, self.target) and self.loss = self.crit:forward(input, self.target) * self.strength work.