jcjohnson / neural-style

Torch implementation of neural style algorithm

Non-issue: using delta in Gram matrix #325

Open htoyryla opened 8 years ago

htoyryla commented 8 years ago

Yesterday I noticed this paper http://arxiv.org/abs/1606.01286v1, which claims that better results can be obtained by using a modified Gram matrix with an offset delta, resulting in a kind of cross-correlation. The delta is applied in both the x and y dimensions.

As I had already dabbled a bit with modified Gram matrices earlier, I have now modified neural-style for this kind of calculation. My experimental code is here: https://gist.github.com/htoyryla/b8e7483b713c7f6c9f6006ba8d440c98

If you already have neural-style running, copying neural-delta2.lua into the same directory should be enough to run it.

There is a new parameter, delta; the value 0 should correspond to plain neural-style. Values up to 16 should be safe; beyond that one may get a runtime error depending on which style layers are used. In effect, a slice of width/height delta will be cut off from the feature maps, and if delta is too large an error will result.
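
For example, with your usual style and content images it is run like this (the file names are just placeholders):

th neural-delta2.lua -style_image style.jpg -content_image content.jpg -delta 8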

I have tested this on CPU and GPU, but have not made extensive comparisons of images using different delta values.

[attached example output image: delta16a]

darkstorm2150 commented 8 years ago

You said values up to 16; is this a parameter setting inside neural-style? The reason I ask is so that I can test it effectively; I just need a small terminal example ("th neural_style.lua -style_image -content_image ").

I see "cmd:option('-delta', 8)" in the code; is that how it's used? -delta 8

Also, is this related to the learning_rate?

Thx!

darkstorm2150 commented 8 years ago

Never mind, I figured out how to use it now. The -delta option uses more memory lol...

htoyryla commented 8 years ago

26.9.2016 17:48, Victor Espinoza wrote:

You said values up to 16; is this a parameter setting inside neural-style? The reason I ask is so that I can test it effectively; I just need a small terminal example ("th neural_style.lua -style_image -content_image ").

Normally neural-style uses a single Gram matrix of size CxC, where C is the number of channels in the layer (each channel being a feature map of size HxW, height x width). In this modified method, as described in the paper I refer to, one still constructs a matrix of size CxC, but from two horizontally cropped copies of the feature maps, each delta narrower than the original, which are then multiplied together: one crop is made by cutting out delta from the left side and the other from the right side.

Then one makes the same operation using vertically cropped matrices. Both matrices are used together to calculate style loss instead of the single Gram matrix in the original neural-style. Apparently the use of two matrices per layer instead of one accounts for the increased memory consumption. As far as I know, the method was proposed for quality, not for memory efficiency.
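
To make this concrete, here is a minimal sketch of the idea in plain Torch (written just for this explanation, not copied from the gist; the sizes and delta are example values):

require 'torch'

-- Horizontally shifted Gram matrix for one layer; with delta = 0 this
-- reduces to the ordinary Gram matrix.
local delta = 8
local F = torch.rand(64, 32, 32)                  -- stand-in for a C x H x W feature map
local C, H, W = F:size(1), F:size(2), F:size(3)
local left  = F:narrow(3, 1 + delta, W - delta)   -- drop delta columns from the left edge
local right = F:narrow(3, 1,         W - delta)   -- drop delta columns from the right edge
local A = left:contiguous():view(C, -1)           -- C x (H * (W - delta))
local B = right:contiguous():view(C, -1)
local G = torch.mm(A, B:t())                      -- C x C shifted Gram matrix
-- The vertical version does the same with narrows along dimension 2 (the height).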

As we are cropping a slice of width delta from the feature maps, delta cannot be larger than the feature map size. Furthermore, with delta larger than half of that size, the middle part of the map will be lost in the process and have no effect on the results. Unfortunately the feature map size depends on the layers used and the image size and is not obvious when the program is launched. My code is experimental; I just wanted to try out the method when I saw it described, so I was not interested in checking the parameter ranges, especially as that would have required much more code to find out the dimensions for each layer.

ProGamerGov commented 8 years ago

From the research paper:

Nevertheless, to give an approximate figure, the Gatys et al. generated images shown in the figures have been obtained within 8 minutes on a GTX750Ti. Adding cross-correlation terms increases this by roughly 25%

So it takes approximately 25% longer and uses more resources, but seems to copy symmetry from the style image better?

htoyryla commented 8 years ago

26.9.2016 19:28, ProGamerGov wrote:

From the research paper:

Nevertheless, to give an approximate figure, the Gatys et al.
generated images shown in the figures have been obtained within 8
minutes on a GTX750Ti. Adding cross-correlation terms increases
this by roughly 25%

So it takes approximately 25% longer and uses more resources, but seems to copy symmetry from the style image better?

I understand the paper to mean that here they are talking about using multiple delta values per layer, which is more like real cross-correlation. My code only uses a single value of delta, as described earlier in the paper. On the other hand, I would expect the memory and speed penalties for using multiple deltas to be larger, unless there is some neat way to optimize the process -- not really my area of expertise.
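
Just to illustrate what I mean (this is only a sketch of the idea, not something my gist implements; the sizes and the list of offsets are made up):

require 'torch'

-- Accumulate shifted Gram matrices over several width offsets (dimension 3).
-- Purely illustrative; memory and time grow with the number of offsets.
local deltas = {0, 4, 8, 16}
local F = torch.rand(64, 32, 32)                   -- stand-in for a C x H x W feature map
local C, H, W = F:size(1), F:size(2), F:size(3)
local G = F.new(C, C):zero()                       -- same tensor type as F
for _, d in ipairs(deltas) do
  local A = F:narrow(3, 1 + d, W - d):contiguous():view(C, -1)
  local B = F:narrow(3, 1,     W - d):contiguous():view(C, -1)
  G:add(torch.mm(A, B:t()))                        -- add the C x C matrix for this offset
end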

crowsonkb commented 8 years ago

Is it possible to add together the two/many Gram matrices per layer - adjusting for the fact that the 1/(4 M^2 N^2) factor will be different for different sized feature maps - to save memory?

ProGamerGov commented 8 years ago

When trying out the script, I seem to be getting this error:

Setting up style layer          2       :       relu1_1
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/Narrow.lua:14: bad argument #4 to 'narrow' (out of range at /tmp/luarocks_cutorch-scm-1-6766/cutorch/lib/THC/generic/THCTensor.c:367)
stack traceback:
        [C]: in function 'narrow'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Narrow.lua:14: in function 'updateOutput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        neural-delta2.lua:219: in function 'main'
        neural-delta2.lua:565: in main chunk
        [C]: in function 'dofile'
        ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670
ubuntu@ip-Address:~/neural-style$

htoyryla commented 8 years ago

26.9.2016 23:38, ProGamerGov wrote:

When trying out the script, I seem to be getting this error:

Setting up style layer 2 : relu1_1
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/Narrow.lua:14: bad argument #4 to 'narrow' (out of range at /tmp/luarocks_cutorch-scm-1-6766/cutorch/lib/THC/generic/THCTensor.c:367)
stack traceback:
        [C]: in function 'narrow'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Narrow.lua:14: in function 'updateOutput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        neural-delta2.lua:219: in function 'main'
        neural-delta2.lua:565: in main chunk
        [C]: in function 'dofile'
        ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670
ubuntu@ip-Address:~/neural-style$

I can't check it where I am now, but the error looks to me exactly like what you get when the delta is too large for the Gram matrix. Try a smaller delta. You are using relu1_1 as a style layer, which has the smallest number of channels and therefore the smallest Gram matrix. When one gets an error, the delta is too large for the Gram matrix on that particular style layer.

htoyryla commented 8 years ago

26.9.2016 23:38, ProGamerGov wrote:

When trying out the script, I seem to be getting this error:

Setting up style layer 2 : relu1_1

Thinking further on this... isn't it that relu1_1, being the lowest layer, has only three channels, so the only delta value that would work is 3, and even that does not make much sense. This method actually only makes sense with layers with a higher number of channels.

ProGamerGov commented 8 years ago

@htoyryla The error occurs when using any parameters with th neural-delta2.lua and -delta 0.

htoyryla commented 8 years ago

26.9.2016 23:56, ProGamerGov wrote:

@htoyryla The error occurs when using any parameters with th neural-delta2.lua and -delta 0.

I can't check anything right now, I am away from my computers and it is over midnight here.

I suspect that this is related to my use of a negative length parameter in nn.Narrow, which allows slicing a tensor without knowing its size. I am not sure how new this feature is in nn.

Seeing that others have been able to run it, I recommend updating to the newest Torch and nn (etc.) if one gets this error even with a small delta.
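
For what it is worth, here is a small self-contained check one could run to see whether the installed nn supports the negative length (just a quick sketch of mine, not part of the gist):

require 'nn'

-- Both modules should give identical output if negative lengths are supported;
-- delta = 3 and the 4x10x10 test tensor are example values only.
local delta = 3
local x = torch.rand(4, 10, 10)
local neg = nn.Narrow(2, 1, -1 - delta):forward(x)          -- rows 1 .. H - delta
local pos = nn.Narrow(2, 1, x:size(2) - delta):forward(x)   -- the same crop with an explicit size
print((neg - pos):abs():max())                               -- prints 0 if the feature works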

htoyryla commented 7 years ago

"Is it possible to add together the two/many Gram matrices per layer - adjusting for the fact that the 1/(4 M^2 N^2) factor will be different for different sized feature maps - to save memory?"

I tried it just now (simply adding the two matrices without any adjustment):

-- Builds the style-layer module: the input is a C x H x W feature map
-- (dim 2 = height, dim 3 = width); delta is the command-line option from
-- the enclosing script. The module computes two shifted C x C Gram matrices
-- (one cropping along dimension 2, one along dimension 3) and sums them.
function Gram()
  local net = nn.Sequential()
  local net1 = nn.Sequential()
  local net2 = nn.Sequential()

  -- Branch 1: crop delta off opposite edges along dimension 2, flatten each
  -- channel's cropped map and multiply the two crops into a C x C matrix.
  local concat1 = nn.ConcatTable()
  local seq1 = nn.Sequential()
  local seq2 = nn.Sequential()
  seq1:add(nn.Narrow(2,1+delta,-1))         -- drop the first delta rows
  seq1:add(nn.View(-1):setNumInputDims(2))  -- flatten each channel's map to one row
  seq2:add(nn.Narrow(2,1,-1-delta))         -- drop the last delta rows
  seq2:add(nn.View(-1):setNumInputDims(2))
  concat1:add(seq1)
  concat1:add(seq2)
  net1:add(concat1)
  net1:add(nn.MM(false, true))              -- A * B^T -> C x C

  -- Branch 2: the same, but cropping along dimension 3.
  local concat2 = nn.ConcatTable()
  local seq1b = nn.Sequential()
  local seq2b = nn.Sequential()
  seq1b:add(nn.Narrow(3,1+delta,-1))
  seq1b:add(nn.View(-1):setNumInputDims(2))
  seq2b:add(nn.Narrow(3,1,-1-delta))
  seq2b:add(nn.View(-1):setNumInputDims(2))
  concat2:add(seq1b)
  concat2:add(seq2b)
  net2:add(concat2)
  net2:add(nn.MM(false, true))

  -- Feed the same input to both branches and add the two C x C matrices.
  local concat = nn.ConcatTable()
  concat:add(net1)
  concat:add(net2)
  net:add(concat)
  net:add(nn.CAddTable(true))               -- in-place elementwise sum

  return net
end

There is no difference in memory usage. I guess this is because what is stored in memory is not only the output value but the whole network needed for the calculation. This can be seen from the function above: it returns a network that calculates the matrix, and this network (one per layer) is added to the main network.

As to the adjustment you suggested... the matrices added here all belong to the same layer, so I don't see a need to adjust anything for the adding here.

I know that there are other implementations that compensate for the Gram matrix size when calculating style losses; you might mean this although it is not directly relevant for the delta method. In general that might be a good practice and I intend to try it at some point.
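
By size compensation I mean something like the following (only a sketch of the idea with made-up names and sizes; it is not what the gist does):

require 'torch'

-- Gram matrix normalized by the number of elements in the feature map, so that
-- layers with larger maps do not dominate the style loss. Example values only.
local F = torch.rand(64, 32, 32)                  -- stand-in for a C x H x W feature map
local C, H, W = F:size(1), F:size(2), F:size(3)
local Fm = F:contiguous():view(C, -1)
local G = torch.mm(Fm, Fm:t()):div(C * H * W)     -- normalized C x C Gram matrix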

ProGamerGov commented 7 years ago

@htoyryla What do I replace in the script with the dual matrix code to make it use two matrices?

ProGamerGov commented 7 years ago

The results of my delta experiments today. I was able to get up to a delta value of 39.

htoyryla commented 7 years ago

"What do I replace in the script with the dual matrix code to make it use two matrices?"

I am not sure I understand what you want. Here is the full code in which both matrices are handled within a single Gram() function: https://gist.github.com/htoyryla/cd0e75ab148a8e526b58e24826ce23a4 . It should be functionally equivalent to the first version; only the code is cleaner now (the original neural-delta2.lua had hacks all over the code because of the two Gram() functions). Both versions use two matrices (one using horizontal crops, one using vertical crops); the difference is in the implementation.

htoyryla commented 7 years ago

The maximum for delta depends on image size and the layers used. The code is cutting away slices of delta from each feature map, which will fail when delta reaches the width or height of the feature map. The size of the feature map is dependent on the image size and decreases with the level of the layer.

Furthermore, when the delta is more than half its maximum, a part in the center of the feature map will not be used at all. This is because the crops will result in narrow slices along the edges, leaving out the center. However, as the feature maps for different layers are of different size, this will not happen on all layers.
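
To get a rough feel for where the limit lies, here is a back-of-the-envelope illustration (it assumes VGG-19, where each pooling halves the spatial size; the exact numbers depend on the model and the image size):

-- Approximate feature-map sizes per layer group for a given image size.
local image_size = 512                             -- example value
for level = 1, 5 do
  local map = math.floor(image_size / 2 ^ (level - 1))
  print(string.format("relu%d_x feature maps ~ %d x %d, so delta must stay below %d",
                      level, map, map, map))
end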

palashd11 commented 7 years ago

I'm getting the following error:

WARNING: Skipping content loss
Running optimization with L-BFGS

creating recyclable direction/step/history buffers
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6878/cutorch/lib/THC/generic/THCStorage.cu line=40 error=2 : out of memory
/home/iki/torch/install/bin/luajit: /home/iki/torch/install/share/lua/5.1/optim/lbfgs.lua:84: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-6878/cutorch/lib/THC/generic/THCStorage.cu:40
stack traceback:
        [C]: in function 'new'
        /home/iki/torch/install/share/lua/5.1/optim/lbfgs.lua:84: in function 'lbfgs'
        neural-delta2.lua:364: in function 'main'
        neural-delta2.lua:565: in main chunk
        [C]: in function 'dofile'
        .../iki/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00405d50

htoyryla commented 7 years ago

You are running out of memory. The modified Gram matrices use more memory than the original neural-style. You could try the usual things: smaller image_size, adam instead of l-bfgs or using nin-imagenet-conv instead of vgg19.
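
For example, with otherwise the same arguments, adding something like -image_size 384 -optimizer adam (both standard neural-style options; the values are only a suggestion) should bring the memory use down considerably.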

palashd11 commented 7 years ago

I'll try that. Thank you.

Naruto-Sasuke commented 7 years ago

This paper was rejected by NIPS 2016...

htoyryla commented 7 years ago

Not that it matters to me... I am constantly experimenting with things that have not been approved by any committee :) If it works somehow, that's good; if it doesn't, it might still produce something interesting, and in any case I usually learn something new.

But still... what were the reasons for the rejection? Some fundamental faults? Not interesting enough?