jcjohnson / fast-neural-style

Feedforward style transfer

slow_neural_style.lua: size mismatch #87

Open MrZoidberg opened 7 years ago

MrZoidberg commented 7 years ago

I'm trying to run slow_neural_style.lua on my Windows PC on CPU.

LuaJIT 2.1.0-beta2

Installed rocks:

argcheck scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

cudnn scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

cunn scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

cwrap scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

dok scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

env scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

gnuplot scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

graph scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

image 1.1.alpha-0 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

lua-cjson 2.1devel-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

luaffi scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

luafilesystem 1.6.3-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

nn scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

nngraph scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

nnx 0.1-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

optim 1.0.5-0 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

paths scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

penlight scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

qtlua scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

qttorch scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

sundown scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

sys 1.1-0 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

threads scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

torch scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

totem 0-0 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

trepl scm-1 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

xlua 1.0-0 (installed) - C:/tools/torch/luarocks/systree/lib/luarocks/rocks

Here is the log:

D:\Projects\fast-neural-style>th slow_neural_style.lua -content_image 1.jpg -style_image styles/mosaic.jpg -output_image style1_1.jpg -gpu -1 -save_every 10
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> (38) -> (39) -> (40) -> output]
  (1): nn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): nn.ReLU
  (3): nn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): nn.ReLU
  (5): nn.SpatialMaxPooling(2x2, 2,2)
  (6): nn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): nn.ReLU
  (8): nn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): nn.ReLU
  (10): nn.SpatialMaxPooling(2x2, 2,2)
  (11): nn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): nn.ReLU
  (13): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): nn.ReLU
  (15): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): nn.ReLU
  (17): nn.SpatialMaxPooling(2x2, 2,2)
  (18): nn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (19): nn.ReLU
  (20): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (21): nn.ReLU
  (22): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): nn.ReLU
  (24): nn.SpatialMaxPooling(2x2, 2,2)
  (25): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (26): nn.ReLU
  (27): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (28): nn.ReLU
  (29): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): nn.ReLU
  (31): nn.SpatialMaxPooling(2x2, 2,2)
  (32): nn.View(-1)
  (33): nn.Linear(25088 -> 4096)
  (34): nn.ReLU
  (35): nn.Dropout(0.500000)
  (36): nn.Linear(4096 -> 4096)
  (37): nn.ReLU
  (38): nn.Dropout(0.500000)
  (39): nn.Linear(4096 -> 1000)
  (40): nn.SoftMax
}
C:\tools\torch\bin\luajit.exe: ...ls\torch\luarocks\systree/share/lua/5.1/nn\Container.lua:67:
In 33 module of nn.Sequential:
C:\tools\torch\luarocks\systree/share/lua/5.1/nn\Linear.lua:66: size mismatch, m1: [1 x 90112], m2: [25088 x 4096] at c:\users\mikhail\appdata\local\temp\luarocks_torch-scm-1-4211\torch7\lib\th\generic/THTensorMath.c:816
stack traceback:
        [C]: in function 'addmm'
        C:\tools\torch\luarocks\systree/share/lua/5.1/nn\Linear.lua:66: in function <C:\tools\torch\luarocks\systree/share/lua/5.1/nn\Linear.lua:53>
        [C]: in function 'xpcall'
        ...ls\torch\luarocks\systree/share/lua/5.1/nn\Container.lua:63: in function 'rethrowErrors'
        ...s\torch\luarocks\systree/share/lua/5.1/nn\Sequential.lua:44: in function 'forward'
        .\fast_neural_style\PerceptualCriterion.lua:92: in function 'setContentTarget'
        slow_neural_style.lua:105: in function 'main'
        slow_neural_style.lua:172: in main chunk
        [C]: in function 'dofile'
        ...h\luarocks\systree\lib\luarocks\rocks\trepl\scm-1\bin\th:145: in main chunk
        [C]: at 0x7ff7fed91eb0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
        [C]: in function 'error'
        ...ls\torch\luarocks\systree/share/lua/5.1/nn\Container.lua:67: in function 'rethrowErrors'
        ...s\torch\luarocks\systree/share/lua/5.1/nn\Sequential.lua:44: in function 'forward'
        .\fast_neural_style\PerceptualCriterion.lua:92: in function 'setContentTarget'
        slow_neural_style.lua:105: in function 'main'
        slow_neural_style.lua:172: in main chunk
        [C]: in function 'dofile'
        ...h\luarocks\systree\lib\luarocks\rocks\trepl\scm-1\bin\th:145: in main chunk
        [C]: at 0x7ff7fed91eb0

What am I doing wrong?

Runescaped commented 7 years ago

I don't know what the issue is, but I believe Windows isn't officially supported; the preferred OS is Ubuntu 14.04. If you are only planning to use the CPU, I would suggest installing Ubuntu in a virtual machine.

MrZoidberg commented 7 years ago

@Runescaped I've tried the same code on a non-virtual Ubuntu install and it works. Since my primary OS is Windows, I thought someone might already have run into and fixed the same problem.

Runescaped commented 7 years ago

Ah, that makes sense.

Does this issue occur with every combination of images you try? Or only for those two images (1.jpg, styles/mosaic.jpg)?

MrZoidberg commented 7 years ago

For every image I've tried.

htoyryla commented 7 years ago

The error occurs on a Linear layer, which should not be in use at all. Linear layers do not adapt to image size the way convolutional layers do, so attempts to run them on arbitrarily sized images usually result in size mismatches (unless one modifies the network, as I did in http://liipetti.net/erratic/2016/03/28/controlling-image-content-with-fc-layers/ ).

So it looks like a Linear layer (which exists in the VGG model) is being used when it should not be. I am not thoroughly familiar with this new slow_neural_style, but the first thing to check is that no Linear layer is selected as a style or content layer. Why would this happen? I don't know, but somehow the Linear layer gets used when it should not.
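The sizes in the error message are consistent with this. A minimal pure-Lua sketch of the arithmetic (the input dimensions 224x224 and 512x352 are my inference from the numbers in the log, not confirmed by the poster): VGG-16's five 2x2 poolings each halve the spatial size, so the flattened vector entering fc6 is 512 * (H/32) * (W/32), which equals 25088 only for a 224x224 input.

```lua
-- Why VGG-16's first Linear layer, nn.Linear(25088 -> 4096), only accepts
-- 224x224 inputs: each of the five SpatialMaxPooling(2x2, 2,2) stages
-- halves height and width (with floor), and conv5_3 has 512 channels.
local function vgg_flat_size(h, w)
  for _ = 1, 5 do
    h, w = math.floor(h / 2), math.floor(w / 2)
  end
  return 512 * h * w
end

print(vgg_flat_size(224, 224))  -- 25088: matches m2's first dimension
print(vgg_flat_size(512, 352))  -- 90112: matches m1 in the error above
```

So any non-224x224 content image forwarded through the untrimmed net must fail at module 33, regardless of platform.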

MrZoidberg commented 7 years ago

Cool, thanks! I'll try it in a few days when I get back to Windows.

htoyryla commented 7 years ago

By the way, if I set

-content_layers 33

on Ubuntu, I also get this error (so check your layer numbers):

/home/hannu/torch/install/bin/luajit: /home/hannu/torch/install/share/lua/5.1/nn/Container.lua:67:
In 37 module of nn.Sequential:
/home/hannu/torch/install/share/lua/5.1/nn/Linear.lua:66: size mismatch, m1: [1 x 90112], m2: [25088 x 4096] at /tmp/luarocks_torch-scm-1-1123/torch7/lib/TH/generic/THTensorMath.c:816
stack traceback:
    [C]: in function 'addmm'
    /home/hannu/torch/install/share/lua/5.1/nn/Linear.lua:66: in function </home/hannu/torch/install/share/lua/5.1/nn/Linear.lua:53>
    [C]: in function 'xpcall'
    /home/hannu/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/hannu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./fast_neural_style/PerceptualCriterion.lua:92: in function 'setContentTarget'
    slow_neural_style.lua:105: in function 'main'
    slow_neural_style.lua:172: in main chunk
    [C]: in function 'dofile'
    ...annu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

MrZoidberg commented 7 years ago

I'm sure that I hadn't set custom layers at the time, so I think I'll add some debugging messages to see what happens on Windows. Thanks for the help!

htoyryla commented 7 years ago

The first place to look is https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/PerceptualCriterion.lua , where the init function prepares the net to use the correct layers for measuring loss.

If the layer settings are correct, it could be that trim_network() somehow fails; it is supposed to remove all layers above the highest one used. See https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/layer_utils.lua .

Inserting print(self.net) here https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/PerceptualCriterion.lua#L63 prints out the net after the trimming; the Linear layers should not be there:

  layer_utils.trim_network(self.net)
  print(self.net)
  self.grad_net_output = torch.Tensor()

MrZoidberg commented 7 years ago

@htoyryla looks like you're right: the wrong layers are still there.

The difference between my code and the original is that I fixed the model load for Windows: local ok, checkpoint = pcall(function() return torch.load(opt.model, 'b64') end) (added 'b64'), but I don't think that could be the problem. I have also updated the image Lua module (the script stopped working on my Ubuntu after the image update), but I don't believe these problems are connected either. Are there any tools besides print for troubleshooting?

D:\Projects\fast-neural-style>th slow_neural_style.lua -content_image images/content/chicago.jpg -style_image styles/mosaic.jpg -output_image style1_1.jpg -gpu 0 -backend cudnn -save_every 10
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> (38) -> (39) -> (40) -> output]
  (1): nn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): nn.ReLU
  (3): nn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): nn.ReLU
  (5): nn.SpatialMaxPooling(2x2, 2,2)
  (6): nn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): nn.ReLU
  (8): nn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): nn.ReLU
  (10): nn.SpatialMaxPooling(2x2, 2,2)
  (11): nn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): nn.ReLU
  (13): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): nn.ReLU
  (15): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): nn.ReLU
  (17): nn.SpatialMaxPooling(2x2, 2,2)
  (18): nn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (19): nn.ReLU
  (20): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (21): nn.ReLU
  (22): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): nn.ReLU
  (24): nn.SpatialMaxPooling(2x2, 2,2)
  (25): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (26): nn.ReLU
  (27): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (28): nn.ReLU
  (29): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): nn.ReLU
  (31): nn.SpatialMaxPooling(2x2, 2,2)
  (32): nn.View(-1)
  (33): nn.Linear(25088 -> 4096)
  (34): nn.ReLU
  (35): nn.Dropout(0.500000)
  (36): nn.Linear(4096 -> 4096)
  (37): nn.ReLU
  (38): nn.Dropout(0.500000)
  (39): nn.Linear(4096 -> 1000)
  (40): nn.SoftMax
}
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> (38) -> (39) -> (40) -> output]
  (1): nn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): nn.ReLU
  (3): nn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): nn.ReLU
  (5): nn.SpatialMaxPooling(2x2, 2,2)
  (6): nn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): nn.ReLU
  (8): nn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): nn.ReLU
  (10): nn.SpatialMaxPooling(2x2, 2,2)
  (11): nn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): nn.ReLU
  (13): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): nn.ReLU
  (15): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): nn.ReLU
  (17): nn.SpatialMaxPooling(2x2, 2,2)
  (18): nn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (19): nn.ReLU
  (20): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (21): nn.ReLU
  (22): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): nn.ReLU
  (24): nn.SpatialMaxPooling(2x2, 2,2)
  (25): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (26): nn.ReLU
  (27): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (28): nn.ReLU
  (29): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): nn.ReLU
  (31): nn.SpatialMaxPooling(2x2, 2,2)
  (32): nn.View(-1)
  (33): nn.Linear(25088 -> 4096)
  (34): nn.ReLU
  (35): nn.Dropout(0.500000)
  (36): nn.Linear(4096 -> 4096)
  (37): nn.ReLU
  (38): nn.Dropout(0.500000)
  (39): nn.Linear(4096 -> 1000)
  (40): nn.SoftMax
}
C:\tools\torch\bin\luajit.exe: ...ls\torch\luarocks\systree/share/lua/5.1/nn\Container.lua:67:
In 33 module of nn.Sequential:
C:\tools\torch\luarocks\systree/share/lua/5.1/nn\Linear.lua:66: size mismatch, m1: [1 x 90112], m2: [25088 x 4096] at c:\users\mikhail\appdata\local\temp\luarocks_torch-scm-1-4211\torch7\lib\th\generic/THTensorMath.c:816
stack traceback:
        [C]: in function 'addmm'
        C:\tools\torch\luarocks\systree/share/lua/5.1/nn\Linear.lua:66: in function <C:\tools\torch\luarocks\systree/share/lua/5.1/nn\Linear.lua:53>
        [C]: in function 'xpcall'
        ...ls\torch\luarocks\systree/share/lua/5.1/nn\Container.lua:63: in function 'rethrowErrors'
        ...s\torch\luarocks\systree/share/lua/5.1/nn\Sequential.lua:44: in function 'forward'
        .\fast_neural_style\PerceptualCriterion.lua:93: in function 'setContentTarget'
        slow_neural_style.lua:105: in function 'main'
        slow_neural_style.lua:172: in main chunk
        [C]: in function 'dofile'
        ...h\luarocks\systree\lib\luarocks\rocks\trepl\scm-1\bin\th:145: in main chunk
        [C]: at 0x7ff736c41eb0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
        [C]: in function 'error'
        ...ls\torch\luarocks\systree/share/lua/5.1/nn\Container.lua:67: in function 'rethrowErrors'
        ...s\torch\luarocks\systree/share/lua/5.1/nn\Sequential.lua:44: in function 'forward'
        .\fast_neural_style\PerceptualCriterion.lua:93: in function 'setContentTarget'
        slow_neural_style.lua:105: in function 'main'
        slow_neural_style.lua:172: in main chunk
        [C]: in function 'dofile'
        ...h\luarocks\systree\lib\luarocks\rocks\trepl\scm-1\bin\th:145: in main chunk
        [C]: at 0x7ff736c41eb0

htoyryla commented 7 years ago

If your second printout of the net is from after trim_network, then:

a) the layers above the topmost content or style layer should have been removed, but they are not, and

b) StyleLoss and ContentLoss layers should be present, as shown below (a printout from PerceptualCriterion.lua after calling trim_network).

Somehow the setting up of the network (take VGG16, insert loss layers, remove the layers above the highest loss layer) is not working correctly. In fact, (a) follows logically from (b): if no loss layers are inserted, there is nothing to trim down to either. So one should be looking for why the loss layers fail to get inserted.

I have no idea what could be wrong there, but it should be reasonably straightforward to debug, even with print statements (that's what I have been doing so far). I wonder if it might after all be related to loading the binary model (which is said to be platform-specific). But I have no experience with Torch on Windows, and no access to Windows at the moment either.

nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> output]
  (1): nn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): nn.ReLU
  (3): nn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): nn.ReLU
  (5): nn.StyleLoss
  (6): nn.SpatialMaxPooling(2x2, 2,2)
  (7): nn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (8): nn.ReLU
  (9): nn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (10): nn.ReLU
  (11): nn.StyleLoss
  (12): nn.SpatialMaxPooling(2x2, 2,2)
  (13): nn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (14): nn.ReLU
  (15): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): nn.ReLU
  (17): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (18): nn.ReLU
  (19): nn.StyleLoss
  (20): nn.ContentLoss
  (21): nn.SpatialMaxPooling(2x2, 2,2)
  (22): nn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (23): nn.ReLU
  (24): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (25): nn.ReLU
  (26): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (27): nn.ReLU
  (28): nn.StyleLoss
}
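To make the check mechanical rather than visual, a small helper along these lines could be added next to the debugging prints (count_loss_layers is a hypothetical name; it only assumes the standard nn.Sequential modules table and torch.type):

```lua
-- Hypothetical debugging helper: count the StyleLoss/ContentLoss modules
-- actually present in the criterion's internal net. If both counts are 0,
-- insert_after() in layer_utils.lua never inserted them, and trim_network()
-- consequently had no loss layer to trim the net down to.
local function count_loss_layers(net)
  local style, content = 0, 0
  for _, m in ipairs(net.modules) do
    local t = torch.type(m)
    if t == 'nn.StyleLoss' then
      style = style + 1
    elseif t == 'nn.ContentLoss' then
      content = content + 1
    end
  end
  return style, content
end

-- e.g. right after layer_utils.trim_network(self.net):
-- print(count_loss_layers(self.net))
```

For the trimmed net printed above, this would report 4 StyleLoss and 1 ContentLoss; on the failing Windows run it should report 0 and 0.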

MrZoidberg commented 7 years ago

I think the problem is that, according to the documentation, Torch saves models in a binary format by default, which is platform-dependent. I will try converting the model file to another format and check it on Windows.
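If the serialization format does turn out to be the culprit, the conversion could be sketched like this (file names are placeholders; torch.save and torch.load accept an optional format argument, 'binary' by default or 'ascii' for a portable text format):

```lua
require 'torch'

-- Run on the machine where loading works (e.g. the Ubuntu box):
-- re-save the checkpoint in the portable 'ascii' serialization format.
local checkpoint = torch.load('models/vgg16.t7')          -- default 'binary'
torch.save('models/vgg16_ascii.t7', checkpoint, 'ascii')  -- platform-independent

-- Then on Windows, load it with the matching format flag:
-- local checkpoint = torch.load('models/vgg16_ascii.t7', 'ascii')
```

The ascii format is larger and slower to load, but it sidesteps any endianness or word-size differences between the machine that saved the model and the one loading it.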

htoyryla commented 7 years ago

Yes, that's why I wrote "I wonder if it might after all be related to loading the binary model (which is said to be platform-specific)."

But it might be something else too, since the model loads successfully and the loaded model appears to contain the correct layers. The problem (even if it is ultimately caused by the binary format) most likely happens in insert_after() in layer_utils.lua, which should insert the loss layers but does not. Experience suggests a type mismatch in a comparison (a string compared against a number where it should be number against number), but there is an explicit conversion to a number in https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/layer_utils.lua#L44 . In any case, for some reason the loss layer insertion does not happen on Windows.
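For anyone debugging this, the suspected pitfall is easy to reproduce in plain Lua (the values here are illustrative, not taken from the script): equality between a string and a number is always false, so a layer index read from a command-line option silently never matches unless tonumber() is applied, as layer_utils.lua does at the linked line.

```lua
-- Layer indices from command-line options arrive as strings; indices
-- produced while walking net.modules are numbers. In Lua, == between
-- different types is simply false, with no error to warn you.
local wanted = '16'        -- e.g. parsed from -content_layers 16
local layer_index = 16     -- numeric index during the network walk

print(wanted == layer_index)            -- false: '16' ~= 16
print(tonumber(wanted) == layer_index)  -- true after explicit conversion
```

So if a port or a patched option parser ever bypasses that tonumber() call, insert_after() would walk the whole net without ever matching a layer, which is exactly the symptom seen here.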