jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License
18.31k stars 2.7k forks

Size mismatch running VGG network #191

Open sjmarotta opened 8 years ago

sjmarotta commented 8 years ago

I'm running neural-style on a Kubuntu 15.04 VM in VMware on a Windows host. I can run the NIN without any trouble, but when I try to run the VGG network, I get a size mismatch. Here is what I get:

smarotta@ubuntu:~/rnn/neural-style$ th neural_style.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -output_image profile.png -model_file models/VGG_ILSVRC_19_layers.caffemodel -proto_file models/VGG_ILSVRC_19_layers_deploy.prototxt -gpu -1 -num_iterations 1000 -seed 123 -content_layers relu0,relu3,relu7,relu12 -style_layers relu0,relu3,relu7,relu12 -content_weight 10 -style_weight 1000 -image_size 512 -optimizer adam
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
/home/smarotta/rnn/torch/install/bin/luajit: ...marotta/rnn/torch/install/share/lua/5.1/nn/Container.lua:69: In 40 module of nn.Sequential:
...e/smarotta/rnn/torch/install/share/lua/5.1/nn/Linear.lua:51: size mismatch, [4096 x 25088], [106496] at /home/smarotta/rnn/torch/pkg/torch/lib/TH/generic/THTensorMath.c:672
stack traceback:
	[C]: in function 'addmv'
	...e/smarotta/rnn/torch/install/share/lua/5.1/nn/Linear.lua:51: in function <...e/smarotta/rnn/torch/install/share/lua/5.1/nn/Linear.lua:47>
	[C]: in function 'xpcall'
	...marotta/rnn/torch/install/share/lua/5.1/nn/Container.lua:65: in function 'rethrowErrors'
	...arotta/rnn/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	neural_style.lua:265: in function 'main'
	neural_style.lua:500: in main chunk
	[C]: in function 'dofile'
	.../rnn/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x00405ea0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
	[C]: in function 'error'
	...marotta/rnn/torch/install/share/lua/5.1/nn/Container.lua:69: in function 'rethrowErrors'
	...arotta/rnn/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	neural_style.lua:265: in function 'main'
	neural_style.lua:500: in main chunk
	[C]: in function 'dofile'
	.../rnn/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x00405ea0

jcjohnson commented 8 years ago

You are asking for style and content layers that don't exist. For VGG you should be able to just use the default content and style layers.
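For reference, an invocation along these lines should work for VGG-19. The layer names below are the script's documented defaults (relu4_2 for content; relu1_1 through relu5_1 for style) — verify them against your copy of neural_style.lua before relying on this exact list:

```shell
# Corrected command: the relu0/relu3/relu7/relu12 names belong to NIN, not VGG.
# These -content_layers/-style_layers values are the defaults, so both flags
# could also simply be omitted.
th neural_style.lua \
  -style_image examples/inputs/picasso_selfport1907.jpg \
  -content_image examples/inputs/brad_pitt.jpg \
  -output_image profile.png \
  -model_file models/VGG_ILSVRC_19_layers.caffemodel \
  -proto_file models/VGG_ILSVRC_19_layers_deploy.prototxt \
  -gpu -1 -num_iterations 1000 -seed 123 \
  -content_layers relu4_2 \
  -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 \
  -content_weight 10 -style_weight 1000 -image_size 512 -optimizer adam
```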


sjmarotta commented 8 years ago

Thanks! Now I'm able to get the large VGG network to run. I tried running the normalized VGG network with the included VGG proto file, and so far, my 100-iteration image appears to be a blank grey image. Is there a different proto file I should be using with the normalized network?

htoyryla commented 8 years ago

Steve Marotta notifications@github.com wrote on 23.3.2016 at 22.43:

Try larger content and style weights. Recently I got good results with content_weight 1e4 and style_weight between 1e7 and 1e10.

In addition, I used a prototxt file written specifically for average pooling, which I found somewhere, but I think that is not actually necessary: as far as I can see, neural-style ignores the pooling type in the prototxt file.

Hannu

sjmarotta commented 8 years ago

Ah, interesting. I was wondering whether the content and style weights were just relative to each other (i.e., cw=1, sw=10 is the same as cw=100, sw=1000). Does their absolute value make a difference, or do I get the same results with those two sets of values?

htoyryla commented 8 years ago

Steve Marotta notifications@github.com wrote on 23.3.2016 at 22.58:

The absolute values of the weights really seem to matter. With the normalized VGG and the default weights, the loss values displayed are quite low, especially the style losses, and you get a grey image. When you increase the weights so that the losses are larger (thousands and upwards), you start getting real images. I guess the optimizer is sensitive to the absolute scale of the loss when trying to find a good optimum.

Hannu
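A toy calculation makes this concrete (illustrative Python, not neural-style code; the raw loss values are invented): scaling both weights by the same factor preserves the content/style ratio, but it multiplies the objective — and therefore the gradients the optimizer sees — by that factor.

```python
# The combined objective in style transfer has the form
#   loss = content_weight * L_content + style_weight * L_style
def total_loss(cw, sw, l_content, l_style):
    return cw * l_content + sw * l_style

# Invented raw losses, standing in for a normalized network's small outputs.
l_content, l_style = 0.5, 0.002

small = total_loss(1, 10, l_content, l_style)      # cw=1,   sw=10
large = total_loss(100, 1000, l_content, l_style)  # cw=100, sw=1000

# The content/style balance inside the objective is identical...
assert (1 * l_content) / (10 * l_style) == (100 * l_content) / (1000 * l_style)

# ...but the objective (and its gradient) is 100x larger, which is what a
# step-size-sensitive optimizer reacts to.
print(small, large)  # 0.52 52.0
```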

htoyryla commented 8 years ago

Steve Marotta notifications@github.com wrote on 23.3.2016 at 22.58:

I only now noticed that you have used ADAM. My values and comments were for L-BFGS, so the optimal values for ADAM can be different.

Hannu

jcjohnson commented 8 years ago

Also if you are getting a gray image then try turning off TV regularization.
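For intuition, here is a minimal sketch of a total-variation penalty of the kind the -tv_weight option controls (squared-difference form; neural-style's actual TVLoss module may differ in detail). Flat, low-contrast images minimize it, which is why an overly strong TV term can pull the result toward uniform grey; setting -tv_weight 0 disables it.

```python
# Sum of squared differences between each pixel and its right/down neighbours.
# Smooth images score low; a checkerboard scores high.
def tv_penalty(img):
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < h:
                total += (img[y + 1][x] - img[y][x]) ** 2
    return total

flat = [[0.5] * 4 for _ in range(4)]                  # uniform grey patch
noisy = [[(x + y) % 2 for x in range(4)] for y in range(4)]  # checkerboard

print(tv_penalty(flat), tv_penalty(noisy))  # 0.0 24.0
```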


sjmarotta commented 8 years ago

I've switched over to L-BFGS. I'm running neural_style with the full VGG network and cw=1, sw=10, and I'm getting some interesting results. They're not quite as crisp as I'm seeing in some of the examples on the flickr gallery and elsewhere, though.

I tried another example with the full VGG network, and when I set cw=10,sw=1000, I got an extremely noisy, mostly grey image and very high loss values (iteration 100 loss=1900474642.898254). Are cw/sw values around 1-10 more appropriate for the non-normalized VGG network?

htoyryla commented 8 years ago

Steve Marotta notifications@github.com wrote on 23.3.2016 at 23.49:

If you mean the default (non-normalized) VGG-19, I would recommend starting from the default weights; they are a good starting point for experimenting.

To me, 1 to 10 sounds very low. I usually use higher values, starting from the defaults and most often tuning style upwards while adjusting content up or down until I get the right result. Usually at iteration 100 the losses are already going down and a picture is emerging, still vague or noisy. Sometimes the image starts to emerge only at iteration 300 (that might have been a different network, though). It very much depends on how well the optimizer finds a way towards a minimum.

I'll run a test using the default networks and images with cw/sw=10/1000 and report the results.

Hannu

htoyryla commented 8 years ago

Steve Marotta notifications@github.com wrote on 23.3.2016 at 23.49:

See http://liipetti.net/erratic/vgg19-defaults-with-different-weights/ for my test run with the default VGG19, default images, cw=10 and sw=1000.

The losses diminish quite nicely and the picture emerges as I would expect. Each run may of course produce different results, so if one run fails to converge a second try may succeed.

Hannu

htoyryla commented 8 years ago

By the way, in my experiments pushing neural-style to the extremes, I wanted to try using an FC layer for content, assuming that it contains activations for the features detected in the original image. Interestingly, I am getting pretty much the same error as in the first post in this issue. I assume the layers do exist, and there should not be a size mismatch if the losses are computed from the same layer. Apparently there is something going on that I have not yet grasped. Are there some assumptions about the dimensions of the layers that are not immediately obvious? @jcjohnson, can you comment?

jcjohnson commented 8 years ago

The size mismatch comes when you try to compute a forward pass of a fully connected layer on an image of a size different from the size the network was trained with.

VGG was trained on 224x224 images, so you'll get a size mismatch on FC layers for other sizes. It would theoretically be possible to get around this by converting the FC layer to a conv layer but that is slightly annoying to do.
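The arithmetic is easy to check (illustrative Python, not Torch code; it assumes the standard VGG-19 layout of five 2x2 max-pools and 512 channels before fc6). The 512x416 shape below is an inference from the traceback, assuming -image_size 512 resized the content image to roughly those dimensions:

```python
# fc6 in VGG-19 is a fully connected layer expecting a flattened
# 512 * 7 * 7 = 25088 vector, which only holds for 224x224 inputs
# (each of the 5 poolings halves the spatial size: 224 / 2**5 = 7).
def fc6_input_size(height, width, channels=512, num_pools=5):
    return channels * (height // 2 ** num_pools) * (width // 2 ** num_pools)

print(fc6_input_size(224, 224))  # 25088  -- matches fc6's [4096 x 25088] weight
print(fc6_input_size(512, 416))  # 106496 -- matches the [106496] in the traceback
```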


htoyryla commented 8 years ago

Justin notifications@github.com wrote on 24.3.2016 at 14.49:

Thanks. I was thinking that the loss modules would expect a convolution layer, but the connection to the 224x224 images is new to me. Something similar must have happened when I attempted to change the size of the training images in one experiment to train a network. OK, I will learn bit by bit; neural-style has really drawn me into experimenting with neural networks.

Converting the images beforehand to 224x224 and setting image_size to 224 really makes it run, so I can do some experiments that way... except that the network will still be missing some layers, and the results are quite boring. Anyway, now I am on the map and can work further on my own (see here: http://liipetti.net/erratic/2016/03/28/controlling-image-content-with-fc-layers/).

Hannu
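For anyone following along, the pre-resizing step can be done with ImageMagick (an assumption — any image tool works; the "!" in the geometry forces the exact size, ignoring aspect ratio):

```shell
# Force both inputs to exactly 224x224, then run with -image_size 224 so the
# FC layers receive the 25088-element input they were trained with.
convert examples/inputs/brad_pitt.jpg -resize '224x224!' content224.jpg
convert examples/inputs/picasso_selfport1907.jpg -resize '224x224!' style224.jpg
th neural_style.lua -content_image content224.jpg -style_image style224.jpg -image_size 224
```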