ProGamerGov opened this issue 6 years ago
Using variations (only changing the `-image_size` value) of these two commands, I have noticed that VGG models with their FC layers removed use less memory:

`th neural_style.lua -backend cudnn -model_file models/VGG_ILSVRC_16_layers.caffemodel -proto_file models/VGG_ILSVRC_16_layers_deploy.prototxt`

`th neural_style.lua -backend cudnn -model_file models/vgg16.caffemodel -proto_file models/vgg16.prototxt`

`-image_size 512`:
- With FC layers: 1684MiB
- No FC layers: 1423MiB
- Difference: 261MiB

`-image_size 1024`:

- With FC layers: 4687MiB
- No FC layers: 4512MiB
- Difference: 175MiB

`-image_size 1536`:

- With FC layers: 9937MiB
- No FC layers: 9702MiB
- Difference: 235MiB
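A quick check on those numbers (a minimal sketch, using only the figures reported above) shows the FC-layer savings hover in the 175–261MiB range rather than growing with image size, which suggests they come mostly from the fixed-size FC weights rather than from resolution-dependent activations:

```python
# Reported GPU usage (MiB) with and without FC layers, per -image_size.
usage = {
    512:  (1684, 1423),
    1024: (4687, 4512),
    1536: (9937, 9702),
}

# Savings = usage with FC layers minus usage without them.
savings = {size: with_fc - no_fc for size, (with_fc, no_fc) in usage.items()}

for size, saved in savings.items():
    print(f"-image_size {size}: {saved} MiB saved")
```
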
The VGG models with their FC layers removed come from here:
https://style-transfer.s3-us-west-2.amazonaws.com/vgg16.caffemodel
https://style-transfer.s3-us-west-2.amazonaws.com/vgg19.caffemodel
The prototxt files that I used with these models are from: https://github.com/crowsonkb/style_transfer
The VGG-16 model without the FC layers is 56.1MB in size, while the VGG-16 model with its FC layers is 528MB in size.

Testing the idea with the VGG-16 SOD Fine-tune model:

The full model is 514MB in size. With the FC layers stripped from the model, it's only 56.1MB in size. With both the FC layers and all the ReLU/Conv layers down to relu5_1 stripped from the model, it is 38.1MB in size. Stripping all the layers down to relu4_2 results in a model size of 20.1MB.

VGG layers are hierarchical, so in theory, removing layers above the ones that Neural-Style is using shouldn't negatively affect things. It also means that you can only strip layers down to the highest one that you are using.
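That constraint can be sketched as truncating a sequential layer list at the deepest layer you still use (the layer names below are illustrative, not read from a real prototxt):

```python
def truncate_after(layers, deepest_used):
    """Keep layers up to and including `deepest_used`; drop everything above it.

    `layers` is the network's layer names in forward order.
    Raises ValueError if the requested layer isn't present.
    """
    idx = layers.index(deepest_used)
    return layers[:idx + 1]

# Simplified VGG-16 tail, in forward order (illustrative).
vgg16_tail = ["conv4_2", "relu4_2", "conv4_3", "relu4_3", "pool4",
              "conv5_1", "relu5_1", "fc6", "relu6", "fc7", "relu7", "fc8"]

# relu5_1 itself must stay because it's the highest layer still being read;
# everything above it (fc6..fc8) can go.
print(truncate_after(vgg16_tail, "relu5_1"))
```
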
Control Tests:
Stripped Model Tests:
The GPU usage for the above examples with an `-image_size` of 1024 is shown below:

Model | Model Size (MB) | __Total Usage__ | LuaJIT Usage
---|---|---|---
Control | 514MB | 3922MiB / 11439MiB | 3911MiB
Control Without relu5_1 | 514MB | 3897MiB / 11439MiB | 3886MiB
Shaved Off FC Layers | 56.1MB | 3646MiB / 11439MiB | 3635MiB
Shaved to relu5_1 | 38.1MB | 3610MiB / 11439MiB | 3599MiB
Shaved to relu4_2 | 20.1MB | 3516MiB / 11439MiB | 3505MiB
Comparing total usage:

- Compared to the full model, shaving off the FC layers saves 276MiB of GPU memory.
- Compared to the full model, shaving off all the layers down to relu5_1 saves 312MiB of GPU memory.
- Compared to the full model test without relu5_1, shaving off all the layers down to relu4_2 saves 381MiB of GPU memory.
- Simply omitting the relu5_1 style layer only saves 25MiB of GPU memory.
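As a sanity check on these differences (total-usage figures copied from the table above; note that the 381MiB figure corresponds to shaving down to relu4_2, measured against the control without relu5_1):

```python
# Total GPU usage (MiB) from the table above.
total = {
    "control": 3922,
    "control_no_relu5_1": 3897,
    "no_fc": 3646,
    "to_relu5_1": 3610,
    "to_relu4_2": 3516,
}

print(total["control"] - total["no_fc"])                   # FC layers shaved off
print(total["control"] - total["to_relu5_1"])              # shaved down to relu5_1
print(total["control_no_relu5_1"] - total["to_relu4_2"])   # shaved down to relu4_2
print(total["control"] - total["control_no_relu5_1"])      # only omitting relu5_1
```
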
So it seems that shaving layers off a VGG model saves more GPU memory than simply changing the `-style_layers` or `-content_layers` values.

The impact on style transfer quality from stripping/removing layers seems to resemble messing with the layer activation strengths, like in my experiments here, or simply changing the `-seed` value. But if the layers above the ones that you are using affect the style transfer outputs, then removing them could negatively impact quality. Further testing would probably show whether or not this is the case.
To "shave/strip" the model, I ran this command in Caffe:

`./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel -gpu 0`

In the solver.prototxt, I made sure the learning rate was set to zero, and only one iteration was used before saving the model:
base_lr: 0.000000
max_iter: 1
I then simply deleted the lines for the layers I wanted to remove in the train_val.prototxt file, as per this suggestion: https://github.com/BVLC/caffe/issues/186#issuecomment-37141696
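Deleting those lines by hand can also be scripted. Below is a rough sketch that drops top-level layer blocks from a prototxt by name using brace counting; it is not a real protobuf parser, and the demo prototxt text is illustrative, not taken from an actual VGG deploy file:

```python
def strip_layers(prototxt_text, remove):
    """Drop top-level `layer { ... }` / `layers { ... }` blocks whose
    name is in `remove`. Brace-counting sketch, not a protobuf parser."""
    lines = prototxt_text.splitlines()
    out, i = [], 0
    while i < len(lines):
        if lines[i].strip().startswith(("layer {", "layers {")):
            # Collect the whole block by counting braces.
            depth, block = 0, []
            while i < len(lines):
                depth += lines[i].count("{") - lines[i].count("}")
                block.append(lines[i])
                i += 1
                if depth == 0:
                    break
            name = next((l.split('"')[1] for l in block
                         if "name:" in l and '"' in l), None)
            if name not in remove:
                out.extend(block)
        else:
            out.append(lines[i])
            i += 1
    return "\n".join(out)

# Illustrative miniature prototxt.
demo = """name: "VGG16"
layers {
  name: "fc6"
  type: INNER_PRODUCT
}
layers {
  name: "conv1_1"
  type: CONVOLUTION
}"""

print(strip_layers(demo, {"fc6", "fc7", "fc8"}))
```
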
This way of stripping layers should be possible for NIN models as well, but I have no idea how much it would improve the performance of the model.
NIN-Imagenet Model Tests (`nin_imagenet_conv.caffemodel`):

`-content_layers relu1,relu2,relu3,relu5,relu6,relu7 -style_layers relu1,relu2,relu3,relu5,relu6,relu7`

The GPU usage for the above examples with an `-image_size` of 1024 is shown below:

Model | Model Size (MB) | __Total Usage__ | LuaJIT Usage
---|---|---|---
Control | 28.9MB | 1993MiB / 11439MiB | 1982MiB
Shaved to relu7 | 6.42MB | 1974MiB / 11439MiB | 1963MiB
While the NIN model lost 22.48MB in size (just under 78% of the model), only 19MiB of GPU memory was saved.
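Checking those NIN figures (model sizes in MB and GPU usage in MiB, all taken from the numbers above):

```python
control_mb, shaved_mb = 28.9, 6.42      # model file sizes (MB)
control_mib, shaved_mib = 1993, 1974    # total GPU usage (MiB)

size_lost = control_mb - shaved_mb      # weight bytes removed from the model
pct = size_lost / control_mb * 100      # fraction of the model removed
mem_saved = control_mib - shaved_mib    # GPU memory actually saved

print(f"{size_lost:.2f} MB removed ({pct:.1f}% of the model), {mem_saved} MiB saved")
```
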
Compared to the VGG models, the output didn't change as drastically when layers were removed.
When using `-image_size 2432`, `-image_size 2560`, and `-image_size 2816`, with `-backend cudnn`, `-optimizer adam`, and `-style_scale 0.5`, the loss values seem to remain the same in every iteration. Lower image sizes don't seem to suffer from this issue.

I also used `-gpu 0,1,2,3,4,5,6,7 -multigpu_strategy 2,3,4,6,8,11,12`, which is the most efficient set of parameters for multiple GPUs that I have come across thus far.

What is happening here, and is it possible to fix this?
Here's the `nvidia-smi` output:

Edit: This also happened with a second content/style image combo at the same image size values.