mathmanu / caffe-jacinto

This repository has moved. The new location can be found via https://github.com/TexasInstruments/jacinto-ai-devkit

caffe jacinto sparsification #2

Open Wronskia opened 6 years ago

Wronskia commented 6 years ago

Hello,

I am trying to fine-tune a model with sparsification. It corresponds to the alexnet architecture trained on imagenet data, given here: https://github.com/cvjena/cnn-models . I am trying to fine-tune it with sparsification on a subset of imagenet data. My first step was to test the accuracy of the provided model (Alexnet + caffemodel) on my subset of images using caffe only, and I get results that are very close to what they obtained. My second step was to use caffe-jacinto to do the sparsification. Before doing that, I tested the model in caffe-jacinto in exactly the same way as in caffe, with no sparsification. Here is what I get:

W0811 14:14:31.177296 4905 net.cpp:811] Incompatible number of blobs for layer data/bn
W0811 14:14:31.177886 4905 net.cpp:819] Copying from data/bn to data/bn target blob 0
W0811 14:14:31.178071 4905 net.cpp:832] Shape mismatch, param: 0 layer: data/bn source: 3 (3) target: 1 3 1 1 (3).
W0811 14:14:31.178169 4905 net.cpp:819] Copying from data/bn to data/bn target blob 1
W0811 14:14:31.178247 4905 net.cpp:832] Shape mismatch, param: 1 layer: data/bn source: 3 (3) target: 1 3 1 1 (3).
W0811 14:14:31.178308 4905 net.cpp:819] Copying from data/bn to data/bn target blob 2
W0811 14:14:31.178381 4905 net.cpp:825] Cannot copy param 2 weights from layer 'data/bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 3 1 1 (3). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
W0811 14:14:31.178517 4905 net.cpp:811] Incompatible number of blobs for layer conv1/bn
W0811 14:14:31.178555 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 0
W0811 14:14:31.178652 4905 net.cpp:832] Shape mismatch, param: 0 layer: conv1/bn source: 96 (96) target: 1 96 1 1 (96).
W0811 14:14:31.178719 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 1
W0811 14:14:31.178793 4905 net.cpp:832] Shape mismatch, param: 1 layer: conv1/bn source: 96 (96) target: 1 96 1 1 (96).
W0811 14:14:31.178854 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 2
W0811 14:14:31.178926 4905 net.cpp:825] Cannot copy param 2 weights from layer 'conv1/bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 96 1 1 (96). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.

and so on for every batch normalization layer. However, the testing still proceeds and produces these poor results:

I0811 14:14:33.100739 4905 caffe.cpp:312] Batch 199, accuracy = 0
I0811 14:14:33.100769 4905 caffe.cpp:312] Batch 199, loss = 87
I0811 14:14:33.100780 4905 caffe.cpp:312] Batch 199, top5 = 0
I0811 14:14:33.100786 4905 caffe.cpp:317] Loss: 86.826
I0811 14:14:33.100814 4905 caffe.cpp:329] accuracy = 0.002
I0811 14:14:33.100836 4905 caffe.cpp:329] loss = 86.826 (* 1 = 86.826 loss)
I0811 14:14:33.100845 4905 caffe.cpp:329] top5 = 0.005

So it must be that the weights are not loaded correctly. My guess is that the reshaping is not handled correctly in the function void Net::CopyTrainedLayersFrom(const NetParameter& param) in net.cpp.
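
For reference, the mismatch the warnings describe can be checked directly in the .caffemodel. Below is a minimal sketch, assuming pycaffe's protobuf bindings (from caffe.proto import caffe_pb2) are importable; the model file name is a placeholder:

```python
# Diagnostic sketch (assumption: pycaffe is built and on PYTHONPATH).
# It prints the BatchNorm blob shapes stored in the BVLC-trained .caffemodel,
# which the warnings above say are flat vectors (e.g. length 3) while
# caffe-jacinto (caffe-0.15) expects 1 x C x 1 x 1.
from caffe.proto import caffe_pb2

net_param = caffe_pb2.NetParameter()
with open('alexnet_bvlc.caffemodel', 'rb') as f:  # placeholder file name
    net_param.ParseFromString(f.read())

# Modern models store layers in net_param.layer; very old ones use net_param.layers.
for layer in net_param.layer:
    if layer.type == 'BatchNorm':
        shapes = [list(blob.shape.dim) for blob in layer.blobs]
        print(layer.name, shapes)  # e.g. data/bn [[3], [3], [1]]
```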

PS: your full CIFAR example works perfectly for me.

Thank you, Best,

Yassine

mathmanu commented 6 years ago

caffe-jacinto is derived from NVIDIA/caffe.

The branch caffe-0.15 has a known issue in BatchNormalization layers: its batch normalization is incompatible with BVLC/caffe batch normalization, so it cannot load a pre-trained model that contains batch normalization and was trained in BVLC/caffe.

NVIDIA/caffe has fixed this issue in the branch caffe-0.16. We are in the process of migrating to that branch. It may take a couple of weeks.

However, if this is urgent for you, I can suggest a couple of options:

  1. you can use a model that doesn't have batch normalization (e.g. VGG16)
  2. you can use a pre-trained model that is trained in NVIDIA/caffe or caffe-jacinto (branch caffe-0.15). For example, I have made a jacintonet11 pre-trained model available.
  3. you can start from scratch (without using a pre-trained model); a related "rename the layer" route, taken from the warning text quoted above, is sketched after this list.
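
Not one of the options above, but the warning text itself ("To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.") hints at a middle ground: keep the convolution and inner-product weights and re-learn only the batch-norm parameters during fine-tuning. A minimal, hypothetical sketch of that renaming follows; file names and the '_reinit' suffix are placeholders, and the renamed layers still need retraining:

```python
# Hypothetical workaround, not part of caffe-jacinto: rename every BatchNorm
# layer in the train/val prototxt so that Net::CopyTrainedLayersFrom (which
# matches layers by name) skips them and initialises them fresh instead of
# copying the incompatible blobs.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('train_val.prototxt') as f:            # placeholder path
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.type == 'BatchNorm':
        layer.name = layer.name + '_reinit'      # new name => no weight copy

with open('train_val_bn_reinit.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))
```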
Wronskia commented 6 years ago

Hi,

Thank you for your quick answer!

Should it work with CPU_ONLY support?

Best, Yassine

mathmanu commented 6 years ago

Part of the issue is related to cuDNN. If you use cuDNN, that part of the error will be solved, but the issue will still not be fully resolved.

Wronskia commented 6 years ago

Actually, I want to test the effect of your sparsification method on the alexnet and resnet architectures (with the imagenet dataset), which both use batch normalization. I tried to fix the problem in your source code but nothing worked, and judging from the batch-normalization-related commits in the caffe-0.16 branch, the fix doesn't seem to be straightforward.

Thanks again, Best, Yassine

mathmanu commented 6 years ago

We shall try to push the caffe-0.16 branch of caffe-jacinto soon. That will solve these incompatibilities.

Wronskia commented 6 years ago

Hello Manu,

I am coming back to you to ask when you think you are going to release the caffe-0.16 branch of caffe-jacinto?

Thanks, Best, Yassine

mathmanu commented 6 years ago

Hi Yassine, it's almost ready. We shall target Monday, 25th September 2017. Best regards, Manu.

Wronskia commented 6 years ago

Perfect.

Thank you.

mathmanu commented 6 years ago

@Wronskia branch caffe-0.16 is now available for caffe-jacinto and caffe-jacinto-models. Note that the default branch is still caffe-0.15, so you have to manually switch to caffe-0.16 after clone or pull: git checkout caffe-0.16

See the example scripts located in the caffe-jacinto-models/scripts/training folder. Also check the example trained models given in caffe-jacinto-models/trained.

New Features - 2017 September:

  1. Based on NVIDIA/caffe branch caffe-0.16 - so it fixes the Batch Normalization backward-compatibility issues.
  2. Additional features have been added for sparsity - for example, improvements to reach the exact sparsity target specified, and improvements to reduce the accuracy drop during sparsification (a small sketch for checking the achieved sparsity follows this list).
  3. Estimate the accuracy with quantization - all you have to do is to set quantize: enable in your network prototxt definition.
  4. Object detection using Single Shot Detector (SSD) has been integrated.
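
Regarding point 2, one quick way to check how much sparsity a sparsified model actually reached is to count the exactly-zero weights. A minimal pycaffe sketch, not part of caffe-jacinto's own tooling; the prototxt/caffemodel paths are placeholders:

```python
# Hypothetical sparsity check: report the fraction of exactly-zero weights per
# layer and overall. It walks every layer that has parameters (including
# BatchNorm/Scale layers), so interpret those rows accordingly.
import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Net('deploy.prototxt', 'sparse.caffemodel', caffe.TEST)  # placeholder paths

total, zeros = 0, 0
for name, params in net.params.items():
    w = params[0].data                    # params[0] = weights, params[1] = bias (if present)
    total += w.size
    zeros += int(np.count_nonzero(w == 0))
    print('%-20s sparsity = %6.2f%%' % (name, 100.0 * np.mean(w == 0)))
print('overall weight sparsity = %.2f%%' % (100.0 * zeros / max(total, 1)))
```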
Wronskia commented 6 years ago

Hello Manu,

Thank you very much, I will give it a try now.

Best, Yassine

mathmanu commented 6 years ago

Keeping this issue open for some more time, as this is an interesting conversation and may help someone who is trying out the same thing.

mathmanu commented 6 years ago

Note that I have changed the default branch in github to caffe-0.16