Wronskia opened this issue 7 years ago
caffe-jacinto is derived from NVIDIA/caffe.
The caffe-0.15 branch has a known issue in BatchNorm layers: its batch normalization is incompatible with BVLC/caffe batch normalization, so it cannot load a pre-trained model that contains batch normalization and was trained with BVLC/caffe.
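For readers who want to see the mismatch concretely, here is a minimal pycaffe sketch (an illustration, not a fix from this thread) that prints the parameter blob shapes stored in a BVLC-trained caffemodel; the prototxt and caffemodel file names are placeholders.

```python
# Minimal sketch: inspect the parameter blobs of a BVLC-trained model with
# pycaffe. File names below are placeholders, not files from this thread.
import caffe

net = caffe.Net('deploy.prototxt', 'bvlc_trained.caffemodel', caffe.TEST)

for layer_name, blobs in net.params.items():
    shapes = [tuple(b.data.shape) for b in blobs]
    print(layer_name, shapes)

# BVLC BatchNorm layers typically show three blobs: running mean (C,),
# running variance (C,) and a scalar moving-average factor (1,). The error
# output later in this thread shows the caffe-jacinto layer expecting
# 1 x C x 1 x 1 blobs and a different blob count, which is why such a
# caffemodel cannot be loaded directly.
```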
NVIDIA/caffe has fixed this issue in the caffe-0.16 branch. We are in the process of migrating to that branch; it may take a couple of weeks.
However, if this is urgent for you, I can suggest a couple of options:
Hi,
Thank you for your quick answer!
Should it work with CPU_ONLY support?
Best, Yassine
Part of the issue is related to CUDNN: if you use CUDNN, one part of the error will be solved, but it will still not be fully solved.
Actually, I want to test the effect of your sparsification method on the AlexNet and ResNet architectures (with the ImageNet dataset), both of which use batch normalization. I tried to fix the problem in your source code, but nothing worked, and judging from the batch-normalization-related commits on the caffe-0.16 branch, the fix doesn't seem to be straightforward.
Thanks again, Best, Yassine
We shall try to push the caffe-0.16 branch of caffe-jacinto soon. That will solve these incompatibilities.
Hello Manu,
I am coming back to you to ask when you guys think you are going to release the caffe-0.16 branch of caffe-jacinto?
Thanks, Best, Yassine
Hi Yassine, it's almost ready. We are targeting Monday, 25th September 2017. Best regards, Manu.
Perfect.
Thank you.
@Wronskia the caffe-0.16 branch is now available for caffe-jacinto and caffe-jacinto-models. Note that the default branch is still caffe-0.15, so you have to manually switch to caffe-0.16 after cloning or pulling: git checkout caffe-0.16
See the example scripts located in the caffe-jacinto-models/scripts/training folder. Also check the example trained models given in caffe-jacinto-models/trained.
Hello Manu,
Thank you very much, I will give it a try now.
Best, Yassine
Keeping this issue open for some more time, as this is an interesting conversation and may help someone who is trying out the same thing.
Note that I have changed the default branch on GitHub to caffe-0.16.
Hello,
I am trying to fine-tune a model with sparsification. It corresponds to the AlexNet architecture trained on ImageNet data given here: https://github.com/cvjena/cnn-models. I am trying to fine-tune it with sparsification on a subset of the ImageNet data. My first step was to test the accuracy of the provided model (AlexNet + caffemodel) on my subset of images using caffe only, and I get results that are very close to what they obtained. My second step was to use caffe-jacinto to do the sparsification. Before doing that, I tested the model in exactly the same way as in caffe, with no sparsification, using caffe-jacinto. Here is what I get:
W0811 14:14:31.177296 4905 net.cpp:811] Incompatible number of blobs for layer data/bn
W0811 14:14:31.177886 4905 net.cpp:819] Copying from data/bn to data/bn target blob 0
W0811 14:14:31.178071 4905 net.cpp:832] Shape mismatch, param: 0 layer: data/bn source: 3 (3) target: 1 3 1 1 (3).
W0811 14:14:31.178169 4905 net.cpp:819] Copying from data/bn to data/bn target blob 1
W0811 14:14:31.178247 4905 net.cpp:832] Shape mismatch, param: 1 layer: data/bn source: 3 (3) target: 1 3 1 1 (3).
W0811 14:14:31.178308 4905 net.cpp:819] Copying from data/bn to data/bn target blob 2
W0811 14:14:31.178381 4905 net.cpp:825] Cannot copy param 2 weights from layer 'data/bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 3 1 1 (3). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
W0811 14:14:31.178517 4905 net.cpp:811] Incompatible number of blobs for layer conv1/bn
W0811 14:14:31.178555 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 0
W0811 14:14:31.178652 4905 net.cpp:832] Shape mismatch, param: 0 layer: conv1/bn source: 96 (96) target: 1 96 1 1 (96).
W0811 14:14:31.178719 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 1
W0811 14:14:31.178793 4905 net.cpp:832] Shape mismatch, param: 1 layer: conv1/bn source: 96 (96) target: 1 96 1 1 (96).
W0811 14:14:31.178854 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 2
W0811 14:14:31.178926 4905 net.cpp:825] Cannot copy param 2 weights from layer 'conv1/bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 96 1 1 (96). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
and so on for every batch normalization layer. However, the testing still moves forward and produces these poor results:
I0811 14:14:33.100739 4905 caffe.cpp:312] Batch 199, accuracy = 0
I0811 14:14:33.100769 4905 caffe.cpp:312] Batch 199, loss = 87
I0811 14:14:33.100780 4905 caffe.cpp:312] Batch 199, top5 = 0
I0811 14:14:33.100786 4905 caffe.cpp:317] Loss: 86.826
I0811 14:14:33.100814 4905 caffe.cpp:329] accuracy = 0.002
I0811 14:14:33.100836 4905 caffe.cpp:329] loss = 86.826 (* 1 = 86.826 loss)
I0811 14:14:33.100845 4905 caffe.cpp:329] top5 = 0.005
So it must be that the weights are not loaded correctly; my guess would be that the reshaping is not handled correctly in the function void Net::CopyTrainedLayersFrom(const NetParameter& param) in net.cpp.
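For anyone who wants to experiment before a proper fix lands, here is a hedged pycaffe sketch that folds the BVLC BatchNorm moving-average factor into the stored mean/variance before handing the weights to caffe-jacinto. It is only an illustration based on the shapes in the log above: whether the folded statistics are then accepted by caffe-jacinto's BatchNorm layer (which, per the log, also expects 1 x C x 1 x 1 blobs and a different blob count) is an assumption, and the file names are placeholders.

```python
# Hedged sketch: fold BVLC BatchNorm's scalar moving-average factor into the
# stored mean/variance so the saved statistics are the values BVLC inference
# would actually use. File names are placeholders; this is not the official
# caffe-jacinto fix for the blob-count/shape mismatch shown in the log above.
import caffe

src = caffe.Net('deploy.prototxt', 'alexnet_bn_bvlc.caffemodel', caffe.TEST)

for layer_name, blobs in src.params.items():
    # Heuristic: BVLC BatchNorm layers carry exactly three blobs,
    # the last being a single-element moving-average factor.
    if len(blobs) == 3 and blobs[2].data.size == 1:
        factor = float(blobs[2].data.flat[0])
        scale = 0.0 if factor == 0 else 1.0 / factor
        blobs[0].data[...] *= scale   # running mean
        blobs[1].data[...] *= scale   # running variance
        blobs[2].data[...] = 1.0      # factor is now folded in

src.save('alexnet_bn_folded.caffemodel')
```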
PS: your full CIFAR example works perfectly for me.
Thank you, Best,
Yassine