zhujiacheng opened this issue 6 years ago
Hello, those scripts are deprecated and no longer used for any purpose. The LOWERED_CCNMM conv_mode enables removal of all-zero weight groups. Please also check the issues for some implementation details. Specifically, CPU mode is fully supported, while GPU mode still falls back to some CPU functions as a temporary measure.
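For context, this mode appears to be enabled per layer in the prototxt (the field name `conv_mode: LOWERED_CCNMM` is taken from the table and logs in this thread; the exact placement inside `convolution_param` is my assumption, so check the fork's `caffe.proto`):

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  convolution_param {
    # ... usual fields (num_output, kernel_size, etc.) ...
    conv_mode: LOWERED_CCNMM  # assumed placement; verify against the fork's caffe.proto
  }
}
```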
@wenwei202 thanks for your help. But when I test the inference time with examples/cifar10_classifier.py on one NVIDIA 1070 GPU, the model's sparsity and the results are as follows.
```
I0313 15:25:18.214439 3057 base_conv_layer.cpp:17] layer conv1 has sparsity of 0.610833
I0313 15:25:18.215625 3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.215688 3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.215701 3057 base_conv_layer.cpp:88] conv1 left_cols=75 left_rows=14
I0313 15:25:18.215739 3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.215749 3057 base_conv_layer.cpp:102] conv1 squeezing to 14x75
I0313 15:25:18.215775 3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.215785 3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv1 are frozen
I0313 15:25:18.216166 3057 base_conv_layer.cpp:17] layer conv2 has sparsity of 0.848477
I0313 15:25:18.226200 3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.226290 3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.226305 3057 base_conv_layer.cpp:88] conv2 left_cols=270 left_rows=20
I0313 15:25:18.226348 3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.226358 3057 base_conv_layer.cpp:102] conv2 squeezing to 20x270
I0313 15:25:18.226404 3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.226415 3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv2 are frozen
I0313 15:25:18.227262 3057 base_conv_layer.cpp:17] layer conv3 has sparsity of 0.660352
I0313 15:25:18.249153 3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.249279 3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.249299 3057 base_conv_layer.cpp:88] conv3 left_cols=486 left_rows=62
I0313 15:25:18.249359 3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.249370 3057 base_conv_layer.cpp:102] conv3 squeezing to 62x486
I0313 15:25:18.249470 3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.249481 3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv3 are frozen
I0313 15:25:18.249981 3057 inner_product_layer.cpp:12] layer ip1 has sparsity of 0.153613
I0313 15:25:18.254674 3057 inner_product_layer.cpp:20] weights lying in all-zero groups of ip1 are frozen
I0313 15:25:18.254782 3057 net.cpp:895] Ignoring source layer loss
```
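The `left_cols`/`left_rows` entries in that log correspond to squeezing all-zero rows (filters) and all-zero columns out of the lowered weight matrix. A minimal numpy sketch of that squeezing idea (illustrative only, not the fork's actual C++ code; the toy matrix is made up):

```python
import numpy as np

def squeeze_lowered(W):
    """Drop all-zero rows and all-zero columns of a lowered weight
    matrix, which is conceptually what LOWERED_CCNMM does."""
    row_mask = np.abs(W).sum(axis=1) != 0   # keep non-zero filters (left_rows)
    col_mask = np.abs(W).sum(axis=0) != 0   # keep non-zero columns (left_cols)
    return W[row_mask][:, col_mask], row_mask, col_mask

# toy example: 4 filters x 6 lowered columns, filter 1 is all-zero
W = np.array([[1., 0., 2., 0., 0., 3.],
              [0., 0., 0., 0., 0., 0.],
              [4., 0., 5., 0., 0., 6.],
              [7., 0., 8., 0., 0., 9.]])
Wq, rows, cols = squeeze_lowered(W)
print(Wq.shape)                            # squeezed to (3, 3)
print(1 - np.count_nonzero(W) / W.size)    # sparsity 0.625
```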
| | cifar10_full.prototxt | cifar10_full_ccnmm.prototxt (conv_mode: LOWERED_CCNMM) |
|---|---|---|
| cifar10_full_baseline.caffemodel | 5 ms, Top-1: 81.52%, Top-5: 99.04% | 31 ms, Top-1: 81.52%, Top-5: 99.05% |
| cifar10_full_ssl_200000.caffemodel | 5 ms, Top-1: 80.37%, Top-5: 98.90% | 31 ms, Top-1: 80.37%, Top-5: 98.90% |
So why is the inference time much higher with conv_mode: LOWERED_CCNMM, and why don't I see any inference-time reduction when using cifar10_full_ssl_200000.caffemodel?
To duplicate the results, please refer here on how I measured speed. I only counted the time of the matrix-matrix multiplication and excluded everything else. For example, in CPU mode the lowering process (im2col) consumes 80% of the time. I didn't want such inefficient implementations of those auxiliary functions to deteriorate the results.
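That measurement approach (time only the GEMM, skip im2col and everything else) can be mimicked in a few lines. The matrix sizes below are made up for illustration, except the 20x270 squeezed shape, which comes from the conv2 log entry above:

```python
import time
import numpy as np

def time_gemm(W, X, repeats=50):
    """Average wall time of the matrix-matrix product only,
    excluding lowering (im2col) and any other overhead."""
    W @ X  # warm-up so BLAS/thread setup is not counted
    t0 = time.perf_counter()
    for _ in range(repeats):
        W @ X
    return (time.perf_counter() - t0) / repeats

# made-up sizes: a full lowered weight matrix vs. a squeezed one
X_full = np.random.rand(576, 1024)   # lowered feature maps
W_full = np.random.rand(64, 576)     # full weight matrix (hypothetical)
X_sq = np.random.rand(270, 1024)     # after dropping zero columns
W_sq = np.random.rand(20, 270)       # e.g. conv2 squeezed to 20x270

t_full = time_gemm(W_full, X_full)
t_sq = time_gemm(W_sq, X_sq)         # fewer rows/cols -> fewer FLOPs
print(f"full GEMM: {t_full:.6f}s, squeezed GEMM: {t_sq:.6f}s")
```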
@wenwei202 thanks for your help, I get it now. Then I will make some effort to cut the all-zero filters and channels directly out of the weight caffemodel and prototxt, according to row sparsity. Maybe do it when saving the caffemodel at the end of training. Is that a good way to avoid those kinds of inefficient implementations?
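That plan seems workable; the key detail is that removing output filter i of one conv layer also requires removing input channel i of the next layer's weights. A hedged numpy sketch of just that bookkeeping (pycaffe loading/saving omitted; the function name and toy shapes are hypothetical):

```python
import numpy as np

def prune_zero_filters(W_cur, b_cur, W_next):
    """Drop all-zero output filters of the current conv layer and the
    matching input channels of the next conv layer.
    W_cur: (out, in, kh, kw), b_cur: (out,), W_next: (out2, out, kh2, kw2)."""
    keep = np.abs(W_cur).reshape(W_cur.shape[0], -1).sum(axis=1) != 0
    return W_cur[keep], b_cur[keep], W_next[:, keep]

# toy shapes: 4 filters, filter 2 is all-zero
W1 = np.random.rand(4, 3, 3, 3); W1[2] = 0.0
b1 = np.random.rand(4)
W2 = np.random.rand(6, 4, 3, 3)
W1p, b1p, W2p = prune_zero_filters(W1, b1, W2)
print(W1p.shape, W2p.shape)  # (3, 3, 3, 3) (6, 3, 3, 3)
```

The pruned prototxt would then also need `num_output` of the pruned layer reduced to match the remaining filter count.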
hi,
In my opinion there should be some Python scripts that can remove all-zero-weight filters (row sparsity) directly, to accelerate GPU inference without any CPU subroutines. Are net_pruner.py and net_skipper.py meant for that? Or can you give me some advice? Also, I cannot figure out what 'convq_layer' and 'convq_param_key' mean in net_pruner.py and net_skipper.py; for example, there obviously is no 'conv1q' key in src_net.params. Thanks a lot for your help!