wenwei202 / caffe

Caffe for Sparse and Low-rank Deep Neural Networks

hi, what does 'convq_layer' mean in net_pruner.py and net_skipper.py? #22

Open zhujiacheng opened 6 years ago

zhujiacheng commented 6 years ago

hi,
In my opinion, there should be some Python scripts that can directly remove filters whose weights are all zeros (row sparsity) to accelerate GPU inference without any CPU subroutines. Are net_pruner.py and net_skipper.py used for that? Or can you give me some advice? Also, I cannot figure out what 'convq_layer' and 'convq_param_key' mean in net_pruner.py and net_skipper.py; for example, there is obviously no 'conv1q' key in src_net.params. Thanks a lot for your help!

```python
import re

import caffe
import caffeparser  # prototxt-parsing helper shipped with this repo

# srcproto and srcmodel (paths to the prototxt and caffemodel) are set earlier in the script
src_net = caffe.Net(srcproto, srcmodel, caffe.TEST)
print("src net:\n blobs {}\nparams {}\n".format(src_net.blobs.keys(), src_net.params.keys()))
src_net_parser = caffeparser.CaffeProtoParser(srcproto)
net_msg = src_net_parser.readProtoNetFile()

layer_idx = 0
loop_layers = net_msg.layer[:]  # the [:] makes a copy so the list is not modified while looping
convxq_positions = []
convxq_m = []
convxq_add_layers = []
position_idx = 0

total_all_zero_counter = 0

# generate and save dst prototxt
for cur_layer in loop_layers:
    if 'Convolution' == cur_layer.type and re.match("^conv[0-9]+$", cur_layer.name):
        # the layer immediately before convX is taken to be its companion convXq,
        # apparently the spatial (kxk) half produced by the low-rank decomposition,
        # while convX itself is the 1x1 recombination (see the asserts below)
        convq_layer = net_msg.layer._values[position_idx - 1]
        convq_param_key = cur_layer.name + "q"  # e.g. "conv1" -> "conv1q"
        param_key = cur_layer.name
        convx_ptr = net_msg.layer._values.pop(position_idx)
        convx_ptr.CopyFrom(cur_layer)
        convxq_ptr = net_msg.layer._values.pop(position_idx - 1)
        convxq_ptr.CopyFrom(convq_layer)

        assert len(src_net.params[convq_param_key]) == 1  # convXq carries weights only, no bias
        weights_convxq = src_net.params[convq_param_key][0].data
        weights_convx = src_net.params[param_key][0].data
        assert weights_convx.shape[3] == 1 and weights_convx.shape[2] == 1  # convX is a 1x1 conv

        orig_grp_num = weights_convxq.shape[0] / weights_convx.shape[1]
        cur_m = convq_layer.convolution_param.group
        orig_grp_num = cur_layer.convolution_param.group  # note: overwrites the value computed two lines above
        num_per_orig_grp = (cur_m / orig_grp_num)
        cur_sxs = weights_convx.shape[1] * orig_grp_num / cur_m
```

wenwei202 commented 6 years ago

Hello, those scripts are deprecated and not used for any purpose. The LOWERED_CCNMM conv_mode enables all-zero weight removal. Please also check the issues for some implementation details. Specifically, the CPU mode is fully supported, while the GPU mode uses some CPU functions for temporary testing.
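For reference, a minimal sketch of switching every convolution layer to that mode by editing the prototxt programmatically; it assumes this fork's caffe.proto exposes the conv_mode field and the LOWERED_CCNMM enum value (the log output below shows the corresponding C++ enum name), and the file names are the ones used later in this thread:

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

# load the deploy prototxt (file name from this thread)
net_msg = caffe_pb2.NetParameter()
with open('cifar10_full.prototxt') as f:
    text_format.Merge(f.read(), net_msg)

# switch every convolution layer to the CCNMM lowered mode
# (assumes ConvolutionParameter defines a ConvMode enum with LOWERED_CCNMM)
for layer in net_msg.layer:
    if layer.type == 'Convolution':
        layer.convolution_param.conv_mode = caffe_pb2.ConvolutionParameter.LOWERED_CCNMM

with open('cifar10_full_ccnmm.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net_msg))
```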

zhujiacheng commented 6 years ago

@wenwei202 thanks for your help. But when I test the inference time with examples/cifar10_classifier.py on a single NVIDIA 1070 GPU, the model's sparsity and the results are as follows.

cifar10_full_ssl_200000.caffemodel sparsity

```
I0313 15:25:18.214439  3057 base_conv_layer.cpp:17] layer conv1 has sparsity of 0.610833
I0313 15:25:18.215625  3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.215688  3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.215701  3057 base_conv_layer.cpp:88] conv1 left_cols=75 left_rows=14
I0313 15:25:18.215739  3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.215749  3057 base_conv_layer.cpp:102] conv1 squeezing to 14x75
I0313 15:25:18.215775  3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.215785  3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv1 are frozen
I0313 15:25:18.216166  3057 base_conv_layer.cpp:17] layer conv2 has sparsity of 0.848477
I0313 15:25:18.226200  3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.226290  3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.226305  3057 base_conv_layer.cpp:88] conv2 left_cols=270 left_rows=20
I0313 15:25:18.226348  3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.226358  3057 base_conv_layer.cpp:102] conv2 squeezing to 20x270
I0313 15:25:18.226404  3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.226415  3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv2 are frozen
I0313 15:25:18.227262  3057 base_conv_layer.cpp:17] layer conv3 has sparsity of 0.660352
I0313 15:25:18.249153  3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.249279  3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.249299  3057 base_conv_layer.cpp:88] conv3 left_cols=486 left_rows=62
I0313 15:25:18.249359  3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.249370  3057 base_conv_layer.cpp:102] conv3 squeezing to 62x486
I0313 15:25:18.249470  3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.249481  3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv3 are frozen
I0313 15:25:18.249981  3057 inner_product_layer.cpp:12] layer ip1 has sparsity of 0.153613
I0313 15:25:18.254674  3057 inner_product_layer.cpp:20] weights lying in all-zero groups of ip1 are frozen
I0313 15:25:18.254782  3057 net.cpp:895] Ignoring source layer loss
```

inference times (batch_size=32)

| caffemodel | cifar10_full.prototxt | cifar10_full_ccnmm.prototxt (conv_mode: LOWERED_CCNMM) |
| --- | --- | --- |
| cifar10_full_baseline.caffemodel | 5ms, (Top 1): 81.52%, (Top 5): 99.04% | 31ms, (Top 1): 81.52%, (Top 5): 99.05% |
| cifar10_full_ssl_200000.caffemodel | 5ms, (Top 1): 80.37%, (Top 5): 98.90% | 31ms, (Top 1): 80.37%, (Top 5): 98.90% |

So why is the inference time much higher with conv_mode: LOWERED_CCNMM, and why doesn't the inference time go down when using cifar10_full_ssl_200000.caffemodel?

wenwei202 commented 6 years ago

To duplicate the results, please refer here for how I measured speed. I counted only the time of the matrix-matrix multiplication and excluded everything else. For example, in CPU mode, the lowering process (im2col) consumes 80% of the time. I didn't want such inefficient implementations of those auxiliary functionalities to deteriorate the results.
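For a rough comparison, individual layers can also be timed from pycaffe, though this measures the whole layer forward pass (including the im2col lowering), not the GEMM-only time described above; the layer and file names below are the ones from this thread:

```python
import time
import caffe

caffe.set_mode_gpu()
net = caffe.Net('cifar10_full_ccnmm.prototxt',
                'cifar10_full_ssl_200000.caffemodel', caffe.TEST)

for name in ['conv1', 'conv2', 'conv3']:
    net.forward(start=name, end=name)  # warm-up run
    t0 = time.time()
    for _ in range(100):
        net.forward(start=name, end=name)
    # rough wall-clock time per forward pass of this single layer, in ms
    # (GPU kernels run asynchronously, so treat these numbers as approximate)
    print('%s: %.3f ms' % (name, (time.time() - t0) * 1000.0 / 100))
```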

zhujiacheng commented 6 years ago

@wenwei202 thanks for your help, I get it now. Then I will try to cut off the zero filters and zero channels directly in the caffemodel weights and prototxt, according to row sparsity. Maybe I'll do it when saving the caffemodel at the end of training. Is that a good way to avoid those kinds of inefficient implementations?
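As a starting point, here is a sketch of how the removable filters could be located, assuming conv weights of shape (num_output, channels, kh, kw); the file names are the ones from this thread:

```python
import numpy as np
import caffe

net = caffe.Net('cifar10_full.prototxt',
                'cifar10_full_ssl_200000.caffemodel', caffe.TEST)

for name, params in net.params.items():
    W = params[0].data
    flat = W.reshape(W.shape[0], -1)            # one row per output filter
    zero_rows = np.where(~flat.any(axis=1))[0]  # filters whose weights are all zero
    print('%s: %d of %d filters are all-zero' % (name, len(zero_rows), W.shape[0]))
```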