yiwenguo / Dynamic-Network-Surgery

Caffe implementation for dynamic network surgery.

Cannot Compress Model #7

Closed kai-xie closed 7 years ago

kai-xie commented 7 years ago

DNS is a good method and thank you for sharing your code!

My question is: the compilation and installation were successful and did not take much effort (only your code was used, without adding the original Caffe code, so I assume your code can be used as a standalone package). But when I tried to compress LeNet-5 as suggested in the README, changing only the "ip1" layer's type to "CInnerProduct" and adding the "cinner_product_param" block, the training did not converge, and the size of the output caffemodel is 3.2 MB, even larger than the original size of 1.7 MB.

So I was wondering whether you have encountered this kind of problem before, and what I might be doing wrong.

The following is the prototxt file; it is the same as the one in the Caffe examples, with only the "ip1" layer changed to "CInnerProduct":

name: "LeNet" layer {  name: "mnist"  type: "Data"  top: "data"  top: "label"  include {   phase: TRAIN  }  transform_param {   scale: 0.00390625  }  data_param {   source: "examples/mnist/mnist_train_lmdb"   batch_size: 64   backend: LMDB  } } layer {  name: "mnist"  type: "Data"  top: "data"  top: "label"  include {   phase: TEST  }  transform_param {   scale: 0.00390625  }  data_param {   source: "examples/mnist/mnist_test_lmdb"   batch_size: 100   backend: LMDB  } } layer {  name: "conv1"  type: "Convolution"  bottom: "data"  top: "conv1"  param {   lr_mult: 1  }  param {   lr_mult: 2  }  convolution_param {   num_output: 20   kernel_size: 5   stride: 1   weight_filler {    type: "xavier"   }   bias_filler {    type: "constant"   }  } } layer {  name: "pool1"  type: "Pooling"  bottom: "conv1"  top: "pool1"  pooling_param {   pool: MAX   kernel_size: 2   stride: 2  } } layer {  name: "conv2"  type: "Convolution"  bottom: "pool1"  top: "conv2"  param {   lr_mult: 1  }  param {   lr_mult: 2  }  convolution_param {   num_output: 50   kernel_size: 5   stride: 1   weight_filler {    type: "xavier"   }   bias_filler {    type: "constant"   }  } } layer {  name: "pool2"  type: "Pooling"  bottom: "conv2"  top: "pool2"  pooling_param {   pool: MAX   kernel_size: 2   stride: 2  } } layer {  name: "ip1"  type: "CInnerProduct"  bottom: "pool2"  top: "ip1"  param {   lr_mult: 1  }  param {   lr_mult: 2  }  inner_product_param {   num_output: 500   weight_filler {    type: "xavier"   }   bias_filler {    type: "constant"   }  }  cinner_product_param {   gamma: 0.0001   power: 1   c_rate: 4   iter_stop: 14000   weight_mask_filler {    type: "constant"    value: 1   }   bias_mask_filler {    type: "constant"    value: 1   }  } } layer {  name: "relu1"  type: "ReLU"  bottom: "ip1"  top: "ip1" } layer {  name: "ip2"  type: "InnerProduct"  bottom: "ip1"  top: "ip2"  param {   lr_mult: 1  }  param {   lr_mult: 2  }  inner_product_param {   num_output: 10   weight_filler {    type: "xavier"   }   bias_filler {    type: "constant"   }  } } layer {  name: "accuracy"  type: "Accuracy"  bottom: "ip2"  bottom: "label"  top: "accuracy"  include {   phase: TEST  } } layer {  name: "loss"  type: "SoftmaxWithLoss"  bottom: "ip2"  bottom: "label"  top: "loss" }

The output from iteration 9000 to iteration 10000 is as follows (accuracy lingering around 0.1135):

I0602 04:05:34.931988 15322 solver.cpp:314] Iteration 9000, Testing net (#0)
I0602 04:05:35.897229 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:35.897274 15322 solver.cpp:363] Test net output #1: loss = 2.30104 (* 1 = 2.30104 loss)
I0602 04:05:35.906638 15322 solver.cpp:226] Iteration 9000, loss = 2.30204
I0602 04:05:35.906673 15322 solver.cpp:242] Train net output #0: loss = 2.30204 (* 1 = 2.30204 loss)
I0602 04:05:35.906682 15322 solver.cpp:521] Iteration 9000, lr = 0.00617924
I0602 04:05:37.375916 15322 solver.cpp:226] Iteration 9100, loss = 2.2923
I0602 04:05:37.376133 15322 solver.cpp:242] Train net output #0: loss = 2.2923 (* 1 = 2.2923 loss)
I0602 04:05:37.376145 15322 solver.cpp:521] Iteration 9100, lr = 0.00615496
I0602 04:05:38.845537 15322 solver.cpp:226] Iteration 9200, loss = 2.30995
I0602 04:05:38.845561 15322 solver.cpp:242] Train net output #0: loss = 2.30995 (* 1 = 2.30995 loss)
I0602 04:05:38.845568 15322 solver.cpp:521] Iteration 9200, lr = 0.0061309
I0602 04:05:40.314781 15322 solver.cpp:226] Iteration 9300, loss = 2.31165
I0602 04:05:40.314803 15322 solver.cpp:242] Train net output #0: loss = 2.31165 (* 1 = 2.31165 loss)
I0602 04:05:40.314811 15322 solver.cpp:521] Iteration 9300, lr = 0.00610706
I0602 04:05:41.782209 15322 solver.cpp:226] Iteration 9400, loss = 2.29439
I0602 04:05:41.782232 15322 solver.cpp:242] Train net output #0: loss = 2.29439 (* 1 = 2.29439 loss)
I0602 04:05:41.782239 15322 solver.cpp:521] Iteration 9400, lr = 0.00608343
I0602 04:05:43.237807 15322 solver.cpp:314] Iteration 9500, Testing net (#0)
I0602 04:05:44.201413 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:44.201436 15322 solver.cpp:363] Test net output #1: loss = 2.30121 (* 1 = 2.30121 loss)
I0602 04:05:44.210533 15322 solver.cpp:226] Iteration 9500, loss = 2.30612
I0602 04:05:44.210551 15322 solver.cpp:242] Train net output #0: loss = 2.30612 (* 1 = 2.30612 loss)
I0602 04:05:44.210559 15322 solver.cpp:521] Iteration 9500, lr = 0.00606002
I0602 04:05:45.679636 15322 solver.cpp:226] Iteration 9600, loss = 2.30252
I0602 04:05:45.679658 15322 solver.cpp:242] Train net output #0: loss = 2.30252 (* 1 = 2.30252 loss)
I0602 04:05:45.679666 15322 solver.cpp:521] Iteration 9600, lr = 0.00603682
I0602 04:05:47.147786 15322 solver.cpp:226] Iteration 9700, loss = 2.29213
I0602 04:05:47.147809 15322 solver.cpp:242] Train net output #0: loss = 2.29213 (* 1 = 2.29213 loss)
I0602 04:05:47.147817 15322 solver.cpp:521] Iteration 9700, lr = 0.00601382
I0602 04:05:48.616607 15322 solver.cpp:226] Iteration 9800, loss = 2.29719
I0602 04:05:48.616629 15322 solver.cpp:242] Train net output #0: loss = 2.29719 (* 1 = 2.29719 loss)
I0602 04:05:48.616637 15322 solver.cpp:521] Iteration 9800, lr = 0.00599102
I0602 04:05:50.084087 15322 solver.cpp:226] Iteration 9900, loss = 2.2912
I0602 04:05:50.084110 15322 solver.cpp:242] Train net output #0: loss = 2.2912 (* 1 = 2.2912 loss)
I0602 04:05:50.084120 15322 solver.cpp:521] Iteration 9900, lr = 0.00596843
I0602 04:05:51.538485 15322 solver.cpp:399] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I0602 04:05:51.553609 15322 solver.cpp:684] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_10000.solverstate
I0602 04:05:51.606297 15322 solver.cpp:295] Iteration 10000, loss = 2.29934
I0602 04:05:51.606360 15322 solver.cpp:314] Iteration 10000, Testing net (#0)
I0602 04:05:52.568142 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:52.568188 15322 solver.cpp:363] Test net output #1: loss = 2.30109 (* 1 = 2.30109 loss)
I0602 04:05:52.568197 15322 solver.cpp:300] Optimization Done.
I0602 04:05:52.568205 15322 caffe.cpp:184] Optimization Done.

Thank you very much!

yiwenguo commented 7 years ago

Hi @kai-xie , what about using a smaller c_rate (e.g., 2 or 3) for the 'ip1' layer? In cases where the other layers stay dense and only 'ip1' is compressed, 4 might be too large.
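Concretely, only the cinner_product_param block of 'ip1' in the prototxt above needs to change, e.g.:

cinner_product_param {
  gamma: 0.0001
  power: 1
  c_rate: 2        # lowered from 4, as suggested above
  iter_stop: 14000
  weight_mask_filler {
    type: "constant"
    value: 1
  }
  bias_mask_filler {
    type: "constant"
    value: 1
  }
}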

Yes, the caffemodel obtained by running this repo is expected to be larger than the original dense model, because we store both the weight tensors W and the mask tensors T. Hence you should post-process the obtained model to get the sparse tensors W.*T, and use sparse tensor storage formats to obtain the memory/storage savings.
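As a rough illustration of that post-processing step, here is a minimal pycaffe sketch (not part of this repo; it assumes the CInnerProduct layer exposes its parameter blobs in the order weights, bias, weight mask, bias mask, and uses the file names from the log above; please check the layer source to confirm the blob order):

# post_process_dns.py -- apply the learned DNS masks and measure sparsity.
import numpy as np
import scipy.sparse as sp
import caffe

net = caffe.Net('examples/mnist/lenet_train_test.prototxt',
                'examples/mnist/lenet_iter_10000.caffemodel',
                caffe.TEST)

# Assumed blob order for a DNS layer: [weights, bias, weight_mask, bias_mask].
W, b, Tw, Tb = (blob.data for blob in net.params['ip1'])

W_pruned = W * Tw          # element-wise product W .* T
b_pruned = b * Tb

sparsity = 1.0 - np.count_nonzero(W_pruned) / float(W_pruned.size)
print('ip1 weight sparsity: %.2f%%' % (100 * sparsity))

# The storage savings only appear once the pruned weights are kept in a
# sparse format; the caffemodel itself always stores dense blobs plus masks.
W_sparse = sp.csr_matrix(W_pruned.reshape(W_pruned.shape[0], -1))

From there, the pruned weights can be exported in whatever sparse format the deployment side expects.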

kai-xie commented 7 years ago

I changed the c_rate to 2 and it worked. As for getting the compressed model, I think I'd better read the source code.

Thank you very much for your help!