liuzhuang13 / slimming

Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
MIT License

Slimming Resnet #2

Open hiyijian opened 6 years ago

hiyijian commented 6 years ago

Dear @liuzhuang13, I guess we should prune the corresponding channels of the subsequent conv layer's kernels after pruning the current layer. Am I right? If so, I cannot figure out how to slim a residual block using your method. [image] The two branches may have different channels pruned, so can we only prune the intersection of both?

[image] The situation is almost the same in the shortcut version. How do you handle this?

Thanks

liuzhuang13 commented 6 years ago

In our models, the residual branch is BN-RELU-CONV-BN-RELU-CONV-BN-RELU-CONV.

At the addition, all features from the identity mapping and from the last CONV in the residual branch are kept, so the main branch has the original widths of ResNets. Pruning only happens in layers inside the residual branch.

Inside each residual branch:

  1. In the first BN layer, if we detect very small scaling parameters, we mask out the corresponding channels before the first BN layer with a channel selection layer. (This channel selection causes a time overhead, so I don't recommend doing it in practice.)

  2. The last CONV outputs the same number of channels as the main branch (there's no BN to do selection).

  3. For the other intermediate layers, the pruning is the same as in a plain network (e.g., VGG).

If your residual branch is different from ours, you may need to modify the pruning process. But the key point is that the main branch doesn't get slimmed; the pruning happens only inside the residual branch. How you prune in the residual branch depends on how you order your BN and CONV layers.
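The bookkeeping described in this comment can be sketched as follows. This is a framework-free illustration, not the authors' code: the function names `kept_channels` and `plan_residual_branch`, the threshold, and the example scaling factors are all hypothetical. It assumes the BN-RELU-CONV-BN-RELU-CONV-BN-RELU-CONV ordering given above and only computes which channels each CONV would read and write.

```python
from typing import Dict, List


def kept_channels(gammas: List[float], threshold: float) -> List[int]:
    """Indices of channels whose |BN scaling factor| exceeds the threshold."""
    return [i for i, g in enumerate(gammas) if abs(g) > threshold]


def plan_residual_branch(
    bn1: List[float], bn2: List[float], bn3: List[float], threshold: float
) -> Dict[str, object]:
    """Channel plan for one residual branch (hypothetical helper).

    bn1, bn2, bn3 hold the scaling factors of the three BN layers.
    The main branch is never pruned, so its width equals len(bn1).
    """
    select = kept_channels(bn1, threshold)  # channel selection on the branch input
    keep2 = kept_channels(bn2, threshold)   # channels surviving the second BN
    keep3 = kept_channels(bn3, threshold)   # channels surviving the third BN
    return {
        "select": select,
        "conv1": (len(select), len(keep2)),  # (in_channels, out_channels)
        "conv2": (len(keep2), len(keep3)),
        "conv3": (len(keep3), len(bn1)),     # output keeps the main-branch width
    }


if __name__ == "__main__":
    plan = plan_residual_branch(
        bn1=[0.9, 0.01, 0.5, 0.002],
        bn2=[0.3, 0.0, 0.7],
        bn3=[0.001, 0.8, 0.4],
        threshold=0.05,
    )
    print(plan)
```

In this toy example only the inner widths shrink. The last CONV still writes all four main-branch channels, so the addition with the identity mapping needs no special handling.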

hiyijian commented 6 years ago

Thanks. Do you think the sparsity will be affected if the BN layers on the main branch are not penalized by the L1 norm? If yes, how? Thanks

liuzhuang13 commented 6 years ago

What I mean by "main branch" is the identity shortcut throughout the network, so there are no BN layers in the main branch. Wherever there is a BN, we can do channel pruning or selection according to its scaling parameters. Thanks!

youngfly11 commented 6 years ago

Hi @liuzhuang13, could you release the code for DenseNet slimming? Thank you.

liuzhuang13 commented 6 years ago

Hi @youngfly11, thanks for your interest. DenseNet's code is a little different from VGG's. Unfortunately I am busy with other things now, so I will probably release the code when I have time next month.

The way I implemented DenseNet slimming saves parameters and FLOPs but does not bring a speedup in the current Torch package. I implemented it using a channel selection layer, which leads to slower inference than a normal network because it involves a memory copy rather than in-place selection.

If you just want the same speed as a normal network, after training you can set the low scaling factors and their corresponding biases to 0 and stop applying gradient updates to them. This is equivalent to actually pruning the channels.
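A minimal sketch of this "zero and freeze" idea, written in plain Python rather than Torch. The names `soft_prune` and `masked_sgd_step`, the threshold, and the learning rate are hypothetical and chosen only for illustration. Real code would apply the same mask to the BN weight and bias tensors.

```python
from typing import List, Tuple


def soft_prune(
    gammas: List[float], betas: List[float], threshold: float
) -> Tuple[List[float], List[float], List[bool]]:
    """Zero the scale and bias of low-scale channels; return the frozen mask."""
    frozen = [abs(g) <= threshold for g in gammas]
    new_gammas = [0.0 if f else g for g, f in zip(gammas, frozen)]
    new_betas = [0.0 if f else b for b, f in zip(betas, frozen)]
    return new_gammas, new_betas, frozen


def masked_sgd_step(
    params: List[float], grads: List[float], frozen: List[bool], lr: float
) -> List[float]:
    """Plain SGD update that skips frozen channels so they stay exactly zero."""
    return [p if f else p - lr * g for p, g, f in zip(params, grads, frozen)]


if __name__ == "__main__":
    gammas, betas, frozen = soft_prune(
        gammas=[0.8, 0.01, -0.5], betas=[0.1, 0.2, 0.3], threshold=0.05
    )
    # During fine-tuning, apply the same mask to both scales and biases.
    gammas = masked_sgd_step(gammas, [1.0, 1.0, 1.0], frozen, lr=0.1)
    betas = masked_sgd_step(betas, [1.0, 1.0, 1.0], frozen, lr=0.1)
    print(gammas, betas, frozen)
```

A frozen channel has scale 0 and bias 0, so its BN output is 0 for every input and the following layers see the same activations as if the channel had been removed. The tensor shapes are unchanged, which is why this keeps the speed of the unpruned network instead of adding a selection step.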

Thanks

liuzhuang13 commented 6 years ago

In case you're still interested, we've released our PyTorch implementation here: https://github.com/Eric-mingjie/network-slimming. It supports ResNet and DenseNet.

hiyijian commented 6 years ago

Thanks

yyjabidintg commented 5 years ago

Thanks for your wonderful work. But what if the residual branch is CONV-RELU-BN-CONV-RELU-BN-CONV-RELU-BN? Then the number of channels in this residual branch differs from that of the main branch. How should I handle this situation? Thank you.

toyal commented 4 years ago

@yyjabidintg Hi, have you solved this problem? I am running into the same issue.

toyal commented 4 years ago

@hiyijian Hi, how did you handle this situation? Thanks.