hiyijian opened 7 years ago
In our models, the residual branch is BN-RELU-CONV-BN-RELU-CONV-BN-RELU-CONV.
At the addition, all features from the identity mapping and from the last CONV in the residual branch are kept, so the main branch keeps the original width of the ResNet. Pruning only happens in layers inside the residual branch.
Inside each residual branch:
In the first BN layer, if we detect very small scaling parameters, we mask the corresponding channels out with a channel selection layer placed before that BN layer. (This channel selection actually adds time overhead, so I don't recommend doing it in practice.)
The last CONV outputs the same number of channels as the main branch (there is no BN after it to base a selection on).
For the other intermediate layers, pruning works the same way as in a plain network (e.g., VGG).
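The three rules above can be sketched as a small planning function. This is a minimal sketch, not the released code: it assumes pruning by a single threshold on |gamma|, represents each BN layer by a plain list of its scaling factors, and the names `keep_mask` and `plan_residual_branch` are hypothetical.

```python
from typing import Dict, List


def keep_mask(gammas: List[float], threshold: float) -> List[bool]:
    """Keep a channel when its BN scaling factor |gamma| exceeds the threshold."""
    mask = [abs(g) > threshold for g in gammas]
    if not any(mask):
        # Assumption: never prune a layer to zero width; keep the largest |gamma|.
        best = max(range(len(gammas)), key=lambda i: abs(gammas[i]))
        mask[best] = True
    return mask


def plan_residual_branch(
    bn1: List[float], bn2: List[float], bn3: List[float], threshold: float
) -> Dict[str, object]:
    """Pruned widths for a BN-ReLU-Conv x3 residual branch (hypothetical helper).

    bn1 normalizes the block input, so len(bn1) is the main-branch width.
    bn2 follows conv1 and bn3 follows conv2.
    """
    main_width = len(bn1)
    m1, m2, m3 = (keep_mask(g, threshold) for g in (bn1, bn2, bn3))
    return {
        # Channel selection layer before BN1: which block-input channels to keep.
        "select_idx": [i for i, keep in enumerate(m1) if keep],
        # (in_channels, out_channels) for each conv after pruning.
        "conv1": (sum(m1), sum(m2)),  # output pruned by BN2's mask
        "conv2": (sum(m2), sum(m3)),  # output pruned by BN3's mask
        "conv3": (sum(m3), main_width),  # output kept at the main-branch width
    }
```

For example, `plan_residual_branch([0.9, 0.001, 0.5, 0.7], [0.3, 0.002, 0.4], [0.0, 0.6, 0.8], 0.01)` keeps input channels 0, 2 and 3 and gives conv1 `(3, 2)`, conv2 `(2, 2)` and conv3 `(2, 4)`: only the inner widths shrink, while conv3 still writes all 4 main-branch channels for the addition.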
If your residual branch is different from ours, you may need to modify the pruning process. But the key point is that the main branch doesn't get slimmed; pruning happens only inside the residual branch. How you prune inside the residual branch depends on how you order your BN and CONV layers.
Thanks. Do you think sparsity will be affected if the BN layers on the main branch are not penalized with the L1 norm? If so, how? Thanks
By "main branch" I mean the identity shortcut that runs through the whole network, so there are no BN layers in the main branch. Wherever there is a BN, we can do channel pruning or selection according to its scaling parameters. Thanks!
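Because only BN layers carry scaling factors, the sparsity penalty only ever touches channels that can later be pruned or selected. A minimal sketch of that penalty, assuming an L1 term `lam * sum(|gamma|)` on every BN scaling factor, optimized with its subgradient `lam * sign(gamma)` under plain SGD; `lam`, `lr` and `sgd_step_with_l1` are illustrative names, not taken from the released code.

```python
from typing import List


def sign(x: float) -> int:
    """Subgradient of |x|; uses 0 at x == 0."""
    return (x > 0) - (x < 0)


def sgd_step_with_l1(
    gammas: List[float], grads: List[float], lr: float, lam: float
) -> List[float]:
    """One SGD step on BN scaling factors with an added L1 subgradient term.

    grads is the task-loss gradient w.r.t. each gamma; the L1 term adds
    lam * sign(gamma), which pulls every factor toward zero.
    """
    return [g - lr * (d + lam * sign(g)) for g, d in zip(gammas, grads)]
```

With a zero task gradient, `sgd_step_with_l1([0.5, -0.2, 0.0], [0.0, 0.0, 0.0], lr=0.1, lam=1.0)` moves the factors to roughly `[0.4, -0.1, 0.0]`: factors that the task loss does not defend drift toward zero and become pruning candidates.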
Hi @liuzhuang13, can you release the code for DenseNet slimming? Thank you
Hi @youngfly11, thanks for your interest. DenseNet's code is a little different from VGG's. Unfortunately I am busy with other things now, so I will probably release the code when I have time next month.
My implementation of DenseNet slimming saves parameters and FLOPs, but it does not bring a speedup in the current Torch package. I implemented it using a channel selection layer, which makes inference slower than in a normal network because it involves a memory copy rather than in-place selection.
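To see where the copy comes from: a channel selection layer gathers a subset of channels into a new, narrower buffer in front of a BN layer. Presumably this is needed in DenseNet because each layer's output is read by every later layer, so it cannot simply be narrowed at the source. Below is a minimal sketch that stands in for a real tensor with nested Python lists (one list of activations per channel); `channel_select` is a hypothetical name, not the Torch layer.

```python
from typing import List, Sequence

Channel = List[float]  # activations of one channel (stand-in for H x W values)


def channel_select(features: Sequence[Channel], keep_idx: Sequence[int]) -> List[Channel]:
    """Gather the kept channels into a new, narrower buffer.

    Every kept channel is copied, so the cost grows with the number of kept
    channels times the spatial size; nothing is selected in place.
    """
    return [list(features[i]) for i in keep_idx]
```

For example, selecting channels `[0, 2]` from `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]` returns `[[1.0, 2.0], [5.0, 6.0]]` as fresh lists, so writing to the output leaves the original features untouched. That extra allocation and copy on every forward pass is the overhead described above.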
If you just want the same speed as a normal network, after training you can set the low scaling factors and their corresponding biases to 0 and stop applying gradient updates to them. This is equivalent to actually pruning the channels.
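A minimal sketch of why this works for the BN-ReLU-Conv ordering used here: with gamma = beta = 0, BN outputs 0 for that channel, ReLU keeps it at 0, and the following convolution therefore receives nothing from it. The sketch models inference-mode BN on single scalar activations and a 1x1 convolution at one spatial position; all function names are hypothetical.

```python
import math
from typing import List


def bn_relu(x: float, gamma: float, beta: float, mean: float, var: float, eps: float = 1e-5) -> float:
    """Inference-mode BatchNorm followed by ReLU for one activation."""
    return max(0.0, gamma * (x - mean) / math.sqrt(var + eps) + beta)


def zero_small_channels(gammas: List[float], betas: List[float], threshold: float) -> List[int]:
    """Set gamma and beta to 0 where |gamma| < threshold; return those channel indices.

    The returned indices should be kept out of later gradient updates so the
    zeros stay fixed (see masked_sgd_step).
    """
    frozen = [i for i, g in enumerate(gammas) if abs(g) < threshold]
    for i in frozen:
        gammas[i] = 0.0
        betas[i] = 0.0
    return frozen


def masked_sgd_step(params: List[float], grads: List[float], lr: float, frozen: List[int]) -> None:
    """In-place SGD step that skips the frozen channels."""
    skip = set(frozen)
    for i, d in enumerate(grads):
        if i not in skip:
            params[i] -= lr * d


def conv1x1(inputs: List[float], weights: List[float]) -> float:
    """1x1 convolution at one position: weighted sum over input channels."""
    return sum(w * a for w, a in zip(weights, inputs))
```

For instance, with gamma = `[0.8, 0.001, 0.5]` and a threshold of 0.01, only channel 1 is zeroed, and `conv1x1` over all three BN-ReLU outputs gives the same value as `conv1x1` over channels 0 and 2 alone with the matching weights. The equivalence relies on ReLU(0) = 0 and on the next layer being linear in its inputs.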
Thanks
In case you're still interested, we've released our PyTorch implementation here https://github.com/Eric-mingjie/network-slimming, which supports ResNet and DenseNet.
Thanks
Thanks for your wonderful work. But if the residual branch is CONV-RELU-BN-CONV-RELU-BN-CONV-RELU-BN, then the channels of this residual branch differ from those of the main branch. How should I handle this situation? Thank you.
Hi, have you solved this problem? I have run into the same issue.
Dear @liuzhuang13, I guess that after pruning the current layer, we should also prune the corresponding channels of the next conv layer's kernels. Am I right? If so, I can't figure out how to slim a residual block with your method. The two branches may have different channels pruned, so can we only prune the intersection of both?
The situation is almost the same in the shortcut version. How do you handle this?
Thanks
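The intersection idea from the comment above can be sketched as follows. This only illustrates that suggestion and is not a method confirmed in this thread: it assumes each BN that writes into the shared (summed) channels is represented by its list of scaling factors, and that a summed channel may be dropped only when every such BN marks it as prunable. `shared_prunable_channels` is a hypothetical name.

```python
from typing import List, Sequence


def shared_prunable_channels(bn_gammas: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Channels of a summed (shared) feature map that every contributing BN marks as prunable.

    bn_gammas holds one list of scaling factors per BN whose output is added
    into the shared channels, e.g. the last BN of the residual branch and the
    BN of the projection shortcut (hypothetical setup). A channel is returned
    only if |gamma| is below the threshold in all of them, i.e. the
    intersection of the per-branch pruning sets.
    """
    widths = {len(g) for g in bn_gammas}
    if len(widths) != 1:
        raise ValueError("expected at least one BN layer, all of the same width")
    (width,) = widths
    return [
        c for c in range(width)
        if all(abs(g[c]) < threshold for g in bn_gammas)
    ]
```

For example, `shared_prunable_channels([[0.0, 0.5, 0.001], [0.002, 0.0, 0.003]], 0.01)` returns `[0, 2]`: channel 1 survives because the residual branch still uses it. The sketch ignores two things: the BN biases of a dropped channel still add a constant to the sum unless they are zeroed too, and with identity shortcuts downstream, a dropped channel flows into every later block of the same stage, so those blocks' BNs would presumably have to be included in `bn_gammas` as well.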
Hi, how do you handle this situation? Thanks.