Closed dawuchen closed 5 years ago
If the input to BN is zero, the BN output is a constant that can be computed in advance and saved (see the corresponding code), so it has little effect on the inference process.
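As a minimal sketch of why this works (the per-channel statistics below are hypothetical, not from the repo): at inference time BN applies a fixed affine transform, so when the pruned conv channel always outputs zero, BN of that channel collapses to the constant `beta - gamma * mean / sqrt(var + eps)`, which can be cached once.

```python
import math

def bn_forward(x, gamma, beta, running_mean, running_var, eps=1e-5):
    # Standard batch-norm inference transform for a single channel value.
    return gamma * (x - running_mean) / math.sqrt(running_var + eps) + beta

# Hypothetical per-channel statistics, for illustration only.
gamma, beta, mean, var = 1.5, 0.3, 0.8, 2.0

# If the conv channel feeding BN is pruned (always zero), the BN output
# is a constant that can be computed once and saved:
precomputed = bn_forward(0.0, gamma, beta, mean, var)

# The same constant written in closed form:
closed_form = beta - gamma * mean / math.sqrt(var + 1e-5)
```

Since the constant depends only on the (frozen) BN parameters and running statistics, it never has to be recomputed during inference.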
Meanwhile, to perform the elementwise summation of the shortcut and the conv output (see the picture below), the indices of the remaining channels are saved.
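A rough sketch of the index bookkeeping this describes (the function and variable names are my own, not from the repo): the pruned conv produces fewer channels than the shortcut, so the saved indices tell us which shortcut channels each surviving conv channel should be added to.

```python
def add_pruned_to_shortcut(shortcut, conv_out, kept_idx):
    # shortcut: per-channel values for the full (unpruned) channel count
    # conv_out: per-channel values for the surviving channels only
    # kept_idx: saved indices of the remaining (unpruned) channels
    out = list(shortcut)
    for val, c in zip(conv_out, kept_idx):
        out[c] += val  # scatter-add each surviving channel into place
    return out

# e.g. a 4-channel shortcut where the conv was pruned down to channels 0 and 2
result = add_pruned_to_shortcut([1.0, 1.0, 1.0, 1.0], [0.5, 0.25], [0, 2])
```

In a real PyTorch implementation the same scatter-add over channels could be done with `index_add_` on the channel dimension rather than a Python loop.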
In this paper, we focus on pruning conv layers. The method works for networks both with and without BN layers. Some previous methods rely on the scaling factor of the BN layer, which is not applicable to structures without BN layers.
In your code, you only zeroize the parameters of the conv layer. But there are also parameters in the BN layer (scaling and bias, especially the latter), so the output of BN will still be non-zero even though its input is zero. Would it be better to zeroize the BN parameters as well?
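To illustrate the concern (with made-up statistics, not values from the repo): zeroing only the conv weights leaves BN emitting a non-zero constant, whereas also zeroing the BN scale and bias forces the channel output to exactly zero.

```python
import math

def bn_forward(x, gamma, beta, running_mean, running_var, eps=1e-5):
    # Standard batch-norm inference transform for a single channel value.
    return gamma * (x - running_mean) / math.sqrt(running_var + eps) + beta

# Conv weights zeroed, BN parameters untouched: output is a non-zero constant.
nonzero_out = bn_forward(0.0, gamma=1.5, beta=0.3, running_mean=0.8, running_var=2.0)

# Conv weights zeroed AND BN scale/bias zeroed: output is exactly zero.
zero_out = bn_forward(0.0, gamma=0.0, beta=0.0, running_mean=0.8, running_var=2.0)
```

The maintainer's reply above resolves this differently: rather than zeroing the BN parameters, the constant BN output for a pruned channel is precomputed and saved, which has the same effect on inference.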