Closed Shiweiliuiiiiiii closed 4 years ago
Yes. This is strange, but it's actually what the SNIP paper does if you read it carefully. There's a long discussion about this in the OpenReview comments.
Thanks for your reply. It is true that SNIP can learn architecturally important weights.
I noticed that in the SNIP function you copy the model and reinitialize it with "nn.init.xavier_normal_(layer.weight)" before calculating the gradients. So the initialization used for pruning is not the exact initialization of the model used for training. Am I right?
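To make the question concrete, here is a minimal sketch of the pattern being described. It is an illustration, not the repository's actual code: `snip_scores` and the toy model are hypothetical names, and the score normalization and top-k masking steps are left out. It assumes the saliency is |w * dL/dw|, which is what dL/dc reduces to when the mask c is all ones.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def snip_scores(model, inputs, targets):
    """Per-layer |w * dL/dw| scores, computed on a re-initialized copy (hypothetical helper)."""
    # Work on a deep copy so the caller's model is never modified.
    net = copy.deepcopy(model)
    layers = [m for m in net.modules() if isinstance(m, (nn.Linear, nn.Conv2d))]

    for layer in layers:
        # Fresh Xavier init: these are NOT the weights the caller will train.
        nn.init.xavier_normal_(layer.weight)

    net.zero_grad()
    loss = F.cross_entropy(net(inputs), targets)
    loss.backward()

    # With the mask c fixed to 1, dL/dc equals w * dL/dw element-wise.
    return [(layer.weight * layer.weight.grad).abs().detach() for layer in layers]


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
before = [p.detach().clone() for p in model.parameters()]

scores = snip_scores(model, torch.randn(32, 8), torch.randint(0, 4, (32,)))

# The original model keeps its own initialization; only the copy was re-drawn.
unchanged = all(torch.equal(a, b) for a, b in zip(before, model.parameters()))
```

In this sketch the scores come from a different random draw than the weights that would later be trained, which is the mismatch the question is asking about.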