huggingface / nn_pruning

Prune a model while finetuning or training.
Apache License 2.0

Is edge-popup the same as movement pruning with frozen weights? #30

Open mariomeissner opened 2 years ago

mariomeissner commented 2 years ago

In "What's Hidden in a Randomly Weighted Neural Network?", Ramanujan et al. develop a method, called edge-popup, that finds masks via optimization. The algorithm is extremely similar to movement pruning: the mask is part of the computational graph, and its underlying scores receive gradients (via a straight-through estimator) and are updated with a negative gradient step. The main difference is that they freeze the weights and train only the scores (i.e. the mask), so that they can find well-performing subnetworks inside randomly initialized models.
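
For concreteness, here is a minimal PyTorch sketch of the shared mechanism as I understand it (my own toy code, not the nn_pruning API; `TopKMask`, `MaskedLinear`, and the `freeze_weights` flag are made-up names for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMask(torch.autograd.Function):
    """Straight-through top-k mask: binary in the forward pass,
    identity gradient w.r.t. the scores in the backward pass."""

    @staticmethod
    def forward(ctx, scores, sparsity):
        k = int((1.0 - sparsity) * scores.numel())
        mask = torch.zeros_like(scores)
        idx = torch.topk(scores.reshape(-1), k).indices
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through to the scores.
        return grad_output, None


class MaskedLinear(nn.Module):
    """Toy masked linear layer; `freeze_weights` toggles the two regimes."""

    def __init__(self, in_features, out_features, sparsity=0.5, freeze_weights=True):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.scores = nn.Parameter(torch.rand(out_features, in_features))
        self.sparsity = sparsity
        if freeze_weights:
            # Edge-popup regime: weights stay at random init, only scores learn.
            self.weight.requires_grad_(False)
        # With freeze_weights=False, weights and scores are trained together,
        # which is the movement pruning regime.

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.sparsity)
        return F.linear(x, self.weight * mask)


if __name__ == "__main__":
    layer = MaskedLinear(16, 8, sparsity=0.5, freeze_weights=True)
    opt = torch.optim.SGD([p for p in layer.parameters() if p.requires_grad], lr=0.1)
    x, y = torch.randn(4, 16), torch.randn(4, 8)
    loss = F.mse_loss(layer(x), y)
    loss.backward()
    opt.step()  # only `scores` moved; the weights are untouched
```

With `freeze_weights=True` the optimizer only ever updates `scores`, which is the edge-popup setup; with `freeze_weights=False` the weights are fine-tuned alongside the scores, which is essentially the hard variant of movement pruning.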

If I freeze the weights and apply movement pruning, is that equivalent to edge-popup? If not, what would be the difference?

From a theoretical standpoint, the movement pruning paper explains that the method prunes the weights that are moving toward zero during training, as indicated by the sign of their gradients. The edge-popup paper never mentions such behavior, but I assume the same interpretation would hold if both methods apply the same operations. Given that the scores are supposed to track the tendency of weights to move toward zero, it sounds counterintuitive to freeze the weights, since there is no movement to track anymore. Yet that is exactly what edge-popup does, and it works surprisingly well. Any thoughts on this?
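
If I write out the straight-through gradient myself (my notation, following the derivation in the movement pruning paper), the score gradient does not actually require the weights to move:

```latex
% Forward pass with a binary mask M(S) on scores S: a = (W \odot M(S))\, x.
% Treating M as the identity in the backward pass (straight-through estimator):
\frac{\partial \mathcal{L}}{\partial S_{i,j}}
  = \frac{\partial \mathcal{L}}{\partial a_i}\, W_{i,j}\, x_j
  = W_{i,j}\, \frac{\partial \mathcal{L}}{\partial W'_{i,j}},
  \qquad W' = W \odot M(S).
% A gradient step raises S_{i,j} exactly when
% W_{i,j}\, \partial \mathcal{L} / \partial W'_{i,j} < 0,
% i.e. when an SGD step on the weight would push |W_{i,j}| away from zero.
% This signal is available whether or not the weight is actually updated,
% which might explain why freezing the weights still works.
```

So perhaps "movement" should be read as "the direction the weight would move if it were trained", rather than actual movement.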