In hidden networks, Ramanujan et al. develop a method to find masks via optimization (called edge-popup). The algorithm is extremely similar to movement pruning, where the masks are part of the computational graph and receive gradients for a negative gradient step. The main difference is that they freeze the weights and only train the scores (mask), such that they can find well-performing networks within randomly initialized models.
If I freeze the weights and apply movement pruning, is it the same as the above method? If not, what would be the difference?
From a theoretical standpoint, movement pruning talks about how the method will prune those weights that move towards zero as shown by the tendency of the gradients. In edge-popup, they never mention such behavior, but I assume it would be the same if both methods apply the same operations. Given the idea that they track tendency of weights towards zero, it sounds counterintuitive to freeze the weights since there will be no movement tendency anymore. However, that's what they do in edge-popup and it works surprisingly well. Any thoughts about this?
In hidden networks, Ramanujan et al. develop a method to find masks via optimization (called edge-popup). The algorithm is extremely similar to movement pruning, where the masks are part of the computational graph and receive gradients for a negative gradient step. The main difference is that they freeze the weights and only train the scores (mask), such that they can find well-performing networks within randomly initialized models.
If I freeze the weights and apply movement pruning, is it the same as the above method? If not, what would be the difference?
From a theoretical standpoint, movement pruning talks about how the method will prune those weights that move towards zero as shown by the tendency of the gradients. In edge-popup, they never mention such behavior, but I assume it would be the same if both methods apply the same operations. Given the idea that they track tendency of weights towards zero, it sounds counterintuitive to freeze the weights since there will be no movement tendency anymore. However, that's what they do in edge-popup and it works surprisingly well. Any thoughts about this?