Closed: authwork closed this issue 4 years ago
Hi @authwork ,
The current implementation prunes the network by freezing the parameters to zero, in effect reducing the number of varying (non-frozen) parameters. This is how most academic papers implement pruning.
To observe a computation reduction, you can either use hardware that exploits the sparsity of the network to improve performance, or create a neural architecture from scratch that is a replica of the pruned network (with the zero weights removed). I believe the latter would be a non-trivial task.
The model size remains the same since we save the zero weights as well during checkpointing.
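As a minimal sketch of this style of pruning (the layer shape and threshold below are illustrative, not taken from the repository): weights below a magnitude threshold are zeroed in place, so the tensor keeps all of its entries even though fewer of them are non-zero.

```python
import torch
import torch.nn as nn

# Illustrative layer and threshold (not the repository's actual values).
layer = nn.Linear(4, 3)
threshold = 0.3

with torch.no_grad():
    # Build a binary mask from weight magnitudes and zero out small weights.
    mask = layer.weight.abs().gt(threshold).float()
    layer.weight.mul_(mask)  # pruned weights become 0 but stay in the tensor

total = layer.weight.numel()
nonzero = layer.weight.nonzero().size(0)
print(total, nonzero)  # total is unchanged; only the non-zero count drops
```

This is why the parameter count, and hence the checkpoint size, stays the same: the zeros are still stored.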
@varungohil Many thanks for your answer.
Then, I have one more question in the following code:
```python
for name, params in model.named_parameters():
    if "weight" in name:
        weight_copy = params.data.abs().clone()
        mask = weight_copy.gt(threshold).float()
        zeros += mask.numel() - mask.nonzero().size(0)
        total += mask.numel()
        masks.append(mask)
if random != 'false':
    masks = permute_masks(masks)
```
Why is it model.named_parameters() instead of only the trainable parameters? I mean, the non-trainable parameters could be regarded as the frozen/pruned parameters.
Hi @authwork,
This is a programming choice to simplify the code, primarily because PyTorch provides the named_parameters() method.
To iterate only over the trainable parameters, we would need to store them separately, which would increase the memory required by the program and add code complexity. To keep things simple, we iterate over all parameters.
Hello, I have a question: Lottery Tickets (a pruning method) should reduce the number of parameters. However, I do not see any computation reduction in the current implementation. The size of the model checkpoint stays the same after each pruning iteration, and the model only uses 0 to mask these weights.
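This behavior can be checked directly: a dense tensor serializes to the same number of bytes whether or not most of its entries are zero, since the zeros are stored explicitly. The sketch below is illustrative and uses an arbitrary tensor size.

```python
import io
import torch

# A dense tensor and a heavily-zeroed copy of the same shape and dtype.
dense = torch.randn(100, 100)
pruned = dense.clone()
pruned[pruned.abs() < 1.0] = 0.0  # zero out most entries

def serialized_size(t):
    # Save to an in-memory buffer and measure the serialized byte count.
    buf = io.BytesIO()
    torch.save(t, buf)
    return buf.getbuffer().nbytes

# Both tensors store every entry, zeros included, so sizes match.
print(serialized_size(dense) == serialized_size(pruned))
```

Shrinking the checkpoint would require a sparse representation (or physically removing the pruned weights from the architecture), neither of which masking alone provides.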