varungohil / Generalizing-Lottery-Tickets

This repository contains code to replicate the experiments from the NeurIPS 2019 paper "One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers".
https://rescience.github.io/bibliography/Gohil_2020.html
MIT License

Model compression of Generalizing Lottery Tickets #3

Closed: authwork closed this issue 4 years ago

authwork commented 4 years ago

Hello, I have a question: the Lottery Ticket approach (a pruning method) should reduce the number of parameters. However, I do not see any computation reduction in the current implementation. The size of the model checkpoint stays the same after each pruning iteration, and the model only uses zeros to mask these weights.

varungohil commented 4 years ago

Hi @authwork ,

The current implementation prunes the network by freezing the parameters to zero, in effect reducing the number of varying (non-frozen) parameters. This is how most academic papers implement pruning.
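For reference, here is a minimal sketch (not the repository's exact code) of how mask-based pruning freezes weights at zero in PyTorch; the layer, threshold, and mask names are illustrative:

```python
import torch
import torch.nn as nn

# Illustrative layer; in practice this is a layer of the network being pruned.
model = nn.Linear(100, 10)

# Build a binary mask that keeps only weights above a magnitude threshold.
threshold = 0.05
mask = (model.weight.data.abs() > threshold).float()

# "Prune" by zeroing the masked weights; the tensor shape (and hence the
# checkpoint size) is unchanged -- only the values are frozen at zero.
model.weight.data.mul_(mask)

# After each optimizer step, the mask is re-applied so pruned weights stay zero:
# optimizer.step()
# model.weight.data.mul_(mask)
```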

To observe computation reduction, you can use hardware that exploits the sparsity of the network to improve performance, or you can build a new architecture from scratch that replicates the pruned network (with the zero weights removed). I believe the latter would be a non-trivial task.
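As a rough illustration of the first option, a mostly-zero weight matrix can be stored in a sparse format, which is what sparsity-aware hardware and libraries exploit. The sizes and threshold below are made up, and converting to a sparse tensor does not by itself speed up a standard dense forward pass:

```python
import torch

# Illustrative only: prune a dense matrix, then store it sparsely.
dense_weight = torch.randn(512, 512)
dense_weight[dense_weight.abs() < 0.5] = 0.0   # zero out small-magnitude weights

sparse_weight = dense_weight.to_sparse()       # COO format: indices + values only
print("dense elements  :", dense_weight.numel())
print("stored non-zeros:", sparse_weight.values().numel())
```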

The model size remains the same since we save the zero weights as well during checkpointing.
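A quick way to see this is to serialize a tensor before and after zeroing most of its entries; the buffers come out roughly the same size because every float is still written (sizes and threshold below are arbitrary):

```python
import io
import torch

# Zeroing weights does not shrink a dense checkpoint.
w = torch.randn(1000, 1000)
buf_dense = io.BytesIO()
torch.save(w, buf_dense)

w_pruned = w.clone()
w_pruned[w_pruned.abs() < 1.0] = 0.0            # zero out most entries
buf_pruned = io.BytesIO()
torch.save(w_pruned, buf_pruned)

print(buf_dense.getbuffer().nbytes, buf_pruned.getbuffer().nbytes)  # ~equal
```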

authwork commented 4 years ago

@varungohil Many thanks for your answer.

Then I have one more question about the following code:

```python
for name, params in model.named_parameters():
    if "weight" in name:
        weight_copy = params.data.abs().clone()
        # Keep weights whose magnitude exceeds the threshold; zero out the rest.
        mask = weight_copy.gt(threshold).float()
        zeros += mask.numel() - mask.nonzero().size(0)
        total += mask.numel()
        masks.append(mask)
        if random != 'false':
            masks = permute_masks(masks)
```

Why does it use model.named_parameters() instead of only the trainable parameters? I mean, the non-trainable parameters could be regarded as the frozen/pruned parameters.

varungohil commented 4 years ago

Hi @authwork,

This is a programming choice made to simplify the code, primarily because PyTorch conveniently offers the named_parameters() method.

To iterate only over the trainable parameters, we would need to store them separately, which would increase the memory required by the program. It would also add to the code complexity. To keep things simple, we iterate over all parameters, as in the sketch below.
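For comparison, restricting the loop to trainable parameters would only amount to an extra requires_grad check, roughly as follows. The model here is a stand-in, and this assumes (as in this repo's masking approach) that pruned weights are not actually marked with requires_grad = False:

```python
import torch.nn as nn

model = nn.Linear(8, 4)  # placeholder for the network being pruned

# Current approach: iterate over every registered parameter and
# select the weight tensors by name.
for name, params in model.named_parameters():
    if "weight" in name:
        pass  # build the mask as in the snippet above

# Hypothetical alternative: keep only parameters that require gradients,
# which needs an extra stored list of (name, parameter) pairs.
trainable = [(name, p) for name, p in model.named_parameters() if p.requires_grad]
```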