paintception opened this issue 4 years ago
Same question :)
@paintception @ffeng1996 Sorry for the delayed response. The following are the winning tickets of LeNet-5 on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100.
Thanks for pointing it out! For some reason, the winning tickets are not being generated at certain weight percentages. Let me take a look and get back to you soon.
Thanks for replying. However, when I run the code for AlexNet, DenseNet-121, ResNet-18, and VGG-16, the pruning method cannot find winning tickets.
Thanks.
Actually, for large models/datasets you may need some tricks, such as learning rate warmup or "late resetting". You can find the details in the papers [1, 2]; a minimal sketch of late resetting follows the references below. Hope this helps!
[1] Jonathan Frankle and Michael Carbin. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In International Conference on Learning Representations, 2019. URL http://arxiv.org/abs/1803.03635.
[2] Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, and Michael Carbin. The Lottery Ticket Hypothesis at Scale. March 2019. URL http://arxiv.org/abs/1903.01611.
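In case it helps, here is a minimal, self-contained sketch of late resetting (weight rewinding) in the spirit of [2]: snapshot the weights after k early training steps and rewind to *that* snapshot after pruning, rather than to the values at initialization. The model, data, and rewind step are illustrative stand-ins, not code from this repo:

```python
import copy
import torch
import torch.nn as nn

# Illustrative stand-in model and data (not from this repo).
model = nn.Sequential(nn.Linear(100, 300), nn.ReLU(), nn.Linear(300, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(512, 100), torch.randint(0, 10, (512,))

rewind_step, snapshot = 50, None
for step in range(500):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    if step == rewind_step:  # k early iterations are done
        snapshot = copy.deepcopy(model.state_dict())

# ... compute pruning masks here, then rewind the surviving weights
# to the early snapshot instead of resetting them to initialization:
model.load_state_dict(snapshot)
```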
By the way, nice repo~
Thanks!
@ZhangXiao96 Thanks for the direction. Sorry that I am not able to reply promptly. I am busy with a few submissions. I will get back within a few days with a solution.
Hi, very much appreciate your work.
A few clarifications: I've noticed in your code, particularly in the prune_by_percentile function (lines 269-292 of main.py), that you don't seem to implement global pruning* for deeper networks (e.g., VGG-19, ResNet-18); instead, every layer is pruned at the same rate (e.g., 10%), a scheme designed for the fully connected networks applied to MNIST. A sketch of the per-layer scheme is below the footnote. Thank you for your time.
*Global pruning is specifically discussed in Section 4 of the original paper.
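For reference, here is roughly what per-layer percentile pruning looks like; this is a hedged reconstruction of the idea, not the actual prune_by_percentile from main.py, and the function name is hypothetical:

```python
import numpy as np
import torch.nn as nn

# Per-layer percentile pruning: every layer gets its own magnitude
# threshold, so each layer loses the same fraction of its weights
# regardless of how many parameters it holds.
def prune_by_percentile_layerwise(model: nn.Module, percent: float = 10.0):
    masks = {}
    for name, param in model.named_parameters():
        if "weight" not in name:
            continue  # biases are usually left unpruned
        w = param.detach().abs().cpu().numpy()
        threshold = np.percentile(w[w > 0], percent)  # skip already-pruned zeros
        mask = (param.detach().abs() > threshold).float()
        param.data.mul_(mask)  # zero out this layer's smallest `percent`% of weights
        masks[name] = mask
    return masks
```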
Hello, I think layer-wise pruning is not especially designed for fully connected layers (see [1]). Note that we can keep a ReLU network functionally equivalent by multiplying the weights of one layer by a positive number x and those of the next layer by 1/x. This operation changes the results of global pruning (the rescaled weights cross a single global threshold differently), whereas layer-wise pruning is invariant to it, so I believe layer-wise pruning may be more reasonable; a tiny demonstration follows the reference below. Just my opinion~
[1] Frankle et al., Stabilizing the Lottery Ticket Hypothesis
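To make the rescaling argument concrete, here is a small runnable demonstration (the toy network and scale factor are arbitrary): scaling one ReLU layer's weights and bias by x and the next layer's weights by 1/x leaves the outputs unchanged, yet it would change which weights survive a single global magnitude threshold.

```python
import torch
import torch.nn as nn

# relu(x * z) == x * relu(z) for x > 0, so scaling a layer up and the
# next layer down preserves the function the network computes.
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
inp = torch.randn(3, 10)
out_before = net(inp)

scale = 3.0
with torch.no_grad():
    net[0].weight *= scale
    net[0].bias *= scale
    net[2].weight /= scale  # second-layer bias stays untouched

out_after = net(inp)
print(torch.allclose(out_before, out_after, atol=1e-5))  # True
# A single global magnitude threshold now ranks the (rescaled) first-layer
# weights differently; a per-layer percentile is unaffected.
```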
In my experience, global pruning works best on deeper convnets. Although layers could theoretically rescale in inconvenient ways, that doesn't seem to happen in practice. Meanwhile, layer sizes differ so much in deep networks that layer-wise pruning deletes entire small layers while leaving many redundant parameters in the big ones.
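For anyone who wants to try it, PyTorch ships a global magnitude-pruning utility; a minimal example (with an arbitrary stand-in model, not one from this repo) looks like this:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Global magnitude pruning: one threshold is computed across all listed
# tensors, so small layers are not forced to give up the same fraction
# of their weights as large ones.
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 256, 3))
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Conv2d)]
prune.global_unstructured(
    to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2,  # remove the 20% smallest-magnitude weights overall
)
for m, _ in to_prune:
    sparsity = (m.weight == 0).float().mean().item()
    print(f"{m}: {sparsity:.1%} pruned")  # per-layer rates will differ
```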
Thanks for your suggestions!
So this is what I got for CIFAR-10 using ResNet-18 with a batch size of 256, 20% pruning per iteration, 20 pruning iterations, and 60 epochs of training. My conclusion, based on other work, is that in order to stabilize the model compression you have to factor in the optimizer and architecture you are using, how the pruning percentage (or, more generally, whatever compression method you use) is distributed per retraining step, how many retraining steps you use, the capacity of the base model, the number of samples in the dataset, and the input-output dimensionality ratio... basically everything :)
Hi, thanks for your nice repo!
Have you tested whether the CNNs are able to find winning tickets on CIFAR-10 and CIFAR-100?
I ran multiple experiments with most of the convolutional architectures you provide, but I'm only able to find "winning tickets" when using an MLP on the MNIST dataset. When a CNN is used (no matter which one), the experiments of the original paper, e.g. on CIFAR-10, cannot be reproduced.
Any idea why this is happening?