lottery-ticket / rewinding-iclr20-public


Questions about learning rate rewinding #1

Open aheatu opened 4 years ago

aheatu commented 4 years ago

Hi there. Thank you for your excellent paper.

I have some questions about *Comparing Rewinding and Fine-tuning in Neural Network Pruning*. I read Appendix F of the paper and the #FLOP comparison (Reviewer 1) discussion on OpenReview with interest.

In that section, pruning is described as reducing FLOPs and speeding up inference. To my knowledge, structured pruning is required to actually reduce FLOPs. Is the pruning technique used in this section structured pruning?

As I read the paper, I understood that unstructured pruning is run both iteratively and one-shot, whereas structured pruning is only run one-shot. Did I understand that correctly?

If that is correct, the iterative pruning in Appendix F is unstructured pruning, and I wonder how you reduced FLOPs with unstructured pruning. Please let me know if you used hardware or a library that supports sparse computation.

If there is anything I misunderstood, please let me know. Thanks for reading my question.

alexrenda commented 4 years ago

Hi,

Glad you liked our paper!

The structured/unstructured comparison is orthogonal to the one-shot/iterative comparison (although we only run iterative pruning for unstructured). We present both unstructured and structured one-shot pruning results (e.g., see Figure 1).

As for the FLOP reduction in Appendix F: this is a theoretical measure of efficiency and not necessarily reflective of wall-clock time. That is, it would be possible to implement the NN using only that many FLOPs (e.g. by unrolling the code for the entire NN and dropping instructions that correspond to pruned weights), though this does not correspond to wall-clock speedup on commodity CPUs. There are approaches to sparse NN acceleration that could translate this FLOP reduction into wall-clock speedup (e.g., https://arxiv.org/abs/2005.04091, https://arxiv.org/abs/1602.01528, https://arxiv.org/abs/1708.04485, https://www.cerebras.net, and more), though we don't show that the observed difference in FLOPs between the pruned networks translates to differences in wall-clock time on any of these systems.

aheatu commented 4 years ago

Thank you for your kind answer.

So, does Appendix F measure the theoretical efficiency after unstructured pruning, based on Algorithm 1 in the main body of the paper?

If so, could you explain how you measure that theoretical efficiency? If you are busy, I would appreciate it if you could recommend some materials, such as a paper or blog post.

alexrenda commented 4 years ago

No problem!

We count FLOPs by counting up all of the multiplications and additions in the net, similar to the approach here: https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/cifar/weight-level/count_flops.py
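For concreteness, here is a minimal sketch (not the authors' script, and not the code from the linked repo) of that kind of weight-level FLOP counting in PyTorch: each unpruned weight contributes one multiply and one add, so the count shrinks in proportion to the layer's density. The layer types handled, the input size, and the scale-dense-FLOPs-by-density approximation for convolutions are illustrative assumptions.

```python
# Sketch of weight-level FLOP counting for a pruned network.
# Assumption: pruned weights are stored as exact zeros in the weight tensors.
import torch
import torch.nn as nn


def count_pruned_flops(model, input_size=(1, 3, 32, 32)):
    """Roughly count multiply+add FLOPs for Conv2d/Linear layers,
    treating zeroed (pruned) weights as free."""
    flops = []

    def conv_hook(module, inputs, output):
        # Dense conv FLOPs: 2 * (output elements) * (ops per output element),
        # then scaled by the fraction of weights that survived pruning.
        out_elems = output.numel()
        kernel_ops = module.weight[0].numel()  # (in_ch / groups) * kH * kW
        density = (module.weight != 0).float().mean().item()
        flops.append(2 * out_elems * kernel_ops * density)

    def linear_hook(module, inputs, output):
        # One multiply and one add per nonzero weight, per example in the batch.
        nonzero = (module.weight != 0).sum().item()
        batch = output.shape[0]
        flops.append(2 * batch * nonzero)

    handles = []
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            handles.append(m.register_forward_hook(conv_hook))
        elif isinstance(m, nn.Linear):
            handles.append(m.register_forward_hook(linear_hook))

    with torch.no_grad():
        model(torch.randn(input_size))
    for h in handles:
        h.remove()
    return sum(flops)
```

As a sanity check, a fully-connected layer with 1,000 weights pruned to 10% density would count roughly 2 × 100 multiply-adds per example instead of 2 × 1,000, which is the "theoretical FLOPs" sense used above: a count of surviving operations, not a measured wall-clock speedup.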