[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
In the README, --pruning_ratio 0.25 is used, and it's mentioned that this prunes 20% of the parameters. Why is that? If I want to prune 10%, should I use --pruning_ratio 0.15?
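One plausible reason for the gap (a sketch, not a confirmed explanation of LLM-Pruner's defaults): if structured pruning at a local ratio is applied only to a subset of the model's parameters (e.g. some transformer blocks are left intact, embeddings are untouched), the global reduction is smaller than the local ratio. The layer/parameter counts below are made-up illustrative numbers, not the repo's actual configuration:

```python
# Hypothetical illustration: pruning a LOCAL ratio r of only the prunable
# parameters removes a smaller fraction of the WHOLE model.
# All numbers are assumptions for illustration, not LLM-Pruner's defaults.

def global_pruned_fraction(total_params: float, prunable_params: float, local_ratio: float) -> float:
    """Fraction of all parameters removed when only `prunable_params`
    are pruned at `local_ratio`."""
    return prunable_params * local_ratio / total_params

def local_ratio_for_target(total_params: float, prunable_params: float, target_global: float) -> float:
    """Local ratio needed on the prunable subset to hit a global target."""
    return target_global * total_params / prunable_params

# e.g. a 7B model where ~5.6B parameters sit in the blocks actually pruned
total, prunable = 7.0e9, 5.6e9
print(global_pruned_fraction(total, prunable, 0.25))       # ~0.20 globally
print(local_ratio_for_target(total, prunable, 0.10))       # ~0.125 locally
```

Under these assumed numbers, a 25% local ratio yields roughly 20% global reduction, and hitting 10% globally would need a local ratio around 0.125 rather than 0.15 — but the exact mapping depends on which layers the script actually prunes.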