horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
https://arxiv.org/abs/2305.11627
Apache License 2.0

Force even pruning across layers #29

Open thedarkzeno opened 1 year ago

thedarkzeno commented 1 year ago

Is there a way to force the pruning to remove the same number of parameters from all layers? This would make the resulting model compatible with the Hugging Face implementation (so it could be loaded with from_pretrained).

horseee commented 1 year ago

Hi.

There are two ways to prune an equal number of parameters from every layer:

  1. Continue with block-wise pruning: set block_mlp_layer_start/block_mlp_layer_end/block_attention_layer_start/block_attention_layer_end to 0/N/0/N, where N is the number of layers in the model (an example command is sketched at the end of this comment).

  2. Alternatively, switch to channel-wise pruning by passing --channel_wise instead of --block_wise.

However, please note that either approach may significantly hurt the model's performance: pruning parameters from the first or last layers has a substantial influence on the model's behavior, as shown by the experimental results in Figure 3 of our paper.
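
For option 1, a minimal sketch of the command line is below, assuming a 32-layer LLaMA-7B and the hf_prune.py entry point from this repository; flags other than the block_* ones quoted above (e.g. --base_model, --pruning_ratio, --pruner_type) are taken from the README examples and may differ in your version:

```bash
# Prune every transformer block uniformly: N = 32 for LLaMA-7B,
# so both the MLP and attention ranges cover layers 0..32.
python hf_prune.py \
    --base_model decapoda-research/llama-7b-hf \
    --pruning_ratio 0.25 \
    --block_wise \
    --block_mlp_layer_start 0 --block_mlp_layer_end 32 \
    --block_attention_layer_start 0 --block_attention_layer_end 32 \
    --pruner_type taylor \
    --device cpu --eval_device cuda \
    --save_ckpt_log_name llama_prune_uniform
```

With the full 0..N range, every block is pruned at the same ratio, so all layers end up with the same shapes, which is the prerequisite for describing the pruned model with a single standard config as the original question asks.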