princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
https://arxiv.org/abs/2310.06694
MIT License

about shearing params config #67

Open LoverLost opened 4 months ago

LoverLost commented 4 months ago

Hello, I have a question about the parameter settings. I want to prune the LLaMA-2 model without changing hidden_size, i.e., keep it fixed at 4096, but I do want to reduce the num_heads of attention, which means pruning the q/k/v/o projections from 4096 x 4096 down to 4096 x 2048. Can I use the code to do this without changing anything? I also noticed that zs_block may contain 'qk_head_dim_z'. What is it used for? A quick sketch of the shapes I have in mind is below.
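The following is just my own shape arithmetic, not code from the repo; it assumes head_dim = 128 and pruning 32 heads down to 16:

```python
# Sanity check of the target shapes: keep hidden_size = 4096,
# keep head_dim = 128, and prune 32 attention heads down to 16.
hidden_size = 4096
head_dim = 128
num_heads_before, num_heads_after = 32, 16

qkv_out_before = num_heads_before * head_dim   # 4096
qkv_out_after = num_heads_after * head_dim     # 2048

# q/k/v projections go from (4096, 4096) to (2048, 4096);
# o_proj correspondingly goes from (4096, 4096) to (4096, 2048).
print((qkv_out_before, hidden_size), "->", (qkv_out_after, hidden_size))
print((hidden_size, qkv_out_before), "->", (hidden_size, qkv_out_after))
```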

xiamengzhou commented 3 months ago

Hi @LoverLost sorry for the late reply!

qk_head_dim_z is not supported in the current code yet; it was intended for pruning head dimensions rather than entire heads. The current code supports pruning whole heads without pruning the hidden dimension. To do that, you need to remove hidden from prune_params. Let me know if you run into any issues!
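Conceptually, with hidden removed from prune_params, only per-head masks are learned and the residual stream stays at 4096. A rough sketch of the resulting masks (names and exact tensor shapes are illustrative only and may differ from the actual zs_block in the codebase):

```python
import torch

n_layers, n_heads = 32, 32

# Hypothetical pruning masks when only heads are pruned:
zs = {
    # one entry per attention head in each layer; a 0 drops that head
    "head_z": torch.ones(n_layers, n_heads),
    # note: no "hidden_z" entry, since the hidden dimension is not pruned
    # and stays at 4096
}

# e.g. keeping only the first 16 heads of every layer yields
# q/k/v/o projections of effective size 4096 x 2048
zs["head_z"][:, 16:] = 0.0
```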