locuslab / wanda

A simple and effective LLM pruning approach.
https://arxiv.org/abs/2306.11695
MIT License

Questions about sub-networks of LLMs #27

Closed JiwenJ closed 11 months ago

JiwenJ commented 11 months ago

Hello~ I am reading your paper and noticed that you mention several times that "exact and effective sparse sub-networks exist for LLMs". I am a little confused and do not quite get it: since your pruning method depends on the input activations, it seems like a dynamic process that changes with the input. Could you please help me understand this? I would appreciate it~

Eric-mingjie commented 11 months ago

Hi @JiwenJ ! Here "exact" means that the remaining weights are kept exactly the same as in the original dense network, whereas in SparseGPT the remaining weights differ from the original dense weights, because a weight update step is applied on top of the pruning procedure.

We use "exact" to show that such sparse sub-networks already exist inside LLMs, which couldn't be shown by either magnitude pruning or SparseGPT (which involves weight updates).
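
For intuition, here is a minimal sketch (not the repo's actual code) of what such an "exact" pruning step looks like in PyTorch: weights are scored by magnitude times input activation norm, and the lowest-scoring entries within each output row are simply zeroed, so every surviving weight stays numerically identical to its dense counterpart. The function name, tensor shapes, and per-row grouping here are my own illustrative assumptions.

```python
import torch

def prune_exact_subnetwork(weight: torch.Tensor,
                           act_norm: torch.Tensor,
                           sparsity: float = 0.5) -> torch.Tensor:
    """Mask-only pruning: no weight update, so surviving weights stay exact.

    weight:   (out_features, in_features) dense weight of a linear layer
    act_norm: (in_features,) per-feature input activation norm, estimated
              from a small calibration set
    """
    # Importance score: |W_ij| * ||X_j||_2 (weight magnitude x activation norm).
    score = weight.abs() * act_norm.unsqueeze(0)

    # Within each output row, mark the lowest-scoring fraction for removal.
    num_prune = int(weight.shape[1] * sparsity)
    _, prune_idx = torch.topk(score, num_prune, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)

    # Zero the pruned entries; the remaining weights are left untouched,
    # unlike SparseGPT, which would also update them.
    return weight * mask
```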

The activation norms turn out to be a very robust metric: we are able to saturate performance with very few calibration samples. However, I agree that how the calibration set affects the estimated activations is an important research question.
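
As a rough illustration of where those activation statistics come from, the per-feature norms can be accumulated with a forward hook over a handful of calibration batches. This is only a sketch under my own assumptions (function name, hook placement), not the repository's implementation.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def collect_act_norms(layer: nn.Linear, calib_batches) -> torch.Tensor:
    """Accumulate per-input-feature L2 norms over a few calibration batches.

    calib_batches: iterable of (tokens, in_features) tensors; in practice the
    hook would be registered while running the full model on calibration data.
    """
    sq_sum = torch.zeros(layer.in_features)

    def hook(_module, inputs, _output):
        x = inputs[0].reshape(-1, layer.in_features).float()
        sq_sum.add_((x ** 2).sum(dim=0))  # running sum of squares per feature

    handle = layer.register_forward_hook(hook)
    for batch in calib_batches:
        layer(batch)
    handle.remove()
    return sq_sum.sqrt()  # (in_features,) per-feature activation norms
```

These norms would then play the role of `act_norm` in the pruning sketch above.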

JiwenJ commented 11 months ago

@Eric-mingjie Thanks for your reply. I appreciate your clarification!