Closed: JiwenJ closed this issue 11 months ago
Hi @JiwenJ! Here "exact" means that the remaining weights are kept the same as in the original dense network, whereas in SparseGPT the weights differ from the original dense weights because a weight-update process is applied on top of the pruning procedure.
The use of "exact" is to show that sparse sub-networks exist in LLMs, which couldn't be shown by either magnitude pruning or SparseGPT (which involves weight updates).
The activations turn out to be a very robust metric: we are able to saturate performance with very few calibration samples. However, I agree that how the calibration set impacts the estimated activations is an important research question.
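For intuition, here is a minimal sketch of this kind of magnitude-times-activation pruning (names and shapes are illustrative, not the repo's actual API). The key point is that the surviving weights are the untouched dense values, simply masked, which is what "exact" refers to:

```python
import torch

def prune_linear_by_activation(weight: torch.Tensor,
                               calib_inputs: torch.Tensor,
                               sparsity: float = 0.5) -> torch.Tensor:
    # weight: (out_features, in_features); calib_inputs: (n_samples, in_features)
    # Per-input-channel L2 norm estimated from a few calibration samples.
    act_norm = calib_inputs.norm(p=2, dim=0)          # (in_features,)
    scores = weight.abs() * act_norm                  # (out_features, in_features)

    # Drop the lowest-scoring fraction of weights within each output row.
    n_prune = int(sparsity * weight.shape[1])
    prune_idx = torch.argsort(scores, dim=1)[:, :n_prune]
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)

    # "Exact": kept weights are the original dense values; no weight update.
    return weight * mask
```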
@Eric-mingjie Thanks for your reply. I appreciate your clarification!
Hello~ I am reading your paper and noticed that you mention many times that "exact and effective sparse sub-networks exist for LLMs". I am a little confused and don't quite get it: your pruning method behaves differently depending on the input activations, which suggests it is a dynamic process. Could you please help me understand this? I would appreciate it if you could~