hahnyuan / PB-LLM

PB-LLM: Partially Binarized Large Language Models

Why is the importance of weights evaluated by w^2/(H_ii)^2, instead of w^2/(H_ii) as in SparseGPT? #6

Open · Xingrun-Xing opened this issue 4 months ago

hahnyuan commented 4 months ago

I followed the implementation in their code: https://github.com/IST-DASLab/sparsegpt/blob/c3bbf613a1822229767f4d8870b933049b8bef15/sparsegpt.py#L96C21-L96C78
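
For context, here is a minimal sketch of the metric computed around the linked line: the mask importance divides the squared weights by the *squared* diagonal of `Hinv1`. The shapes and the random stand-in Hessian below are illustrative, not PB-LLM or SparseGPT data; only the last expression paraphrases the linked line.

```python
import torch

rows, cols = 16, 32                   # illustrative block shape, not from the repo
W1 = torch.randn(rows, cols)          # weight block being pruned

# Stand-in SPD Hessian; sparsegpt.py builds Hinv1 as the upper Cholesky
# factor of H^{-1}, mirrored here:
H = torch.randn(cols, cols)
H = H @ H.T + cols * torch.eye(cols)
Hinv1 = torch.linalg.cholesky(
    torch.cholesky_inverse(torch.linalg.cholesky(H)), upper=True
)

# Paraphrase of the linked line: per-weight importance for mask selection,
# using the squared Cholesky diagonal in the denominator.
tmp = W1 ** 2 / (torch.diag(Hinv1).reshape((1, -1))) ** 2
print(tmp.shape)  # torch.Size([16, 32])
```

If I read sparsegpt.py correctly, `Hinv1` at that point is the upper Cholesky factor of H^{-1}, and the squared diagonal of that factor equals the inverse-Hessian diagonal of the still-unpruned submatrix, so dividing by the squared Cholesky diagonal would be consistent with the paper's w^2/[H^{-1}]_ii up to the constant factor 1/2.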

Xingrun-Xing commented 4 months ago

> I followed the implementation in their code: https://github.com/IST-DASLab/sparsegpt/blob/c3bbf613a1822229767f4d8870b933049b8bef15/sparsegpt.py#L96C21-L96C78

Thanks for your reply. But from OBS/OBC/SparseGPT we know that delta_loss = w^2/(H_ii), not w^2/(H_ii)^2. Do you know why we should use w^2/(H_ii)^2 as the importance metric?
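
For reference, the second-order saliency from the OBS line of work (which OBC and the SparseGPT paper build on) is stated in terms of the diagonal of the inverse Hessian, written here in LaTeX:

```latex
% OBS saliency: increase in loss from zeroing a single weight w_q,
% with [H^{-1}]_{qq} the q-th diagonal entry of the inverse Hessian.
\delta L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}
```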