pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License
4.8k stars 485 forks

what is the meaning of contribution? #1266

Open Dongximing opened 5 months ago

Dongximing commented 5 months ago

Hi everyone,

I have a question about LLM attribution (see the screenshot below). In the perturbation-based attribution method, the basic idea is to replace tokens one at a time with a baseline. For example, "I love you" tokenizes to [10, 20, 30]; each token is then replaced with a baseline token "0" (you can change the "0" to something else), giving [0, 20, 30], [10, 0, 30], and [10, 20, 0]. The attribution for each token is the original input's log_softmax(target_id) minus the perturbed input's log_softmax(target_id).

(screenshot: Screen Shot 2024-03-28 at 9 20 15 AM)
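To make the replace-and-compare procedure concrete, here is a minimal sketch in plain Python. `toy_model` is a made-up stand-in for an LLM forward pass (not a Captum API), and the baseline token id `0` is just the default used in the description above:

```python
import math

def log_softmax_at(logits, idx):
    # numerically stable log_softmax evaluated at a single index
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - lse

def ablation_attributions(model, token_ids, target_id, baseline=0):
    # replace one token at a time with the baseline token and record
    # how much log_softmax(target_id) drops relative to the original input
    original = log_softmax_at(model(token_ids), target_id)
    attrs = []
    for i in range(len(token_ids)):
        perturbed = list(token_ids)
        perturbed[i] = baseline
        attrs.append(original - log_softmax_at(model(perturbed), target_id))
    return attrs

# hypothetical toy "model": returns logits over a 40-id vocab,
# where the target id's logit grows with the sum of the input ids
def toy_model(token_ids):
    return [0.1 * sum(token_ids) if v == 35 else 0.0 for v in range(40)]

print(ablation_attributions(toy_model, [10, 20, 30], target_id=35))
```

With this toy model, ablating a larger token id removes more of the target's logit, so it gets a larger attribution score.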

So, in my opinion, should we use the absolute value to evaluate the importance of tokens? For example, if the contributions are [-3.5, 3.6, 1], the most important token is token_1 (3.6), the second is token_0 (-3.5), and the third is token_2 (1).
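A small sketch of the magnitude-based ranking described above (plain Python, using the example scores from this comment):

```python
contributions = [-3.5, 3.6, 1.0]

# rank token indices by the absolute value of their attribution score
ranking = sorted(range(len(contributions)),
                 key=lambda i: abs(contributions[i]),
                 reverse=True)

print(ranking)  # token_1 first, then token_0, then token_2
```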

I also have a question about the LLMGradientAttribution method.

In the final step, https://github.com/pytorch/captum/blob/master/captum/attr/_core/llm_attr.py#L570, it sums the gradients over the last dimension. How should the importance of tokens be evaluated here? Does bigger mean more important? For example, if the contributions after summing are [-3.5, 3.6, 1], does that mean the most important token is token_1 (3.6), the second is token_2 (1), and the third is token_0 (-3.5)?
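For illustration, that last-dimension sum can be sketched in plain Python; the gradient values below are made up so the totals match the example in this comment:

```python
# hypothetical per-token gradients with shape (num_tokens, embedding_dim),
# as they might look just before the final reduction step
token_grads = [
    [-1.0, -2.5],  # token_0
    [ 2.0,  1.6],  # token_1
    [ 0.4,  0.6],  # token_2
]

# summing over the last (embedding) dimension gives one score per token
contributions = [sum(dims) for dims in token_grads]
print(contributions)  # one signed score per token
```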

Thanks

vivekmig commented 5 months ago

Hi @Dongximing, yes, if you want the magnitude of importance, it is generally reasonable to take the absolute value of the attribution scores and compare those. But the sign does provide information about the direction of the feature or token's contribution. In particular, for feature ablation, a negative attribution score implies that the output score is higher when the feature is replaced with the baseline than with the original. Hope this helps!
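A tiny numeric illustration of that sign convention, assuming the feature-ablation definition attribution = original output minus ablated output (the scores here are invented):

```python
original_score = 0.2  # e.g. log_softmax(target_id) on the original input
ablated_score = 0.5   # the same score with the token replaced by the baseline

attribution = original_score - ablated_score
# negative attribution: the output is higher with the baseline, i.e. the
# original token was pulling the target score down
print(attribution)
```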