Dongximing opened 8 months ago
Hi @Dongximing , generally, yes: if you only care about the magnitude of importance, it is reasonable to compare the absolute values of the attribution scores. But the sign does carry information about the direction of the feature or token's contribution. In particular, for feature ablation, a negative attribution score implies that the output score is higher when the feature is replaced with the baseline than with the original value. Hope this helps!
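For instance, a minimal sketch (plain PyTorch with made-up scores, independent of Captum) of ranking tokens by magnitude while keeping the sign for interpretation:

```python
import torch

# Hypothetical per-token attribution scores for the prompt ["I", "love", "you"]
tokens = ["I", "love", "you"]
scores = torch.tensor([-3.5, 3.6, 1.0])

# Rank by magnitude for importance, but keep the sign for direction:
# positive -> the token pushes the target score up,
# negative -> the target score is higher when the token is replaced by the baseline.
order = torch.argsort(scores.abs(), descending=True)
for i in order.tolist():
    s = scores[i].item()
    print(f"{tokens[i]}: {s:+.1f} ({'supports' if s > 0 else 'opposes'} the target)")
# Ranking by magnitude: "love" (3.6), "I" (-3.5), "you" (1.0)
```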
Hi everyone,
I have a question about LLM token contributions (see the picture). This is the perturbation-based attribution method, and the basic idea is to replace the tokens one at a time. For example:
I love you
after tokenization: 10 20 30
then it use a "0"(you can change the "0" to something else) to replace the order to see the target_id log_softmax change. it is using baseline's log_softmax(target_id) - replace the words's log_softmax(target_id) '0 20 30' '10,0,30' '10.20,0'so in my opinion, need we use "absolute value" to evaluate the importance of tokens? For example, if contribution is [-3.5,3.6,1], the first important token is token_1(3.6) and the second is token_0(-3.5) and third is token_2(1)
Also, in the LLMGradientAttribution method, the final step (https://github.com/pytorch/captum/blob/master/captum/attr/_core/llm_attr.py#L570) sums the gradients over the last dimension. My question is how to evaluate the importance of tokens there: does bigger mean more important? For example, if the contribution after summing is [-3.5, 3.6, 1], does that mean the most important token is token_1 (3.6), the second is token_2 (1), and the third is token_0 (-3.5)?
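A minimal sketch of that final reduction plus a magnitude-based ranking, with random numbers standing in for real gradient attributions:

```python
import torch

torch.manual_seed(0)

# Hypothetical gradient-based attributions: one value per (prompt token, embedding dim).
# The question describes the last step as summing over the last (embedding) dimension;
# here we just mimic that reduction on random numbers.
seq_len, embed_dim = 3, 8
per_dim_attr = torch.randn(seq_len, embed_dim)

# Collapse the embedding dimension -> one signed score per prompt token.
token_attr = per_dim_attr.sum(dim=-1)

# "Bigger" alone is not the whole story: the sign gives the direction of the
# contribution, so for a pure importance ranking compare absolute values.
ranking = torch.argsort(token_attr.abs(), descending=True)
print(token_attr.tolist())
print(ranking.tolist())  # prompt token indices, most to least important by magnitude
```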
Thanks