Closed RachitBansal closed 2 years ago
Hi Rachit,
Thanks for your feedback. I see where you're coming from. I actually have code fragments of getting Captum's Integrated Gradients working for text generation, and it would indeed be reasonable to "outsource" saliency calculation to Captum. At least as an option. I'll keep this issue active and open for contributions. Let me dig up that code and post a gist to help orient the discussion.
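For anyone following along, the quantity integrated gradients computes is simple to state even without Captum; here is a dependency-free sketch (the function names and the toy objective are illustrative, not Captum's API):

```python
# Minimal sketch of integrated gradients (Sundararajan et al.):
# attribution_i = (x_i - baseline_i) * average gradient along the
# straight-line path from the baseline to the input.
def integrated_gradients(grad_fn, x, baseline, steps=100):
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):  # midpoint Riemann sum over the path
        alpha = (k + 0.5) / steps
        point = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(x[i] - baseline[i]) * avg_grad[i] for i in range(n)]

# Toy check with f(x) = sum(x_i^2), whose gradient is 2x. By the
# completeness axiom, attributions should sum to f(x) - f(baseline).
grad_fn = lambda p: [2.0 * v for v in p]
attr = integrated_gradients(grad_fn, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(attr)  # close to [1.0, 4.0, 9.0]; sums to 14.0 = f(x) - f(0)
```

Captum's `IntegratedGradients` does the same thing, but takes the gradients from autograd and batches the path points; the hard part for text is deciding what the "input" and "baseline" are (embeddings vs. token ids).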
Thank you for addressing this, @jalammar. This sounds great.
I got the aforementioned methods working for machine translation using Captum, but I had to make some changes internal to the open-sourced model (XLM in this case). I wonder what might be the best way to share such a thing.
Also, I would be happy to help with this contribution. Could you elaborate a bit on what you meant by outsourcing the saliency calculation to Captum? Do you mean adding it as a dependency and plugging it at the places where the attributions need to be found? I was thinking more on the side of implementing those methods alongside the current methods in attribution.py itself.
> I wonder what might be the best way to share such a thing.

Probably a GitHub Gist?

> Could you elaborate a bit on what you meant by outsourcing the saliency calculation to Captum? Do you mean adding it as a dependency and plugging it at the places where the attributions need to be found?

Yes, exactly. This way it becomes Captum's concern to get these implementations correct, and we don't duplicate the effort while getting a large collection of methods supported and maintained.
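To make the "plug it in" idea concrete, here is one hypothetical shape it could take — a small registry in `attribution.py` mapping method names to callables, where Captum-backed methods could later be registered next to the built-in ones (all names here are illustrative, not the project's actual API):

```python
# Hypothetical registry sketch: method name -> attribution callable.
ATTRIBUTION_METHODS = {}

def register(name):
    def wrap(fn):
        ATTRIBUTION_METHODS[name] = fn
        return fn
    return wrap

@register("grad_x_input")
def grad_x_input(grads, inputs):
    # element-wise gradient * input, one of the existing built-in methods
    return [g * x for g, x in zip(grads, inputs)]

def compute_saliency(method, grads, inputs):
    # a Captum-backed entry would be looked up the same way
    return ATTRIBUTION_METHODS[method](grads, inputs)

print(compute_saliency("grad_x_input", [0.5, -1.0], [2.0, 3.0]))  # [1.0, -3.0]
```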
LRP does not work since it takes no parameter `forward_func`. Same for all the rest except IG, saliency, and grad × input.
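For context, Captum's gradient-based classes (e.g. `IntegratedGradients`, `Saliency`, `InputXGradient`) are constructed with a `forward_func` callable, while `LRP` and `DeepLift` expect the `nn.Module` itself. One hypothetical way to smooth over that difference (the helper name is made up, and this only fixes construction, not any deeper incompatibility):

```python
import inspect

def make_attribution(attr_method_class, model):
    """Hypothetical adapter: pass the model under whichever keyword the
    Captum class's constructor declares (forward_func vs. model)."""
    params = inspect.signature(attr_method_class.__init__).parameters
    if "forward_func" in params:
        return attr_method_class(forward_func=model)
    return attr_method_class(model=model)
```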
@jalammar Sorry to continue on this closed issue. As @Victordmz said, the support for other methods like LRP and DeepLIFT is not working. Specifically, even changing `forward_func` to `model` (as described in Captum) raises an exception. New code in `attribution.py`:

```python
ig = attr_method_class(model=model)
attributions = ig.attribute(inputs, target=prediction_id)
```
This raises an exception in the local model:

```
saliency4alce/transformers/src/transformers/models/llama/modeling_llama.py", line 629, in forward
    batch_size, seq_length = input_ids.shape
ValueError: too many values to unpack (expected 2)
```
I'm trying to figure out a solution and will contribute by opening a PR. If anyone has already solved this problem, I would appreciate it a lot if you can ping me here!
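One reading of that traceback: `attribute()` ends up feeding a 3-D embeddings tensor into a `forward` that unpacks a 2-D `(batch, seq)` `input_ids` shape. A plain-Python reproduction of just the unpacking failure (the shapes are illustrative):

```python
ids_shape = (1, 7)         # input_ids: (batch_size, seq_length)
emb_shape = (1, 7, 4096)   # inputs_embeds: (batch_size, seq_length, hidden)

batch_size, seq_length = ids_shape  # unpacks fine
try:
    batch_size, seq_length = emb_shape  # three values into two names
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```

If that is indeed the cause, attributing at the embedding layer (e.g. Captum's `LayerIntegratedGradients` over the model's embedding module) is a common way to keep token ids as the `forward` input, though I haven't verified it against this particular model.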
Hi,

Currently, the project seems to rely on grad-norm and grad-x-input to obtain attributions. However, there are other, arguably better (as discussed in recent work) methods for obtaining saliency maps. Integrating them into this project would also provide a good way to compare them on the same input examples.

Some of these methods, off the top of my head, are integrated gradients, Gradient SHAP, and LIME. Perhaps support for visualizing the attention map from the model being interpreted itself could also be added. Methods based on feature ablation are also possible, but they might need more work to integrate.
There is support for these aforementioned methods on Captum, but it takes effort to get them working for NLP tasks, especially those based on language modeling. Thus, I feel this would be a useful addition here.