Hi @RachitBansal, from a quick look at the stack trace, it seems like the backward pass isn't returning appropriate results and is just returning None for some reason.
To debug this further, can you try using `Saliency`, which just takes gradients at the input, and see if a similar issue occurs? It should take the same arguments to `attribute` as `IntegratedGradients` except `baselines`, `n_steps`, and `return_convergence_delta`.
If that works appropriately, the issue might be related to taking gradients with respect to the expanded inputs in Integrated Gradients.
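For reference, a minimal sketch of what that would look like (the names `model_forward`, `inputs`, `target_idx`, and `additional_args` are placeholders for your actual setup):

```python
from captum.attr import Saliency

# Wrap the same forward function you passed to IntegratedGradients.
saliency = Saliency(model_forward)

# Same call as IntegratedGradients.attribute, minus baselines, n_steps,
# and return_convergence_delta; Saliency returns only the attributions.
attributions = saliency.attribute(
    inputs,
    target=target_idx,
    additional_forward_args=additional_args,
)
```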
If you can share your code to reproduce this issue in a Colab notebook or other form, we can try to help debug further.
Hi @vivekmig, thank you for your response.
> If you can share your code to reproduce this issue in a Colab notebook or other form, we can try to help debug further.
Here is the Colab notebook where I have uploaded my code. You should be able to reproduce the issue if you run it along with this zip folder (after unzipping it), which contains all the external Python files and weights.
> To debug this further, can you try using `Saliency`, which just takes gradients at the input, and see if a similar issue occurs? It should take the same arguments to `attribute` as `IntegratedGradients` except `baselines`, `n_steps`, and `return_convergence_delta`.
I will definitely try this and get back to you while you look into the code.
I have been trying to get this working for a long time but to no avail. Would really appreciate your help.
Hi @RachitBansal, I'm not able to access the notebook or the zip; can you check the sharing permissions?
After running with `Saliency`:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-7a10faecc7e7> in <module>
----> 1 interpret_sentence_saliency(input_text, ground_truth)
<ipython-input-37-6a79178c65db> in interpret_sentence_saliency(src, trg)
     35 print(langs.shape)
     36 print(langs)
---> 37 attribution_ig, delta = saliency.attribute(src_embedding, additional_forward_args=(langs, idx, False), target=max_idx)
     38 attribution_igs.append(attribution_ig)
     39
~/anaconda3/lib/python3.7/site-packages/captum/attr/_core/saliency.py in attribute(self, inputs, target, abs, additional_forward_args)
    130 )
    131 if abs:
--> 132 attributions = tuple(torch.abs(gradient) for gradient in gradients)
    133 else:
    134 attributions = gradients
~/anaconda3/lib/python3.7/site-packages/captum/attr/_core/saliency.py in <genexpr>(.0)
    130 )
    131 if abs:
--> 132 attributions = tuple(torch.abs(gradient) for gradient in gradients)
    133 else:
    134 attributions = gradients
TypeError: abs(): argument 'input' (position 1) must be Tensor, not NoneType
```
Seems like the gradient is NoneType.
Since `Saliency` causes the same issue, it's likely that the issue can be reproduced without Captum, simply by taking gradients with respect to the inputs. Can you try something like this (with `model_forward` as the function provided to `Saliency`) and see if you can obtain appropriate gradients?
```python
# Enable gradient tracking on the input embeddings.
src_embedding.requires_grad_()
with torch.autograd.set_grad_enabled(True):
    model_out = model_forward(src_embedding, langs, idx, False)
    # Pick the model output at the target index for each example.
    selected_out = model_out[:, max_idx]
    # Gradients of the selected outputs with respect to the embeddings.
    grads = torch.autograd.grad(torch.unbind(selected_out), src_embedding)
```
You are right. Using an InterpretableEmbedding layer along with this code still gives all the gradients as NoneType. There are no errors, but `grads` is `(None,)`. How can I avoid this?
Also, as a side note, I had to pass `allow_unused=True` to the `torch.autograd.grad` function, as without it I was getting: `RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.` I had also done this before, when the same error was popping up while using IntegratedGradients.
Based on the error, it seems like there's some issue with the model forward pass; essentially, the output does not use or depend on `src_embedding`. It would be good to go through the model forward pass and confirm why this could be occurring.
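As a standalone illustration (a toy example, not your model), this is exactly what `allow_unused=True` does: it trades the RuntimeError for a None gradient when a tensor never participates in the output:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
out = (y * 2).sum()  # note: x is never used in the computation

# Without allow_unused=True this raises the RuntimeError you saw;
# with it, the disconnected tensor simply gets a None gradient.
grads = torch.autograd.grad(out, (x, y), allow_unused=True)
print(grads)  # (None, tensor([2., 2., 2.]))
```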
The interpretable embedding module just returns the given input (assuming the input is an embedding output), so this may not work with models that perform modifications to the embedding input prior to the embedding layer. More details can be found in the discussion of issue #439.
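For completeness, the usual pattern looks roughly like this (a sketch; `model`, `model_forward`, `input_ids`, `'embeddings'`, `target_idx`, and `additional_args` are placeholders for your actual model and embedding layer path):

```python
from captum.attr import (
    IntegratedGradients,
    configure_interpretable_embedding_layer,
    remove_interpretable_embedding_layer,
)

# Wrap the token embedding layer so it acts as an identity function.
interpretable_emb = configure_interpretable_embedding_layer(model, 'embeddings')

# Precompute the embeddings outside the model; the wrapped layer will
# pass them through unchanged during the forward pass.
input_emb = interpretable_emb.indices_to_embeddings(input_ids)

ig = IntegratedGradients(model_forward)
attributions = ig.attribute(
    input_emb,
    target=target_idx,
    additional_forward_args=additional_args,
)

# Restore the original embedding layer afterwards.
remove_interpretable_embedding_layer(model, interpretable_emb)
```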
This is the model forward pass in my case.
So, I tried two configurations for InterpretableEmbedding. In the first, I used it for all three embedding layers: the main embedding layer, the positional embedding, and the language embedding. The problem with this is that, as can be seen from the forward pass, the positional and language embedding layers take positional and language vectors as inputs, not the main input (which would be the embedding in this case). The three embedding outputs are added after that, so the shapes don't match, giving an error.
In the second configuration, I made just the main embedding layer interpretable, while the other two remained as they were. This returns the NoneType gradient, and I can't figure out why.
Could you take a look at the forward pass and see if you can observe anything from it?
Hi @RachitBansal, I looked into your code; it seems like the issue might be in the decoder, particularly in this line:
```python
tensor = tensor[-1, :, :].data.type_as(src_enc)  # (bs, dim)
```
Accessing the data directly no longer maintains the autograd dependency, which is likely causing the error. Removing the `.data` attribute and accessing the tensor directly should maintain the autograd compute graph appropriately. It seems like this method may be intended primarily for inference and not back-propagation, so alternatively, you may be able to use the same decoder method used in training (it seems to use `forward` directly rather than the `generate` method) to fix the issue.
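Concretely, the change would look like this (a sketch based on the line above):

```python
# Before: .data detaches the result from the autograd graph, so no
# gradients can flow back to the inputs.
tensor = tensor[-1, :, :].data.type_as(src_enc)  # (bs, dim)

# After: indexing and casting the tensor directly keeps the graph intact.
tensor = tensor[-1, :, :].type_as(src_enc)  # (bs, dim)
```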
You are absolutely right! It worked after changing the Decoder's inference step to:

```python
scores = decoder('fwd', x=input_batch.cuda(), lengths=lengths2.cuda(),
                 langs=langs2.cuda(), causal=True, src_enc=encoded,
                 src_len=lengths.cuda(), find_emb=True, all_embs=all_embs)
```
Thanks a lot @vivekmig.
I have been trying to use Captum to interpret my low-resource neural machine translation model (specifically, XLM).
I am getting the following error when trying to run the `IntegratedGradients.attribute` function:

I am using the following arguments:
When I try printing `all_outputs`, it shows `[(None,)]`.
All my prediction and forward functions are working well, and I have thoroughly tested them separately.
Please help.