pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

AssertionError: Elements to be reduced can only beeither Tensors or tuples containing Tensors. #445

Closed · RachitBansal closed this issue 4 years ago

RachitBansal commented 4 years ago

I have been trying to use captum to interpret my Low-Resource Neural Machine Translation model (specifically, XLM).

I am getting the following error when trying to run the IntegratedGradients.attribute function:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-42-3042e38801c6> in <module>
----> 1 interpret_sentence(input_text, ground_truth)

<ipython-input-40-6d6b70739f3f> in interpret_sentence(src, trg)
     35         print(langs.shape)
     36         print(langs)
---> 37         attribution_ig, delta =  ig.attribute(src_embedding, baselines=202, additional_forward_args=(langs, idx, False), target=max_idx, n_steps=50, return_convergence_delta=True)
     38         attribution_igs.append(attribution_ig)
     39 

~/anaconda3/lib/python3.7/site-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    282             internal_batch_size=internal_batch_size,
    283             forward_fn=self.forward_func,
--> 284             target_ind=expanded_target,
    285         )
    286 

~/anaconda3/lib/python3.7/site-packages/captum/attr/_utils/batching.py in _batched_operator(operator, inputs, additional_forward_args, target_ind, internal_batch_size, **kwargs)
    166     ]
--> 167     return _reduce_list(all_outputs)

~/anaconda3/lib/python3.7/site-packages/captum/attr/_utils/batching.py in _reduce_list(val_list, red_func)
     65         for i in range(len(val_list[0])):
     66             final_out.append(
---> 67                 _reduce_list([val_elem[i] for val_elem in val_list], red_func)
     68             )
     69     else:

~/anaconda3/lib/python3.7/site-packages/captum/attr/_utils/batching.py in _reduce_list(val_list, red_func)
     69     else:
     70         raise AssertionError(
---> 71             "Elements to be reduced can only be"
     72             "either Tensors or tuples containing Tensors."
     73         )

AssertionError: Elements to be reduced can only beeither Tensors or tuples containing Tensors.

I am using the following arguments:

ig.attribute(src_embedding, baselines=1, additional_forward_args=(langs, idx, False), target=max_idx, n_steps=50, return_convergence_delta=True)

I tried printing all_outputs and it shows [(None,)].

All my prediction and forward functions are working well, and I have thoroughly tested them separately.

Please help.

vivekmig commented 4 years ago

Hi @RachitBansal, from a quick look at the stack trace, it seems like the backward pass isn't returning appropriate results and is just returning None for some reason.

To debug this further, can you try using Saliency, which just takes gradients at the input, and see if a similar issue occurs? It takes the same arguments to attribute as IntegratedGradients, except for baselines, n_steps, and return_convergence_delta.
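
For example, something along these lines (a sketch reusing the names from your snippet, with model_forward standing in for whatever forward function you passed to IntegratedGradients):

from captum.attr import Saliency

saliency = Saliency(model_forward)  # same forward function used for IntegratedGradients
attribution = saliency.attribute(
    src_embedding,
    additional_forward_args=(langs, idx, False),
    target=max_idx,
)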

If that works appropriately, the issue might be related to taking gradients with respect to the expanded inputs in Integrated Gradients.

If you can share your code to reproduce this issue in a Colab notebook or other form, we can try to help debug further.

RachitBansal commented 4 years ago

Hi @vivekmig, thank you for your response,

If you can share your code to reproduce this issue in a Colab notebook or other form, we can try to help debug further.

Here is the Colab notebook where I have uploaded my code. You should be able to reproduce the issue if you run it along with this zip folder (after unzipping), which contains all the external Python files and weights.

To debug this further, can you try using Saliency, which just takes gradients at the input, and see if a similar issue is obtained? It should take the same arguments to attribute as IntegratedGradients except baselines, n_steps, and return_convergence_delta.

I will definitely try this and get back to you while you look into that code.

I have been trying to get this working for a long time, but to no avail. I would really appreciate your help.

vivekmig commented 4 years ago

Hi @RachitBansal, I'm not able to access the notebook or zip. Can you check the sharing permissions?

RachitBansal commented 4 years ago

zip notebook

Can you please try now?

Also, the problem might be in the fwd function inside the TransformerModel class in the translation.XLM.XLM.src.model.transformer Python file.

RachitBansal commented 4 years ago

After running with Saliency:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-7a10faecc7e7> in <module>
----> 1 interpret_sentence_saliency(input_text, ground_truth)

<ipython-input-37-6a79178c65db> in interpret_sentence_saliency(src, trg)
     35         print(langs.shape)
     36         print(langs)
---> 37         attribution_ig, delta =  saliency.attribute(src_embedding, additional_forward_args=(langs, idx, False), target=max_idx)
     38         attribution_igs.append(attribution_ig)
     39 

~/anaconda3/lib/python3.7/site-packages/captum/attr/_core/saliency.py in attribute(self, inputs, target, abs, additional_forward_args)
    130         )
    131         if abs:
--> 132             attributions = tuple(torch.abs(gradient) for gradient in gradients)
    133         else:
    134             attributions = gradients

~/anaconda3/lib/python3.7/site-packages/captum/attr/_core/saliency.py in <genexpr>(.0)
    130         )
    131         if abs:
--> 132             attributions = tuple(torch.abs(gradient) for gradient in gradients)
    133         else:
    134             attributions = gradients

TypeError: abs(): argument 'input' (position 1) must be Tensor, not NoneType

Seems like the gradient is NoneType.

vivekmig commented 4 years ago

Since Saliency causes the same issue, it's likely that the issue can be reproduced without Captum, simply by taking gradients with respect to the inputs. Can you try something like this (with model_forward as the function provided to Saliency) and see if you can obtain appropriate gradients?

# Make the embedding tensor a leaf that requires grad, run the forward pass,
# and take gradients of the selected outputs directly with torch.autograd.grad.
src_embedding.requires_grad_()
with torch.autograd.set_grad_enabled(True):
    model_out = model_forward(src_embedding, langs, idx, False)
    selected_out = model_out[:, max_idx]
    grads = torch.autograd.grad(torch.unbind(selected_out), src_embedding)

RachitBansal commented 4 years ago

You are right. Using an Interpretable Embedding layer along with this code still gives all the gradients as None. There are no errors, but grads is (None,). How can I avoid this?

Also, as a side note, I had to pass allow_unused=True to torch.autograd.grad, as without it I got 'RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.' I had also done this earlier when the same error came up while using IntegratedGradients.
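
Concretely, the grad call from the snippet above looks roughly like this with that change (note that allow_unused=True only suppresses the error; inputs that are unused in the graph still come back with a None gradient):

grads = torch.autograd.grad(
    torch.unbind(selected_out), src_embedding, allow_unused=True
)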

vivekmig commented 4 years ago

Based on the error, it seems like there's some issue with the model forward pass: essentially, the output does not use or depend on src_embedding. It would be good to go through the forward pass and confirm why this is occurring.

The interpretable embedding module just returns the given input (assuming the input is an embedding output), so this may not work with models that perform modifications to the embedding input prior to the embedding layer. More details can be found in the discussion on issue #439.
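
For reference, the usual pattern looks roughly like this (the layer name 'embeddings' is only a placeholder for your model's word embedding module, and model / input_ids stand in for your own objects):

from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

# Wrap the word embedding layer; the wrapper then passes precomputed embeddings through unchanged.
interpretable_emb = configure_interpretable_embedding_layer(model, 'embeddings')
input_emb = interpretable_emb.indices_to_embeddings(input_ids)  # token ids -> embedding tensor
# ... run attribution with respect to input_emb ...
remove_interpretable_embedding_layer(model, interpretable_emb)  # restore the original layer afterwards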

RachitBansal commented 4 years ago

This is the model forward pass in my case.

So, I tried two configurations for InterpretableEmbeddings. In the first, I used it for all three embedding layers: the main embedding layer, the positional embedding, and the language embedding. The problem, as can be seen from the forward pass, is that the positional and language embedding layers take positional and language vectors as inputs, not the model input (which would be the embedding in this case). The three embedding outputs are added together afterwards, so the shapes don't match and this gives an error.

In the second configuration, I made only the main embedding layer interpretable while leaving the other two as they were. This returns the None gradient, and I can't figure out why.

Could you take a look at the forward pass and see if you notice anything?

vivekmig commented 4 years ago

Hi @RachitBansal, I looked into your code, and it seems like the issue is in the decoder, particularly in this line:

tensor = tensor[-1, :, :].data.type_as(src_enc)  # (bs, dim)

Accessing .data directly no longer maintains the autograd dependency, which is likely causing the error. Removing the .data attribute and accessing the tensor directly should maintain the autograd compute graph appropriately. This method seems to be intended primarily for inference rather than back-propagation, so alternatively, you may be able to use the same decoder method used in training (it appears to use forward directly rather than the generate method) to fix the issue.
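
In other words, a minimal sketch of the change (assuming nothing else in that method relies on the detach):

tensor = tensor[-1, :, :].type_as(src_enc)  # (bs, dim), stays on the autograd graph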

RachitBansal commented 4 years ago

You are absolutely right! It worked after changing the Decoder's inference step to:

scores = decoder('fwd', x=input_batch.cuda(), lengths=lengths2.cuda(), langs=langs2.cuda(), causal=True, src_enc=encoded, src_len=lengths.cuda(), find_emb=True, all_embs=all_embs)

Thanks a lot @vivekmig.