nataliebarcickikas opened this issue 2 years ago
Hi @nataliebarcickikas - it seems like the error occurs during the forward call to `batch_predict`. What happens if you call `batch_predict` directly, without using `LayerIntegratedGradients`?
Directly calling `batch_predict` causes no issues:

```python
batch_predict(encoding['input_ids'],
              encoding['image'],
              encoding['bbox'],
              encoding['attention_mask'],
              encoding['token_type_ids'])
```

Output:

```
tensor([[0.4553, 0.5447]], grad_fn=<SoftmaxBackward>)
```
Could you print out the dimensions of all the embeddings in that line 756 for the forward call to `batch_predict` as well as to `attr`?
I printed the dimensions of the combined embedding along with its four individual components.

In the forward call:

```python
batch_predict(encoding['input_ids'],
              encoding['image'],
              encoding['bbox'],
              encoding['attention_mask'],
              encoding['token_type_ids'])
```

```
Input embeddings: torch.Size([1, 44, 768])
Position embeddings: torch.Size([1, 44, 768])
Spatial position embeddings: torch.Size([1, 44, 768])
Token type embeddings: torch.Size([1, 44, 768])
torch.Size([1, 44, 768])
tensor([[0.4942, 0.5058]], grad_fn=<SoftmaxBackward>)
```
In the `attr` call:

```python
attributions = attr.attribute(inputs=encoding['input_ids'],
                              additional_forward_args=(encoding['image'],
                                                       encoding['bbox'],
                                                       encoding['attention_mask'],
                                                       encoding['token_type_ids']),
                              baselines=baselines,
                              target=answer_idx,
                              n_steps=1)
```

```
Input embeddings: torch.Size([1, 44, 768])
Position embeddings: torch.Size([1, 44, 768])
Spatial position embeddings: torch.Size([1, 44, 768])
Token type embeddings: torch.Size([1, 44, 768])
torch.Size([1, 44, 768])
Input embeddings: torch.Size([1, 44, 768])
Position embeddings: torch.Size([1, 44, 768])
Spatial position embeddings: torch.Size([1, 44, 768])
Token type embeddings: torch.Size([1, 44, 768])
torch.Size([1, 44, 768])
Input embeddings: torch.Size([1, 44, 768])
Position embeddings: torch.Size([1, 49, 768])
Spatial position embeddings: torch.Size([1, 49, 768])
Token type embeddings: torch.Size([1, 44, 768])
```

And then the runtime error occurs:

```
RuntimeError: The size of tensor a (44) must match the size of tensor b (49) at non-singleton dimension 1
```
(I left this comment originally on #904)
I have this exact same issue with a custom BERT-based model, and I've traced it back to the Captum hook being called in line 1072 of the source code for `torch.nn.Module` (i.e. during the forward call of your model). The hook that causes this issue is `layer_integrated_gradients.layer_forward_hook`. It appears that the cached value in `scattered_inputs_dict` is returned no matter what, because the hook is called at the start of the wrapped module's forward method and is not reset mid-call when weights are shared.
```python
# num_current_tokens = 50, num_prev_tokens = 71
input_ids.shape  # torch.Size([1, 50])
self.word_embeddings(input_ids).shape  # torch.Size([1, 71, 768]) instead of torch.Size([1, 50, 768])
```
For context, my model shares weights and we call the same submodule twice within `forward()`:

```python
def _forward(...):
    ...
    outputs_text = self.bert(input_ids=input_ids_text, attention_mask=attention_mask_text, **kwargs)
    outputs_context = self.bert(input_ids=input_ids_context, attention_mask=attention_mask_context, **kwargs)
    ...
    return outputs
```
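The failure mode described above can be mimicked in a minimal pure-Python sketch (no Captum or torch required; `SimpleModule` and the hook registration are illustrative stand-ins, not Captum's actual API): a forward hook that always returns a cached value overrides the module's real output on every call, including a second call with a different input length.

```python
class SimpleModule:
    """Toy stand-in for an nn.Module with forward-hook support."""
    def __init__(self):
        self.forward_hooks = []

    def forward(self, x):
        return [v * 2 for v in x]  # stand-in for an embedding lookup

    def __call__(self, x):
        out = self.forward(x)
        for hook in self.forward_hooks:
            result = hook(self, x, out)
            if result is not None:
                out = result  # a non-None hook result replaces the real output
        return out

embeddings = SimpleModule()

# Cache the output for the first (scattered) input, then install a hook
# that returns that cached value unconditionally - the pattern described above.
cached = embeddings([1, 2, 3, 4, 5])  # 5 "tokens"
embeddings.forward_hooks.append(lambda mod, inp, out: cached)

first = embeddings([1, 2, 3, 4, 5])   # length 5 - happens to match
second = embeddings([1, 2])           # length-2 input, but length-5 output!
print(len(second))  # 5, not 2 -> the downstream shape mismatch
```

The second call is where the shared-weights pattern breaks: the real forward produces the right shape, but the hook discards it in favor of the stale cache.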
@99warriors do you guys have a time estimate? If not, I'm happy to fork and work from a starting point.
@chrisdoyleIE Thank you for investigating this. We have had discussions about how to fix this problem (perhaps expanding `scattered_inputs_dict` to cache the results of multiple forward calls of the same module), but this discussion is still ongoing. Any suggestions you have would be most helpful and welcome!
For now, as a workaround, we would recommend avoiding calling the same module multiple times within the same forward pass (instead, create copies of the module, which can all share weights). Adding a warning about this is an immediate task we can tackle.
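The suggested workaround - separate module objects whose parameters are shared - might look like the following sketch, using `nn.Linear` as a stand-in for the repeated `self.bert` submodule. Tying the copy's parameters by direct assignment is an assumption on my part, not Captum-endorsed code:

```python
import copy
import torch
import torch.nn as nn

class SharedEncoderModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder_text = nn.Linear(8, 8, bias=False)  # stand-in for self.bert
        # Structural copy, then re-point its parameter at the original's
        # Parameter object so both modules share the exact same weights.
        self.encoder_context = copy.deepcopy(self.encoder_text)
        self.encoder_context.weight = self.encoder_text.weight

    def forward(self, text, context):
        # Each submodule is now called only once per forward pass,
        # so a per-module forward hook fires exactly once for each.
        return self.encoder_text(text), self.encoder_context(context)

model = SharedEncoderModel()
assert model.encoder_text.weight is model.encoder_context.weight
```

Because the two modules hold the same `Parameter` object, gradients and optimizer updates stay in sync, while Captum sees two distinct modules to hook.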
That'll do, many thanks!
Is there any update regarding this topic?
I have a similar issue: I call the embedding function in my model multiple times to split an input into several chunks (it's a hierarchical model), where I have to get [CLS]/[PAD]/[SEP] embeddings in between:
```python
...
sep_embed = self.bert.embeddings(torch.tensor([[4]], dtype=torch.long, device=self.device))[0][0]
pad_embed = self.bert.embeddings(torch.tensor([[0]], dtype=torch.long, device=self.device))[0][0]
cls_embed = self.bert.embeddings(torch.tensor([[3]], dtype=torch.long, device=self.device))[0][0]
...
```

Then, of course, I get wrong shapes from the `scattered_inputs_dict` (or `saved_layer`?).
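Following the maintainers' earlier advice to avoid repeated calls to the same module, one possible workaround for this case (a sketch only - `nn.Embedding` stands in for `self.bert.embeddings`, and the token ids mirror the snippet above but are assumptions) is to look up the constant special-token vectors once at construction time and cache them, so the embedding module is invoked only once per forward pass:

```python
import torch
import torch.nn as nn

class HierarchicalModel(nn.Module):
    def __init__(self, vocab_size=32, dim=16):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, dim)  # stand-in for self.bert.embeddings
        # Cache the constant [CLS]/[PAD]/[SEP] embeddings once, so that
        # self.embeddings is not called repeatedly inside forward().
        with torch.no_grad():
            self.register_buffer("cls_embed", self.embeddings(torch.tensor([3]))[0])
            self.register_buffer("pad_embed", self.embeddings(torch.tensor([0]))[0])
            self.register_buffer("sep_embed", self.embeddings(torch.tensor([4]))[0])

    def forward(self, input_ids):
        chunk = self.embeddings(input_ids)  # single embedding call per forward
        # Reuse the cached special-token vectors instead of re-calling
        # self.embeddings for them.
        cls = self.cls_embed.expand(chunk.size(0), 1, -1)
        sep = self.sep_embed.expand(chunk.size(0), 1, -1)
        return torch.cat([cls, chunk, sep], dim=1)

model = HierarchicalModel()
out = model(torch.tensor([[5, 6, 7]]))  # [CLS] + 3 tokens + [SEP]
```

Caveat: the cached buffers go stale if the embedding weights are still being trained, so this sketch suits frozen embeddings (or requires re-caching each step).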
I am working with the LayoutLMv2 model in huggingface (https://huggingface.co/transformers/model_doc/layoutlm.html). It works fine when performing a forward pass, but I get a dimensionality error related to the embeddings when I try to use it in Captum for explainability. Note that LayoutLM (the first version of the model) gives no issues in the same context. Also, I realize that this model needs to be fine-tuned; this is just meant as a proof of concept.
Here is my code:
And the error:
These are the versions of the packages I am using: transformers==4.11.2, captum==0.4.0, torch==1.7.0, torchvision==0.8.1