ducpvftech opened 3 years ago
Hey @ducpvftech ,
the error seems to take place in the forward() method of your allennlp model, which is called through learner.model.forward_on_instance() in cell 16 of your notebook.
Specifically, the embeddings computed by the model seem to be of different sizes. Does it have to do with the tokens being of different sizes?
I recommend you test your forward_fn() with some baseline inputs to make sure it can handle the samples you are feeding it.
Once forward_fn() works without errors, you should face no issues applying Captum's attribute() methods to your model and input.
Hope this helps.
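For example, a quick smoke test plus the attribution call could look roughly like this (a rough sketch only; the sequence length 7, the layer text_field_embedder, and the names input_indices / reference_indices are assumptions based on your notebook, so adapt them to your setup):

import torch
from captum.attr import LayerIntegratedGradients

# smoke-test the forward function on an all-zero (padding-like) baseline
baseline_ids = torch.zeros(1, 7, dtype=torch.long)
probs = forward_fn(baseline_ids)
print(probs.shape)   # expect [1, num_intents], one probability row per example

# then attribute through the embedding layer
lig = LayerIntegratedGradients(forward_fn, learner.model.text_field_embedder)
attributions, delta = lig.attribute(input_indices,              # [1, seq_len] token ids
                                    baselines=reference_indices,
                                    target=0,                   # intent class to explain
                                    n_steps=50,
                                    return_convergence_delta=True)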
Hi @bilalsal ,
I tested my forward_fn():
def forward_fn(sample):
    print("forward_fn: ", sample)
    int2token = [learner.model.vocab._index_to_token["tokens"][i] for i in sample[0].numpy()]
    token2str = " ".join(int2token)
    tokens = [Token(s) for s in token2str.split()]
    instance = learner.dataset_reader.text_to_instance(tokens)
    intent_probs_tensor = torch.tensor(learner.model.forward_on_instance(instance)['intent_probs'])
    probs_squeeze = intent_probs_tensor.unsqueeze(0)
    print("probs_squeeze: ", probs_squeeze.shape)
    return torch.softmax(probs_squeeze, dim=-1)

### test base-line
sample = [0, 0, 0, 0, 0, 0, 0]
sample = torch.tensor(sample)
sample = sample.unsqueeze(0)
forward_fn(sample)
The result is fine
forward_fn: tensor([[0, 0, 0, 0, 0, 0, 0]])
probs_squeeze: torch.Size([1, 20])
tensor([[0.0508, 0.0467, 0.0467, 0.0467, 0.0467, 0.0467, 0.0467, 0.0467, 0.0468,
0.0467, 0.0509, 0.0467, 0.0467, 0.0467, 0.0467, 0.1035, 0.0467, 0.0467,
0.0481, 0.0467]])
Now I get another error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-43-61b7fe348def> in <module>()
----> 1 interpret_sentence(learner, 'xe day con mau vàng k sh', label=1)
<ipython-input-42-c330ba691776> in interpret_sentence(learn, sentence, min_len, label)
48 # print(f"target: {target}")
49 attributions_ig, delta = lig.attribute(input_indices, reference_indices, 0,
---> 50 n_steps=1, return_convergence_delta=True)
51
52
/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_core/layer/layer_integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta, attribute_to_layer_input)
358 method=method,
359 internal_batch_size=internal_batch_size,
--> 360 return_convergence_delta=False,
361 )
362
/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
282 internal_batch_size=internal_batch_size,
283 forward_fn=self.forward_func,
--> 284 target_ind=expanded_target,
285 )
286
/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_utils/batching.py in _batched_operator(operator, inputs, additional_forward_args, target_ind, internal_batch_size, **kwargs)
162 )
163 for input, additional, target in _batched_generator(
--> 164 inputs, additional_forward_args, target_ind, internal_batch_size
165 )
166 ]
/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_utils/batching.py in <listcomp>(.0)
161 **kwargs
162 )
--> 163 for input, additional, target in _batched_generator(
164 inputs, additional_forward_args, target_ind, internal_batch_size
165 )
/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_core/layer/layer_integrated_gradients.py in gradient_func(forward_fn, inputs, target_ind, additional_forward_args)
341 # torch.unbind(forward_out) is a list of scalar tensor tuples and
342 # contains batch_size * #steps elements
--> 343 grads = torch.autograd.grad(torch.unbind(output), inputs)
344 return grads
345
/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
155 return Variable._execution_engine.run_backward(
156 outputs, grad_outputs, retain_graph, create_graph,
--> 157 inputs, allow_unused)
158
159
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Hi @ducpvftech ,
good to see the size mismatch is resolved.
As for the new error you are seeing, maybe set requires_grad = True on your tensors before calling .attribute()?
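For instance (a minimal sketch; requires_grad can only be set on floating-point tensors, so this applies to an embedding or other float input rather than to integer token ids, and ig / input_embedding are placeholders for whatever attribution object and float tensor you are using):

import torch

# make sure the float input participates in autograd before attribution
input_embedding = input_embedding.detach().clone().requires_grad_(True)
attributions = ig.attribute(input_embedding, target=0, n_steps=50)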
Hi @bilalsal , I solved that problem by creating a new forward_fn function:
def forward_fn(sample):
    print("forward_fn: ", sample)
    int2token = [learner.model.vocab._index_to_token["tokens"][i] for i in sample[0].numpy()]
    token2str = " ".join(int2token)
    tokens = [Token(s) for s in token2str.split()]
    instance = learner.dataset_reader.text_to_instance(tokens)
    batch = next(iter(iterator([instance])))
    #intent_probs_tensor = torch.tensor(learner.model.forward_on_instance(instance)['intent_probs'])
    #probs_squeeze = intent_probs_tensor.unsqueeze(0)
    outputs = learner.model(**batch)
    # print(f"outputs: {outputs}")
    intent_probs_tensor = outputs["intent_probs"]
    print("probs_squeeze: ", intent_probs_tensor.shape)
    return torch.softmax(intent_probs_tensor, dim=-1)
But now, I don't know how to combine the two embedding layers.
(text_field_embedder): BasicTextFieldEmbedder(
  (token_embedder_token_characters): TokenCharactersEncoder(
    (_embedding): TimeDistributed(
      (_module): Embedding()
    )
    (_encoder): TimeDistributed(
      (_module): CnnEncoder(
        (_activation): ReLU()
        (conv_layer_0): Conv1d(3, 128, kernel_size=(3,), stride=(1,))
      )
    )
  )
  (token_embedder_tokens): Embedding()
)
For example, after LayerIntegratedGradients the token_embedder_tokens layer output gets size [500, 7, 50], but token_embedder_token_characters is still [1, 7, 128], so they cannot be concatenated.
@PhanDuc, I think you can probably still get this working without changing the forward function too much. What is the dimensionality of the output tensor? Is it something like #examples x #tokens x #classes? You might need to specify target as a tuple. For the NER tags you would perform attribution for one specific token at a time.
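For example (a rough sketch; token_idx and tag_idx are placeholders, not values from your notebook):

# when forward_fn returns a tensor of shape [#examples, #tokens, #classes],
# target can be a tuple that indexes the non-batch dimensions:
# here, the class tag_idx predicted for token token_idx
attributions = lig.attribute(input_indices,
                             baselines=reference_indices,
                             target=(token_idx, tag_idx),
                             n_steps=50)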
Hi @NarineK ,
The output tensor has the shape #examples x #tokens x #embedding_shape.
This is the notebook file I'm working with so far: https://drive.google.com/file/d/1EeX074cJMDRMONjpTy42dUZxW0_VBer5/view?usp=sharing
This script works if n_steps=1, but if I change n_steps > 1, there is an error. :(
@ducpvftech, I've tried to run your example, but I don't know exactly what versions of spacy and the other libraries you used. Could you please post the library requirements list?
If it works with n_steps=1 and not with n_steps > 1, I guess the first dimension probably doesn't correspond to the number of examples. What is the dimensionality of torch.softmax(intent_probs_tensor, dim=-1)?
Hi @NarineK , sorry for the errors in the notebook.
The dimensionality of torch.softmax(intent_probs_tensor, dim=-1) is torch.Size([1, 20]).
This is the notebook that I checked carefully to make sure it reproduces the error with LayerIntegratedGradients: https://drive.google.com/file/d/1EeX074cJMDRMONjpTy42dUZxW0_VBer5/view?usp=sharing
And when I switched to IntegratedGradients, I solved the n_steps problem: https://colab.research.google.com/drive/1oFN5WwnrnvuGWIZajS6vLhG5sGi9MYlh?usp=sharing
But I still don't understand why LayerIntegratedGradients raises an error and how to fix that.
Also, how do I include NER in visualize_text?
Thank you @NarineK !
@ducpvftech, the reason that it doesn't work with n_steps > 1 with LayerIG is that LayerIG internally expands the input. And the problem is here:
def forward_fn(sample):
    print("forward_fn, sample: ", sample)
    int2token = [learner.model.vocab._index_to_token["tokens"][i] for i in sample[0].cpu().data.numpy()]
    token2str = " ".join(int2token)
    tokens = [Token(s) for s in token2str.split()]
    instance = learner.dataset_reader.text_to_instance(tokens)
    batch = next(iter(iterator([instance])))
    #intent_probs_tensor = torch.tensor(learner.model.forward_on_instance(instance)['intent_probs'])
    #probs_squeeze = intent_probs_tensor.unsqueeze(0)
    outputs = learner.model(**batch)
    # print(f"outputs: {outputs}")
    intent_probs_tensor = outputs["intent_probs"]
    print("probs_squeeze: ", intent_probs_tensor.shape)
    print(f"intent_probs_tensor - shape: {torch.softmax(intent_probs_tensor, dim=-1).shape}")
    return torch.softmax(intent_probs_tensor, dim=-1)
In the case of integrated gradients and layer integrated gradients, the input gets expanded by n_steps. If sample has shape [1, 7] it works, but if sample has shape [n_steps, 7] it doesn't, because the code below assumes that sample.shape = [1, 7]:
int2token = [learner.model.vocab._index_to_token["tokens"][i] for i in sample[0].cpu().data.numpy()]
token2str = " ".join(int2token)
tokens = [Token(s) for s in token2str.split()]
instance = learner.dataset_reader.text_to_instance(tokens)
If you make the changes / expansions accordingly, it should work.
Hi @NarineK , thank you for your advice!
I made the change by expanding the shape to [n_steps, 7]:
def forward_fn(sample):
    if sample.shape[0] == 1:
        ### convert from [1, tokens] to [n_step, tokens]
        # create a copy of data with shape [n_step, 1, tokens]
        y = sample.unsqueeze(0).repeat(n_steps, 1, 1)  # for example: y.shape = torch.Size([10, 1, 7])
        # drop the dimension of size 1 -> [10, 7]
        new_sample = torch.squeeze(y)
    else:
        new_sample = sample
    all_probs = []
    try:
        int2tokens = [[onenet_learner.model.vocab._index_to_token["tokens"][w.item()] for w in token] for token in new_sample]
    except:
        print("Error")
    for int2_token in int2tokens:
        token2str = " ".join(int2_token)
        tokens = [Token(s) for s in token2str.split()]
        # expanded the token with n_steps
        instance = onenet_learner.dataset_reader.text_to_instance(tokens)
        batch = next(iter(iterator([instance])))
        outputs = onenet_learner.model(**batch)
        intent_probs_tensor = torch.softmax(outputs["intent_probs"], dim=-1)
        all_probs.append(intent_probs_tensor)
    # convert to tensor
    all_probs_nsteps = torch.cat(all_probs, dim=0)
    return all_probs_nsteps
But it doesn't work; this is the reason:
file: /home/miniconda3/envs/xai/lib/python3.6/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py - line 125
return torch.cat(embedded_representations, dim=-1)
embedded_representations[0] has shape torch.Size([1, 7, 128]) and embedded_representations[1] has shape torch.Size([10, 7, 50]).
These are the outputs of the two embedding layers:
TokenCharactersEncoder(
  (_embedding): TimeDistributed(
    (_module): Embedding()
  )
  (_encoder): TimeDistributed(
    (_module): CnnEncoder(
      (_activation): ReLU()
      (conv_layer_0): Conv1d(3, 128, kernel_size=(3,), stride=(1,))
    )
  )
)
and
Embedding()
Hi @ducpvftech ,
is there a way to ensure that all embedded_representations have the same shape? Would padding the inputs to equal lengths make sense for your use case?
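For example, something along these lines might work (a rough sketch; char_emb and word_emb are placeholder names for the two embedder outputs, not variables from your code):

import torch

# char_emb: [1, num_tokens, 128], word_emb: [n_steps, num_tokens, 50]
# broadcast the character-level output along the batch dimension so that
# both tensors can be concatenated on the last dimension
char_emb = char_emb.expand(word_emb.size(0), -1, -1)
combined = torch.cat([char_emb, word_emb], dim=-1)   # [n_steps, num_tokens, 178]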
Hi @bilalsal ,
There are two ways to do it:
site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py
or
Hi @ducpvftech, I have been trying the code and it feels unusual. If I set n_steps=1 in your code snippet, it fails.
To be honest, I don't understand why we need to call batch = next(iter(iterator([instance]))) in the forward function. Shouldn't this happen outside?
What I don't understand is that, in order to perform the forward pass, we need both sample and the batch in forward_fn.
I feel that we are making this more complex than it is. Could you please provide a simple colab notebook that performs a simple predict for which you'd like to compute feature importance scores?
Hi @NarineK , this is a simple colab for simple predict:
https://drive.google.com/file/d/1yc_vngmegzPmmgHAb3VV21r61xLZTjne/view?usp=sharing
(please don't forget: "You must restart the runtime in order to use newly installed versions.")
I re-ran the code in the previous Layer Integrated Gradients notebook with n_steps=1 without any problem:
https://drive.google.com/file/d/1EeX074cJMDRMONjpTy42dUZxW0_VBer5/view?usp=sharing
Thank you for helping me with this issue @NarineK !
Thank you, @ducpvftech! I think that the challenge is how to convert token indices into the instance type. If we cannot do that in a natural way, then I think it is better to use IntegratedGradients instead of LayerIntegratedGradients, because the inputs to attribute must be tensors; other datatypes are not supported.
input_indices = torch.tensor([learn.model.vocab._token_to_index["tokens"][word] for word in tokens])
I see that here you created input_indices and didn't use it. Does that mean we cannot convert input_indices into an instance type naturally?
https://colab.research.google.com/drive/1oFN5WwnrnvuGWIZajS6vLhG5sGi9MYlh?usp=sharing
Basically, I want to pass the precomputed input_indices instead of internally re-computing them.
Converting input_indices back to text most probably won't help, because the gradients will not get propagated accordingly.
Hi @NarineK, thank you for your feedback and sorry for the late response.
In the Colab link you sent, I don't think using input_indices will make any difference, since I pass an embedding to the forward function; this works without any problem for any input sentence.
instance = learn.dataset_reader.text_to_instance(tokens_allennlp)
batch = next(iter(iterator([instance])))
print("input_indices: ", input_indices)
input_embedding = interpretable_embedding.indices_to_embeddings(batch["tokens"])
And in the past, I tried using input_indices only, but got an error.
So, I think using IntegratedGradients suits me well.
In my case, the model returns multiple outputs (intent & NER).
How can I make IntegratedGradients work for both of them? I mean, there would be an explanation for intent and an explanation for NER?
Hi @ducpvftech, yes, I understand that in your example you got it working without input_indices. I brought it up as an example to explain LayerIG: LayerIG needs torch tensors as the inputs that we want to interpret.
W.r.t. the NER, is the NER prediction score stored in tag_logits?
I think there are two options, both based on an ig.attribute call and a slightly modified forward function that returns the NER score.
tag_logits has dimension 1 x num_tokens x num_embeddings, right? I think we can sum across the num_embeddings dimension and attribute to each tag. Basically, your modified forward function would return the summed score for each token, either one token at a time in a loop or in a batch. A loop would be easier for a first try. Does this make sense? Thank you, Narine
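One way the per-token loop could look (a rough sketch; it selects one token's tag scores per call rather than summing, and prepare_batch, predicted_tags, num_tokens, and input_embedding are placeholder names, not code from your notebook):

import torch
from captum.attr import IntegratedGradients

def ner_forward_fn(sample, token_idx):
    # prepare_batch is a hypothetical helper that turns the sample tensor
    # into whatever the model's forward() expects
    outputs = learner.model(**prepare_batch(sample))
    tag_logits = outputs["tag_logits"]            # [1, num_tokens, num_tags]
    # keep only the scores of the token we are explaining
    return torch.softmax(tag_logits[:, token_idx, :], dim=-1)

ig = IntegratedGradients(ner_forward_fn)
for token_idx in range(num_tokens):
    # additional_forward_args is forwarded to ner_forward_fn unchanged
    attrs = ig.attribute(input_embedding,
                         target=int(predicted_tags[token_idx]),
                         additional_forward_args=(token_idx,),
                         n_steps=50)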
Thank you @NarineK ,
tag_logits has dimension num_tokens x 57, where 57 is the embedding size.
It is a good idea to compute both intent and NER at once. Both of your options are good starting points for me! :)
I will get back to you soon ;)
Hi everyone,
I'm trying to use Captum with OneNet using AllenNLP; this is the model structure:
And this is a sample output:
I got an error when using LayerIntegratedGradients. Here is the notebook: https://gist.github.com/ducpvftech/1cdb03429a7b9dbf7036d5c4c511ec45
Can you guide me on how to make this work for both intent classification and NER?
Thank you for considering my request.