@davidefiocco , from the visualization I can see that the attribution label isn't visualized correctly. I think you might not be attributing to the correct output target index. What target index do you use? Is there one?
score_vis = viz.VisualizationDataRecord(
    attributions_sum,
    torch.max(torch.softmax(score[0][0], dim=0)),
    torch.argmax(score[0][0]),  # why is this identical with the next class?
    torch.argmax(score[0][0]),
    <THIS SHOULD BE THE TARGET INDEX; IF THERE IS NONE, IT SHOULD BE None>,
    attributions_sum.sum(),
    all_tokens,
    delta)
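For reference, this is roughly how the positional arguments map, based on how VisualizationDataRecord is used in the Captum tutorials (a hedged sketch, not code from the notebook; target_idx is a hypothetical placeholder for whatever class index you attributed to):

# rough mapping of the positional arguments, following the Captum tutorials;
# `target_idx` is a hypothetical placeholder for the class index you attributed to
score_vis = viz.VisualizationDataRecord(
    attributions_sum,                              # word attributions: per-token scores
    torch.max(torch.softmax(score[0][0], dim=0)),  # probability of the predicted class
    torch.argmax(score[0][0]),                     # predicted class index
    torch.argmax(score[0][0]),                     # true class / ground-truth label, if known
    target_idx,                                    # the class ("target") the attribution was computed for
    attributions_sum.sum(),                        # total attribution score
    all_tokens,                                    # tokens to display
    delta)                                         # convergence delta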
Oh, I just saw the attached notebook. I think that the problem is in the output and that you might need to provide a target. You might not have to do:
def custom_forward(inputs):
    out = model(inputs)[0][0]
    return out
but perhaps access the right target from there. The example that I gave you was for my model, which is probably very different from yours.
In my case it is this:
def custom_forward(inputs):
    out = model(*inputs)[0]
    return out
You need to know what you attribute to what. Do you attribute the cat or the dog to the inputs? target is the index of the class - the cat index or the dog index. In your case you can adjust the classes. See how we use target here:
https://captum.ai/tutorials/Multimodal_VQA_Interpret
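For illustration, passing target usually looks like this when the forward function returns the full (batch, num_classes) output (a hedged sketch in the spirit of the tutorial; lig, input_ids and ref_input_ids stand in for your own attribution object and inputs):

# when custom_forward returns scores for all classes, pick the class to explain
# via `target` instead of slicing a single class inside the forward function
attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    target=0,  # e.g. 0 = "cat" index, 1 = "dog" index
                                    return_convergence_delta=True)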
Thanks @NarineK for your reply! tl;dr: I tried to follow your suggestion and also looked into https://captum.ai/tutorials/IMDB_TorchText_Interpret, but I am struggling a bit with the results, so I'll leave some notes here to clarify the problem.
I have a binary classification problem (the CoLA task), and I would like to use Captum to see which features push the (softmax) score of class 0 (i.e. ungrammatical sentence) up. Ideally, the interpretation should highlight the features that make the sentence ungrammatical.
I revised the code and it's visible at https://gist.github.com/davidefiocco/3e1a0ed030792230a33c726c61f6b3a5
The behavior of my model is such that for some input_ids created from the (ungrammatical) sentence
"These tests does not work as expected."
I get a tuple
> model(input_ids)
(tensor([[ 4.5192, -4.2522]], grad_fn=<AddmmBackward>),)
which contains the logits ("ungrammatical" and "grammatical" respectively) of my binary classifier.
To get the logits only, I also created a predict method as follows:
def predict(inputs):
    return model(inputs)[0]
so that
> predict(input_ids)
tensor([[ 4.5192, -4.2522]], grad_fn=<AddmmBackward>)
To use the output of the model in captum, I thus have a (modified) custom forward that computes the softmax value for the "ungrammatical" class and now reads:
def custom_forward(inputs):
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[0][0].unsqueeze(-1)  # second [0] because I care about the score of the 0th class
so that
> custom_forward(input_ids)
tensor([0.9998], grad_fn=<UnsqueezeBackward0>)
I would like to see which features of my input drove that score (a.k.a. the score of my 0th class) close to 1.
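For context, the attribution step itself looks roughly like this (a sketch in the style of the Captum BERT tutorials, not the exact gist code; model.bert.embeddings is the usual embedding layer of a HuggingFace BertForSequenceClassification, and ref_input_ids is assumed to be a baseline built from padding/reference tokens):

from captum.attr import LayerIntegratedGradients

# attribute the output of custom_forward with respect to the BERT embedding layer
lig = LayerIntegratedGradients(custom_forward, model.bert.embeddings)

attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    return_convergence_delta=True)

# sum over the embedding dimension to get one score per token
attributions_sum = attributions.sum(dim=-1).squeeze(0)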
After computing attributions and summarizing them, I invoke viz.VisualizationDataRecord, but I am not sure I am doing this all right:
score = predict(input_ids)  # thus equal to tensor([[ 4.5192, -4.2522]], grad_fn=<AddmmBackward>)

score_vis = viz.VisualizationDataRecord(
    attributions_sum,
    torch.softmax(score, dim = 1)[0][0],  # probability of my target class
    torch.argmax(torch.softmax(score, dim = 1)[0]),  # the predicted class
    0,  # not sure what I should put here
    text,
    attributions_sum.sum(),
    all_tokens,
    delta)
Not sure if someone can spot something that is obviously not OK with all of the above. The full version of the code is at https://gist.github.com/davidefiocco/3e1a0ed030792230a33c726c61f6b3a5
Hi @davidefiocco , yes, you can do it that way as well, if you only care about the 0 (ungrammatical) class. Our visualization functions are just examples. You can also print out the scores and see if the visualization matches them: the higher the score, the more important the feature is for your output. Also, perhaps you want to compute attribution without applying the softmax; because of the softmax you might get a relatively small range of attribution scores.
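For instance, a minimal way to print the raw per-token scores next to the tokens (a sketch, assuming all_tokens and the summed attributions_sum as defined in the gist):

# print each token with its summed attribution score, so the raw numbers
# can be compared against the colors in the HTML visualization
for token, attr in zip(all_tokens, attributions_sum.tolist()):
    print(f"{token:>15s} {attr:+.4f}")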
I removed the softmax from the custom_forward to make it read
def custom_forward(inputs):
    preds = predict(inputs)
    return preds[0][0].unsqueeze(-1)
The problem still persists though, as my interpretations change little with my input, while the score wildly varies :(
"These test fails."
custom_forward(input_ids) = tensor([-3.5023], grad_fn=<UnsqueezeBackward0>)
"These tests fail"
custom_forward(input_ids) = tensor([4.2576], grad_fn=<UnsqueezeBackward0>)
@davidefiocco , the attribution could be correct, but the visualization is tricky, because I can see that the first example has a much larger negative attribution. The word fails is positive, but it probably has a very small positive value. Attribution is not perfect and it looks like it is a bit confused by the word fails.
When we normalize attributions, the color coding also gets adjusted. You can try to normalize the attributions across those 2 samples. Also, looking into the distribution of attributions for each embedding vector can be helpful.
The normalization happens here:
attributions = attributions / torch.norm(attributions)
I'd try without normalization or try a different normalization technique.
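For example, a minimal sketch of the summarization step with the normalization made optional (the helper name summarize_attributions follows the Captum tutorials; the rest is an assumption about the notebook's code, not the actual gist):

import torch

def summarize_attributions(attributions, normalize=True):
    # sum over the embedding dimension to get one score per token
    attributions = attributions.sum(dim=-1).squeeze(0)
    if normalize:
        # L2 normalization rescales the scores; skip it to keep raw magnitudes
        attributions = attributions / torch.norm(attributions)
    return attributions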
@davidefiocco, have you made any progress here?
Hi @NarineK thanks a lot for following up here! I had tried eliminating normalization, but that didn't solve it. I'll try to give it another shot and will update the issue here with my findings.
Hi @NarineK , I hope you're well, apologies for the delay.
I could not solve the problem, but I am more convinced that there is some bug in my code now.
The good news is that my code is now perfectly reproducible at least, as for debugging I borrowed a (binary) sentiment classifier whose weights are publicly downloadable from the great HuggingFace collection at https://huggingface.co/lvwerra/bert-imdb (trained on IMDB).
Sentiment can be reversed simply by changing one word, and so it should be easier to debug.
The notebook is for now runnable without modifications by making a copy of what you find at https://colab.research.google.com/drive/1snFbxdVDtL3JEFW7GNfRs1PZKgNHfoNz If one doesn't have/want to use a Google account to run Colab, the notebook should be executable locally on a Linux machine with wget and pip installed.
The resulting visualization (see the attached screenshot) seems buggy to me. I'll try to look at the code more closely, but if you have hints they would be totally welcome!
P.S.: I put the same code on https://gist.github.com/davidefiocco/40a1395e895174a4e4d3ed424a5d388a also, for reference.
@davidefiocco , try to change the text a little bit and see how your attribution changes. This is actually a very interesting example and it shows that the network is prone to adversarial attacks and is possibly easy to fool.
Try to remove the dot at the end of the sentence and see how the attribution changes.
Now, try something like this:
text = "The movie was one of those amazing movies you can not forget"
or this:
text = "The movie was one of those amazing movies"
Does the attribution start to make more sense?
Try also to increase n_steps and observe the delta:
attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    n_steps=700,
                                    internal_batch_size=3,
                                    return_convergence_delta=True)
delta
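For instance, one way to eyeball convergence is to sweep n_steps and print the delta each time (a sketch, assuming lig, input_ids and ref_input_ids as defined in the notebook; the list of step counts is arbitrary):

# sweep the number of approximation steps and watch the convergence delta shrink
for n in [50, 100, 200, 300, 500, 700]:
    _, delta = lig.attribute(inputs=input_ids,
                             baselines=ref_input_ids,
                             n_steps=n,
                             internal_batch_size=3,
                             return_convergence_delta=True)
    print(n, delta.item())  # .item() assumes a single example in the batch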
From what I see you are attributing to the negative class. The positive (good recommendation) class would be:
def custom_forward(inputs):
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[0][1].unsqueeze(-1)
Hi @NarineK, thanks!
I tried to change only the input (by shortening the sentence) and attribute the negative label (so no other change in the notebook, just the input); the result is in the first screenshot attached. The second screenshot is what I get with a similar, but negative, "micro-review".
I computed attributions and convergence deltas for various values of n_steps, and by eyeballing the results convergence seems to be reached at ~200.
Increasing n_steps beyond its default value of 50 doesn't dramatically change the interpretation I visualize, though.
Also as for your question, indeed the custom forward returning the softmax score for the positive class would be written as
def custom_forward(inputs):
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[0][1].unsqueeze(-1)
But playing with that definition also doesn't yield results that make intuitive sense to me (I would like the interpretation to focus on the adjective, which sets the sentiment of the review).
Sorry, closed by accident! I attributed to the good recommendation class, which was predicted with 0.96 probability, and this is what I get (screenshot attached).
I'm seeing different results than what you see. Does this make sense?
Ha!
Mmm, I am not convinced :) The correct spelling in English for the sentence is
"The movie was one of those amazing movies"
You used instead
"The movie was one of those amaizing movies"
If I modify the custom forward to give the softmax score for the positive class i.e.
def custom_forward(inputs):
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[0][1].unsqueeze(-1)
I get 0.9989 for the first sentence (so a very positive rating) and 0.0413 for the second ("amaizing" is not an English word, but the model goes for a negative sentiment).
I do visualize with
score_vis = viz.VisualizationDataRecord(attributions_sum,
                                        torch.softmax(score, dim = 1)[0][1],
                                        torch.argmax(torch.softmax(score, dim = 1)[0]),
                                        0,
                                        text,
                                        attributions_sum.sum(),
                                        all_tokens,
                                        delta)
but the viz is still a bit different from yours, and looks like this (screenshot attached).
Can you try rerunning with the "amazing" :D spelling?
From my attempts, that HuggingFace model is pretty good at getting the sentiment score right. It looks to me that my attributions (or the viz) don't quite cut it.
Oh, what are the classes? I think that I got confused about the task: 0 -> the index of "misspelled", 1 -> the index of "correctly spelled"?
Is this a spelling-correctness model or a movie recommendation model?
This is what I get when I attribute to class 1 (screenshot attached).
Hi @NarineK thanks for looking into this!
I switched to another binary classifier, a sentiment one trained on IMDB, as its weights are easy to find online. See what I wrote above in https://github.com/pytorch/captum/issues/311#issuecomment-606912060
Your interpretation looks totally legit to me, brilliant! Can you share what you changed from my notebook in order to get it?
I understand that the custom forward should read
def custom_forward(inputs):
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[0][1].unsqueeze(-1)
But then how do you get the rest (e.g. attribution and visualization)? Can you post snippets of the diffs?
Hi @davidefiocco, sorry, I forgot to reply. Can you access this link? https://colab.research.google.com/drive/1Lw3JTZio03VwPvSVFzLJmZ52oBRpo9ZM Let me know if you can't access it.
Hi @NarineK, I could access and run your code and tried a couple of positive sentences as examples and it seems to work OK, thanks a million! I have saved a "frozen" copy of it in https://gist.github.com/davidefiocco/47137f6eb7e3351c9bac4580c2ccc9d4 as I understand you may work on that in the future.
Here's some follow-up:
1. From what I see, when attributing it's key to play with the parameter internal_batch_size=3 (together with a high number of n_steps). Can you explain why that may be the case?
2. Will your representation work to highlight negative sentiment as well? With your code, changing the sentence to negative by changing the adjective, I get the attached visualization, for which I would have expected a stronger negative (i.e. stronger red) attribution on the adjective.
3. How can I attribute the negative (0) label? I guess I need to change the custom forward into
def custom_forward(inputs):
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[0][0].unsqueeze(-1)  # changed here
and the viz syntax
score_vis = viz.VisualizationDataRecord(attributions_sum,
                                        torch.softmax(score, dim = 1)[0][0],  # changed here
                                        torch.argmax(torch.softmax(score, dim = 0)[0]),
                                        0,  # changed here
                                        text,
                                        attributions_sum.sum(),
                                        all_tokens,
                                        delta)
can you confirm this? I tried to attribute a negative example (simply by putting a negative adjective in the sentence to evaluate) but to no avail.
Thanks!
Hi @davidefiocco ,
1. internal_batch_size is there purely for memory management reasons: it makes sure that small chunks of the input are processed at a time. n_steps is the number of integral approximation steps; the higher that number, the better the approximation, but if you set it too high you will need to adjust memory usage with internal_batch_size, and it will take longer to execute (see the sketch after this list).
2. Yes! That's right.
3. The custom_forward function looks right. torch.argmax(torch.softmax(...)) is the tricky part here: you need to make sure that the negative score actually has the max value, or not take the max at all. Other than that it looks right to me.
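To make point 1 concrete (a sketch, with the numbers chosen only for illustration and lig, input_ids, ref_input_ids assumed as in the notebook): n_steps fixes how many interpolated inputs are evaluated, while internal_batch_size only decides how many of them go through the model per forward pass.

# 700 interpolated inputs are evaluated either way; with internal_batch_size=8
# they are processed in ceil(700 / 8) = 88 forward passes instead of one huge batch
attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    n_steps=700,
                                    internal_batch_size=8,  # tune to fit memory; in principle it should not affect the result
                                    return_convergence_delta=True)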
Hi @NarineK !
On point 1 though, I would like to flag that with constant n_steps, changing internal_batch_size seems to affect the results. With
attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    n_steps=300,
                                    internal_batch_size=1,
                                    return_convergence_delta=True)
I get one visualization (screenshot attached), but with the exact same code and internal_batch_size=3, the result of the viz is instead different (second screenshot attached).
On point 3, can you give some extra hints on what the third argument of viz.VisualizationDataRecord should be in order to attribute the negative (0) class?
Hi @davidefiocco,
That's interesting! Thank you for bringing up point 1. It might be a bug; we'll take a look into it. I think that instead of doing argmax you should use, for negative,
torch.softmax(score, dim = 1).squeeze(0)[0]
and, for positive,
torch.softmax(score, dim = 1).squeeze(0)[1]
What do you think?
Hi @NarineK, thanks again.
Dealing with the negative label and passing a sentiment-negative sentence with
def custom_forward(inputs):
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[0][0].unsqueeze(-1)  # changed here
and
# storing a couple of samples in an array for visualization purposes
score_vis = viz.VisualizationDataRecord(attributions_sum,
                                        torch.softmax(score, dim = 1)[0][0],  # changed here
                                        torch.softmax(score, dim = 1).squeeze(0)[0],
                                        0,  # changed here
                                        text,
                                        attributions_sum.sum(),
                                        all_tokens,
                                        delta)
gives me the attached visualization, in which I would have expected the word "terrible" to show up with a greener shade of green.
The word importance thus doesn't look very convincing to me :(
About the difference in results when varying the internal_batch_size parameter, let me know if I should file/refer to a separate issue!
Hi @davidefiocco , the custom_forward was unnecessarily complicated and it led to problems with different batch sizes. Selecting the first element and un-squeezing was messing things up. Here is a simpler version:
def custom_forward(inputs):
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[:, 0]  # for negative attribution
    # return torch.softmax(preds, dim = 1)[:, 1]  # <- for positive attribution
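The nice thing about the [:, 0] indexing is that it keeps the batch dimension, so the forward function still works when Captum internally expands the input into a batch of scaled examples. A tiny standalone illustration (made-up logits, not from the notebook):

import torch

logits = torch.tensor([[ 4.5, -4.3],
                       [-1.2,  2.0]])  # pretend batch of 2 examples

probs = torch.softmax(logits, dim=1)
print(probs[:, 0])                 # shape (2,): one negative-class score per example
print(probs[0][0].unsqueeze(-1))   # shape (1,): only the first example survives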
In terms of your example with terrible movies, you can get better attribution if you increase the number of steps as I show in the notebook below. https://colab.research.google.com/drive/1Lw3JTZio03VwPvSVFzLJmZ52oBRpo9ZM
Note that attribution is not a perfect tool. It can help you to better understand feature importance, but it does not solve all problems of feature importance.
Hope this helps!
@davidefiocco , can we close this issue if we are done with the questions? Thank you!
Sure, thanks!
It may be useful to consolidate all of the above into a tutorial example, as this could save other users some time.
Hello @NarineK,
Would you mind re-sharing the notebook below - would really appreciate it. Seems that the link is broken.
Thank you.
> In terms of your example with terrible movies, you can get better attribution if you increase the number of steps as I show in the notebook below. https://colab.research.google.com/drive/1Lw3JTZio03VwPvSVFzLJmZ52oBRpo9ZM
> Hello @NarineK,
> Would you mind re-sharing the notebook below - would really appreciate it. Seems that the link is broken.
@R-icntay seems that good old me treasured this in a gist: https://gist.github.com/davidefiocco/47137f6eb7e3351c9bac4580c2ccc9d4 but it lacks additional improvements that were performed later in the discussion. It'd be great if @NarineK could have a look and/or reshare the notebook!
@davidefiocco, thanks a lot - honestly!! I was trying to follow through your discussion and could not find code that matches some of it. I also found this: https://colab.research.google.com/drive/1pgAbzUF2SzF0BdFtGpJbZPWUOhFxT2NZ#scrollTo=X-nyyq_tbUDa
as mentioned here: https://github.com/pytorch/captum/issues/150#issuecomment-665662534
Could that help?
And honestly, @davidefiocco @NarineK, this was a great thread; it should have been made into a blog post or something 🙂. So thank you all!
I am training a BERT model on the CoLA task (see also https://github.com/pytorch/captum/issues/303), so that I can classify sentences as grammatically acceptable or not. Take these examples:
"These tests don't work as expected" (grammatically acceptable)
"These tests doesn't work as expected" (unacceptable)
If I run the two examples through the classifier, the softmax scores of the binary classifier are 0.99395967 and 0.00011 respectively.
In the two cases, I get interpretations via the notebook in the gist https://gist.github.com/davidefiocco/3e1a0ed030792230a33c726c61f6b3a5 (adapted from the SQuAD example in this repo), and in particular I create the visualization with the viz.VisualizationDataRecord call shown in that gist.
In the two cases (screenshots attached) the interpretations look extremely similar in spite of the dramatic change in score, while I would expect the interpretation to change and the model to focus on the grammar mistake (e.g. on the verb, or the noun).
Am I doing something wrong in my implementation (likely!), or is LayerIntegratedGradients not performing great in this example? Can someone suggest viable alternatives to try out?