pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

A model's feature attributions' consistency #797

Closed by ivicts 2 years ago

ivicts commented 3 years ago

❓ Questions and Help

Hi,

I have one model that I trained five different times, using a different seed each time. I trained all the models on exactly the same train, val, and test splits. The five models have roughly similar performance metrics on the val and test sets. I want to see how consistent the feature attributions are across the different seeds. I would expect to get something like this:

[image: expected heatmap, with feature attributions consistent across the five seeds]

However, this is what I got after plotting heatmaps of the feature attributions "attr_test_norm_sum", calculated using DeepLift:

import numpy as np
from captum.attr import DeepLift

test_feat.requires_grad_()
method = DeepLift(net)
attr_test = method.attribute(test_feat)
# sum attributions over the test samples, then L1-normalize
attr_test_sum = attr_test.detach().numpy().sum(0)
attr_test_norm_sum = attr_test_sum / np.linalg.norm(attr_test_sum, ord=1)

[image: heatmap of attr_test_norm_sum for the five seeds, showing inconsistent attributions across models]

I have also tried other methods (GradientShap, IntegratedGradients), and the result is the same. It seems that each of the five models attributes importance to different features. Is this expected? Did I do something wrong? Do I need to normalize the feature attributions in some way?

This might be related to: https://github.com/pytorch/captum/issues/752 Thank you!

bilalsal commented 3 years ago

Hi @ivicts ,

yes, I would also expect the five models to have somewhat different attribution results, given that they have likely learned somewhat different weights. We can imagine each model converging to its own local minimum. That said, it is also likely that the five models have developed similar internal features, and hence assign similar importance scores to some of the input features. In fact, the heatmap you provided is full of horizontal blue stripes, where all five models assign a positive score to the same feature. Also, the red stripes around features #128-136 suggest that most of the models assign a negative attribution to these features when processing the input you provided.
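
If you want to quantify this agreement, one option is to compute pairwise cosine similarities between the five attribution vectors. A minimal sketch, assuming attrs is a list holding your five attr_test_norm_sum arrays (the name is hypothetical, not from your code above):

import numpy as np

# attrs: list of the five 1-D attribution arrays, one per model (assumed)
A = np.stack(attrs)                               # shape (5, n_features)
A = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-normalize each row
similarity = A @ A.T                              # (5, 5) cosine similarity matrix
# entries near 1.0 mean two models attribute importance very similarly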

One last suggestion: it would also be informative to generate weight plots, where the x axis corresponds to your five models and the y axis corresponds to the weights of each model (say, for a specific layer, or combining the weights of all layers). That will give you an idea of how similar the learned representations are.

Hope this helps

ivicts commented 3 years ago

Hi @bilalsal,

Thank you for your reply.

I am glad you think my heatmaps make sense, given that the models may each converge to their own local minimum. By the way, for the heatmaps, I just normalized each one as a unit vector, but this may give each column in the heatmaps a different scale. Would this be an issue? If so, how do I fix it?

For the weight plot, we can access a PyTorch layer's weights via model.layer.weight.data, which is a matrix. Did you mean I should plot this weight matrix as a heatmap for all five models? I am not sure how I could plot the weight matrix as a bar plot, which is what I guessed you meant above.

Also, how can I combine the weights of all layers? I am not sure how to do that, since the layers have different matrix sizes. Did you mean plotting the weights of all layers together?

bilalsal commented 2 years ago

Hi @ivicts , apologies for responding late.

I think it is OK to normalize the heatmap "as a unit vector", as long as you take this into consideration when looking at the plots. You can also apply a uniform scaling factor as another way to look at the data.
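
For instance, a minimal sketch of such uniform scaling, assuming attr_sums holds your five unnormalized attr_test_sum arrays (the name is hypothetical):

import numpy as np

# attr_sums: the five unnormalized attr_test_sum arrays, one per model (assumed)
global_scale = max(np.abs(a).max() for a in attr_sums)  # one shared factor for all models
attr_uniform = [a / global_scale for a in attr_sums]     # all heatmap columns now share one scale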

Regarding the weights, you can stack all the weights of a model into one giant 1D vector for the purpose of comparing the models (by plotting the five giant 1D vectors side by side). There are certainly other ways to plot the weights: go with whatever you are comfortable with and whatever is helpful for your analysis.
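
A minimal sketch of that flattening, assuming your five models share one architecture (so the vectors have equal length) and models is a hypothetical list of the five trained networks:

import torch

# concatenate every parameter tensor of a model into one giant 1-D vector
def flatten_weights(model):
    return torch.cat([p.detach().flatten() for p in model.parameters()])

# stack the five vectors column-wise: shape (n_params, 5), one column per model
W = torch.stack([flatten_weights(m) for m in models], dim=1)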