vedal opened this issue 2 years ago
hi @vedal, it depends on your needs. Captum, as a tool, is not opinionated; it is feasible to use it either way.
Generally, "with Softmax" is more reasonable for multi-class classification, if you want to explain the predicted probability. Because the probability is decided by the target logit under the context of all the other logits, the output should take all logits into account. However, this is eventually up to what you want to explain. You can still choose to attribute "without Softmax" if you want to understand how features impact that single logit, instead of the final probability.
Thank you for pointing out the CIFAR10 tutorial. I agree that having the Softmax there makes more sense (cc @NarineK).
@aobo-y thanks a lot for the thorough answer! It makes sense!
Given that many models don't have a softmax at the end during training because they use nn.CrossEntropyLoss, would it make sense to have an argument softmax=False on each attribution method, for optionally passing the logits through a softmax? Especially since softmax seems to be suggested by the IG authors (see previous links).
One could of course also append a softmax to the model before calling an attribution method (see the sketch below). I'm just asking because the docstring example caused me some confusion as to what the recommended method was. Given that it's optional, a softmax argument would be a user-friendly way to show that there's no consensus on this.
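For reference, a minimal sketch of the "append softmax" alternative, assuming model is an already-trained multi-class classifier that returns logits:

import torch.nn as nn
from captum.attr import IntegratedGradients

# wrap the trained model so its outputs are probabilities rather than logits
prob_model = nn.Sequential(model, nn.Softmax(dim=-1))
ig = IntegratedGradients(prob_model)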
@vedal what you said makes sense. But as Captum is trying to be a generic library for all kinds of models and problems, such an "append_softmax" option would likely only be useful for multi-class classification. It could be confusing to other users, say those with a regression model, and cause unnecessary misuse. We would also like to be less opinionated about what the "common practice" is, especially regarding model architecture.
As you noticed, it is actually very easy for users to create a wrapper themselves, e.g.,
import torch
from captum.attr import IntegratedGradients

def wrapper(*args, **kwargs):
    # softmax over the class dimension, so attributions explain probabilities
    return torch.nn.functional.softmax(model(*args, **kwargs), dim=-1)

ig = IntegratedGradients(wrapper)
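A hypothetical usage line, assuming inputs is a batch of model inputs and class 3 is the class whose probability you want to explain:

attributions = ig.attribute(inputs, target=3)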
I agree our doc & tutorials are not perfect. We can keep this issue open until I update them to be less confusing.
Thanks a lot for the suggested code sample (particularly the idea of using the functional softmax to avoid creating an nn.Sequential object), @aobo-y. I very much appreciate your humble and friendly approach and thorough explanation. I'm sure others will find it useful as well.
❓ Questions and Help
In most docstrings of Captum attribution methods, there is an example where the model is said to return class probabilities. However, the example net in Captum tutorials such as this one doesn't seem to have an nn.Softmax at the end. I also noted this issue, where both with and without softmax are suggested. Finally, I noted that only captum.AttributionVisualizer mentions a score function like softmax.
My question is: what is common practice, with or without softmax at the end of the network when computing attributions?
Edit: a summary paper (section VI.C) discussing this issue.