Closed · JinwooMcLee closed this issue 3 years ago
Hi @JinwooMcLee,
apologies for the late response.
It might be less useful to compare attribution sums computed for different inputs. The attributions computed by Captum are intended to give you an idea of how likely an input feature is to be part of the reason the model predicts a specific output (e.g. the positive or the negative class).
The sum of the attributions computed for an input might be high if several input features "vote" for the positive class, even if the model ultimately predicts the negative class. If you compute the attribution with `target=torch.zeros(size=_label.shape)`, you can get an idea of which input features instead "vote" for the negative class. You can then sum up the attributions for these features and compare that sum with the one you obtained under `target=torch.ones(size=_label.shape)` for the exact same input. Captum is not actually designed to facilitate such a comparison either, but it could be more informative than comparing attribution sums computed for different inputs.
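To illustrate the comparison suggested above, here is a minimal plain-PyTorch sketch (a toy linear model, not the poster's architecture) that computes integrated-gradients sums for the same input under `target=0` vs. `target=1`. For integrated gradients, the attribution sum approximates `F_target(x) - F_target(baseline)` (the "completeness" property), so the two sums describe different logits and need not track the predicted class.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)   # toy two-logit classifier (stand-in model)
x = torch.randn(1, 4)
baseline = torch.zeros_like(x)

def ig_sum(target, steps=50):
    """Riemann-sum integrated gradients along baseline -> x for one logit."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    path = (baseline + alphas * (x - baseline)).requires_grad_(True)
    grads = torch.autograd.grad(model(path)[:, target].sum(), path)[0]
    attr = (x - baseline) * grads.mean(dim=0, keepdim=True)
    return attr.sum().item()

for t in (0, 1):
    # Completeness check: the attribution sum should be close to
    # logit_t(x) - logit_t(baseline), independently of the argmax prediction.
    delta = (model(x)[0, t] - model(baseline)[0, t]).item()
    print(f"target={t}: attr sum={ig_sum(t):.4f}, logit delta={delta:.4f}")
```

Because the toy model is linear, the path integral is exact here; for a real network the two quantities only agree approximately, with the gap shrinking as `steps` grows.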
Hope this helps
Thank you for the detailed explanation, @bilalsal.
I'm closing this issue. Really appreciated!
Hi, I made a multi-modal binary classification model. The first modality is handled by a transformer encoder, and the other modality by a plain feed-forward network.
The model architecture code is as follows.
But when I calculate the model's attributions with Layer Integrated Gradients for each modality, the sum of the modalities' attributions doesn't match the model's prediction.
For example, the model predicts 1 (positive class) and the sum of attributions equals 1.0056 with the target set to the positive class. In other cases the model predicts 0 (negative class), yet the sums of attributions equal 1.0606, 1.2314, 1.1387, ... with the target still set to the positive class, which is higher than in the previous case.
Attribution calculation is as follows.
Am I interpreting the attributions in the wrong way? Or is the attribution calculation done incorrectly?
Any help would be really appreciated. Thanks!