marcoancona / DeepExplain

A unified framework of perturbation and gradient-based attribution methods for Deep Neural Networks interpretability. DeepExplain also includes support for Shapley Values sampling. (ICLR 2018)
https://arxiv.org/abs/1711.06104
MIT License

Implementation of DeepExplain for a multi-label classifier in Keras #44

Open amjass12 opened 4 years ago

amjass12 commented 4 years ago

Hi all,

I was wondering if someone could help me interpret the scores DeepExplain produces for an occlusion analysis I am attempting to carry out. I find them a little confusing, because I am unsure whether a positive score means the feature contributes positively to the model's output or whether the scores mean something entirely different.

Adding to the confusion, this is a multi-label classification problem, so one sample may have more than one label. The code is as follows:

```python
from deepexplain.tensorflow import DeepExplain
from keras import Model
from keras import backend as K
import pandas as pd

# model, X_train, y_train and X_train1 are defined earlier (not shown).
with DeepExplain(session=K.get_session()) as de:
    # Redundant: fModel and target_tensor are redefined just below.
    input_tensors = model.inputs
    fModel = Model(inputs=input_tensors, outputs=model.outputs)
    target_tensor = fModel(input_tensors)

    input_tensor = model.layers[0].input

    # Target the final layer (24 classes).
    fModel = Model(inputs=input_tensor, outputs=model.layers[-1].output)
    target_tensor = fModel(input_tensor)

    # This is confusing: X_train contains many samples, but xs must match ys?
    xs = X_train[0:24]
    # y_train: one column per class.
    ys = y_train[0:24]

    attributions = de.explain('occlusion', target_tensor, input_tensor, xs, ys=ys)
    print("Attributions:\n", attributions)

    attributions = attributions.transpose()
    attributions = pd.DataFrame(attributions, index=X_train1.index)
```

The scores I get range from -ve to positive values. Would somebody be kind enough to correct my code? I am fairly sure it is wrong, especially since I am confused about xs and ys. Could you also explain the output values: what -ve values mean, what +ve values mean, and what range I should expect them to fall in?

Many thanks!

marcoancona commented 4 years ago

Hi, positive/negative scores mean positive or negative contributions to the target output, respectively. Your code seems generally correct to me, but I would need to know what your xs and ys are to be more precise. You also mention -ve and +ve, but what are these?
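To make the sign convention concrete, here is a minimal NumPy sketch of occlusion with a window of one feature. It is written under our own simplifying assumptions, not taken from DeepExplain's source: each feature is replaced by a baseline of 0, the score is the resulting drop in the target, and the target is the model output masked element-wise by `ys`. The toy model, `make_toy_model` and `occlusion_attributions` are hypothetical. A positive score means occluding the feature lowers the target (the feature was pushing it up); a negative score means the opposite. In this sketch there is no fixed range: a score is a difference of target values, so it lies in (-1, 1) when a single sigmoid output is selected and is unbounded when a pre-activation output is targeted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_toy_model(W, b):
    """Hypothetical multi-label model: one independent sigmoid output per class."""
    return lambda x: sigmoid(x @ W + b)


def occlusion_attributions(model, xs, ys, baseline=0.0):
    """Simplified occlusion with a window of one feature (sketch, not DeepExplain's code).

    For sample i and feature j:
        attr[i, j] = target_i(xs[i]) - target_i(xs[i] with feature j set to baseline)
    where target_i(x) = sum_c model(x)[c] * ys[i, c], i.e. ys masks the outputs.
    """
    xs = np.asarray(xs, dtype=float)

    def target(x):
        return np.sum(model(x) * ys, axis=1)  # one scalar per sample

    t0 = target(xs)
    attr = np.zeros_like(xs)
    for j in range(xs.shape[1]):
        occluded = xs.copy()
        occluded[:, j] = baseline
        attr[:, j] = t0 - target(occluded)
    return attr


# Toy example: 3 features, 2 classes.
W = np.array([[2.0, 0.0],
              [-1.5, 0.0],
              [0.0, 1.0]])
b = np.zeros(2)
model = make_toy_model(W, b)

xs = np.array([[1.0, 1.0, 1.0]])
ys = np.array([[1.0, 0.0]])  # explain class 0 only

attr = occlusion_attributions(model, xs, ys)
print(attr)
# Feature 0 pushes class 0 up (positive score), feature 1 pushes it down
# (negative score), and feature 2 does not affect class 0 at all (zero score).
```

Because `ys` selects class 0, feature 2 gets a zero score even though it drives class 1; masking the target is what makes the scores class-specific.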

amjass12 commented 4 years ago

Hi @marcoancona

Thank you for your response!

Yes, that makes sense! The confusing thing for me is that this was my assumption too; however, for the target output of a given class, the positive scores do not necessarily reflect this. If I look at the positively scored features, their values are very variable and do not seem representative of that class, which is why I am unsure why they receive positive scores. If I compute SHAP importance values instead, I get features that are uniquely important to particular classes, and the raw values of those features for a given class reflect this.
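One way to compare these scores with per-class SHAP rankings is to aggregate them per label. Below is a minimal sketch, assuming the attributions were computed one label at a time (with a `ys` mask selecting only that label), that averages the signed scores over the samples that actually carry the label. `top_features_for_class` and the numbers are hypothetical, made up only to show the mechanics.

```python
import numpy as np


def top_features_for_class(attributions, y, class_idx, feature_names, k=5):
    """Rank features by mean signed attribution over samples labelled class_idx.

    attributions: (n_samples, n_features) scores, assumed to be computed with a
                  ys mask that selects only class_idx.
    y:            (n_samples, n_classes) multi-hot label matrix.
    """
    members = y[:, class_idx] == 1
    if not members.any():
        raise ValueError(f"no sample carries label {class_idx}")
    mean_attr = attributions[members].mean(axis=0)
    order = np.argsort(mean_attr)[::-1][:k]  # highest mean score first
    return [(feature_names[j], float(mean_attr[j])) for j in order]


# Made-up numbers, only to show the mechanics.
attributions = np.array([[0.3, -0.1, 0.0],
                         [0.5, 0.2, -0.4],
                         [-0.2, 0.1, 0.9]])
y = np.array([[1, 0],
              [1, 1],
              [0, 1]])
names = ["feature_a", "feature_b", "feature_c"]

print(top_features_for_class(attributions, y, class_idx=0, feature_names=names, k=2))
```

Averaging absolute values instead would rank features by magnitude regardless of direction, which is closer to how global SHAP importance is usually summarised (mean |SHAP value|).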

This is why I thought my xs and ys were wrong. xs is my X_train array (training samples), with shape (number of samples, number of features). I therefore think my 0:24 subset is wrong, since 24 is the number of labels in my data. ys is my one-hot encoded label array with 24 columns (24 classes/labels), which specifies for each sample which labels it has (0, 1, 0, etc.).
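A minimal sketch of how xs and ys line up, assuming (as we read DeepExplain's `explain` call) that `ys` has one row per sample in `xs`, one column per output unit, and is used to mask the target tensor. The random data and the helpers `check_xs_ys` and `single_class_mask` are hypothetical. The slice `[0:24]` picks the first 24 samples, not the 24 labels. Under that masking assumption, passing multi-hot rows of `y_train` explains the combined output of all of a sample's labels, whereas a single-class mask explains one label at a time.

```python
import numpy as np


def check_xs_ys(xs, ys, n_classes):
    """Raise if xs and ys do not describe the same samples row for row."""
    if xs.shape[0] != ys.shape[0]:
        raise ValueError(f"xs has {xs.shape[0]} rows but ys has {ys.shape[0]}")
    if ys.ndim != 2 or ys.shape[1] != n_classes:
        raise ValueError(f"ys must have shape (n_samples, {n_classes})")


def single_class_mask(n_samples, n_classes, class_idx):
    """Hypothetical helper: a ys mask that selects only class_idx for every sample."""
    mask = np.zeros((n_samples, n_classes), dtype=np.float32)
    mask[:, class_idx] = 1.0
    return mask


# Hypothetical stand-ins for the real data: 100 samples, 50 features, 24 labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 50)).astype(np.float32)
y_train = (rng.random((100, 24)) < 0.2).astype(np.float32)  # multi-hot labels

# [0:24] selects the first 24 *samples*; the 24 labels are the columns of ys.
xs, ys = X_train[0:24], y_train[0:24]
check_xs_ys(xs, ys, n_classes=24)

# To explain one label at a time, e.g. label 3, build a single-class mask:
ys_label3 = single_class_mask(xs.shape[0], 24, class_idx=3)
check_xs_ys(xs, ys_label3, n_classes=24)
# attributions_label3 = de.explain('occlusion', target_tensor, input_tensor, xs, ys=ys_label3)

# A row-count mismatch is caught before calling DeepExplain:
try:
    check_xs_ys(X_train, ys, n_classes=24)
except ValueError as err:
    print(err)  # xs has 100 rows but ys has 24
```

Looping `class_idx` over all 24 labels gives one attribution array per label, which can then be compared label by label.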

-ve and +ve were just shorthand in my question about the DeepExplain output: what negative and positive values mean. However, you have already answered this. Thanks!