marcoancona / DeepExplain

A unified framework of perturbation and gradient-based attribution methods for Deep Neural Networks interpretability. DeepExplain also includes support for Shapley Values sampling. (ICLR 2018)
https://arxiv.org/abs/1711.06104
MIT License

sum(attributions) not equal to F(input)-F(baseline) in DeepLIFT? #57

Open · zhangxiang390 opened this issue 4 years ago

zhangxiang390 commented 4 years ago

Demo code:

with DeepExplain(session=sess) as de:
    explainer = de.get_explainer('deeplift', T=model[1], X=model[0],
                                 baseline=df_train_feature.values[0].astype(float))
    attributions = explainer.run(df_test_feature.values.astype(float))

preds = sess.run(model[1], feed_dict={model[0]: df_test_feature.values.astype(float)})
baseline_pred = sess.run(model[1], feed_dict={model[0]: np.expand_dims(df_train_feature.values[0].astype(float), axis=0)})
print('baseline_pred:{}'.format(baseline_pred))

diff = np.squeeze(preds) - attributions.sum(1)
plt.plot(diff)
plt.title('{}'.format('pred_MINUS_sum_of_attr'))
plt.show()

[image: plot of pred_MINUS_sum_of_attr]

I expected 'diff' to be near baseline_pred (3.473), but in the plot above it is centered at 0 with large variance.

I'm very confused. Can anybody offer an explanation? Thanks a million.
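For reference, the identity being checked here is DeepLIFT's conservation (completeness) property: the attributions should sum to F(input) − F(baseline), so F(input) − sum(attributions) should equal F(baseline) for every sample. On a toy linear model this holds exactly, since DeepLIFT attributions there reduce to w_i · (x_i − baseline_i). A minimal NumPy sketch (the model, weights, and data are illustrative, not taken from the issue):

```python
import numpy as np

# Toy linear model F(x) = w.x + b. On a linear model DeepLIFT
# attributions are exact: attr_i = w_i * (x_i - baseline_i).
rng = np.random.default_rng(0)
w = rng.normal(size=5)
b = 0.7

def F(x):
    return x @ w + b

baseline = rng.normal(size=5)
X = rng.normal(size=(10, 5))        # batch of 10 "test" inputs

attributions = w * (X - baseline)   # DeepLIFT on a linear model
diff = F(X) - attributions.sum(axis=1)

# Conservation: diff equals F(baseline) for every sample,
# which is what the plot above was expected to show.
assert np.allclose(diff, F(baseline))
```

When conservation holds, 'diff' is a flat line at F(baseline); the scatter around 0 in the plot indicates the property is being violated somewhere in the network.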

AvantiShri commented 4 years ago

Hello @zhangxiang390, I came across this issue by chance while googling. The discrepancy may be due to the model architecture used. The DeepExplain implementation of DeepLIFT only satisfies the conservation property for certain types of operations, specifically the ones listed under SUPPORTED_ACTIVATIONS: https://github.com/marcoancona/DeepExplain/blob/16fb0f298bc676318fe535c334cb429bbc3eefa0/deepexplain/tensorflow/methods.py#L15-L17. If your architecture contains unsupported activations, DeepExplain falls back to regular gradients for the backpropagation rather than DeepLIFT-style multipliers, and is therefore not guaranteed to satisfy conservation.
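The effect of that fallback can be seen on a single ReLU unit. This is a hand-rolled sketch of the idea, not the DeepExplain code: the DeepLIFT rescale-rule multiplier Δout/Δin redistributes exactly F(x) − F(baseline), while the plain gradient does not when the unit switches state between baseline and input.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# One ReLU unit with pre-activation z = w * x; the baseline is chosen
# so the unit is inactive at the baseline but active at the input.
w = 2.0
x, x0 = 1.5, -0.5
z, z0 = w * x, w * x0
out, out0 = relu(z), relu(z0)

# DeepLIFT rescale rule: multiplier = delta_out / delta_in.
m_rescale = (out - out0) / (z - z0)
attr_rescale = m_rescale * w * (x - x0)

# Gradient fallback: multiplier = relu'(z) evaluated at the input only.
m_grad = 1.0 if z > 0 else 0.0
attr_grad = m_grad * w * (x - x0)

assert np.isclose(attr_rescale, out - out0)   # conservation holds
assert not np.isclose(attr_grad, out - out0)  # broken by the fallback
```

With these numbers the rescale attribution is 3.0, matching F(x) − F(baseline) = 3.0, while the gradient attribution is 4.0; summed over a whole network, such per-unit mismatches produce exactly the kind of scatter shown in the plot above.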

For what it's worth, the DeepSHAP implementation of DeepLIFT supports a wider variety of operations, including the elementwise multiplications used as gating units in RNNs: https://github.com/slundberg/shap#deep-learning-example-with-deepexplainer-tensorflowkeras-models