suinleelab / path_explain

A repository for explaining feature attributions and feature interactions in deep neural networks.
MIT License

No convergence for IH for larger input strings #15

Open jumelet opened 2 years ago

jumelet commented 2 years ago

Hi!

I've been working with your IH/IG implementation lately, doing some experiments with it in an NLP context. What I have noticed is that increasing the length of my input has an adverse effect on the convergence of my IH interactions, relative to the attributions I'm getting with IG.

IG itself converges nicely with respect to the completeness axiom and the model output, but the interaction completeness axiom from Section 2.2.1 of your paper does not seem to hold at all in these cases.

In the plot below you can see that as the input length increases, the Mean Squared Error between the interactions (summed over the last dimension) and the attributions no longer converges to a reasonable margin of error. The x-axis shows the number of interpolation points for IH (note the log scale on the y-axis).

[Screenshot from 2021-10-29: MSE between summed interactions and attributions vs. number of IH interpolation points]
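For concreteness, here is a minimal sketch of the diagnostic described above (the function name and array shapes are illustrative, not taken from path_explain): the completeness gap is the MSE between the interactions summed over their last axis and the per-feature attributions.

```python
import numpy as np

def interaction_completeness_mse(interactions, attributions):
    """MSE between interactions summed over the last axis and the
    per-feature attributions (interaction completeness, Sec. 2.2.1)."""
    summed = interactions.sum(axis=-1)
    return float(np.mean((summed - attributions) ** 2))

# Toy check: for f(x) = x1 * x2 with a zero baseline, the exact IG
# attributions are (x1*x2/2, x1*x2/2) and the exact IH interactions
# are all x1*x2/4, so completeness holds exactly.
x1, x2 = 2.0, 3.0
attributions = np.array([x1 * x2 / 2, x1 * x2 / 2])
interactions = np.full((2, 2), x1 * x2 / 4)
print(interaction_completeness_mse(interactions, attributions))  # 0.0
```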

I tested this on a one-layer LSTM (very tiny, only 16 hidden units), using the TensorFlow implementation of IH+IG with a fixed zero-valued baseline (i.e., not using the expectation).

What I was wondering is whether you encountered similar issues when testing your approach on larger models. I see that Theorem 1 of the paper touches on related issues, but it only seems to cover the simple feedforward case, not more complex models like LSTMs.

psturmfels commented 2 years ago

This is odd! I didn't have this problem running on a transformer with longer sentences and far fewer interpolation points (like 500 or so). Could you share some of the code you used to generate this plot? When you generate interaction values for NLP models, you have to use an explainer tailored towards handling embedding-based inputs and be careful about where sums are taking place.
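As an illustration of where those sums matter (the shapes below are an assumption about a typical embedding setup, not path_explain's exact output format): for embedding inputs of shape (seq_len, emb_dim), raw interactions carry four axes, and both embedding axes must be reduced before comparing token-level interactions against token-level attributions.

```python
import numpy as np

seq_len, emb_dim = 5, 8
rng = np.random.default_rng(0)

# Hypothetical raw outputs: attributions per (token, embedding dim),
# interactions per (token, dim, token, dim).
attr = rng.normal(size=(seq_len, emb_dim))
inter = rng.normal(size=(seq_len, emb_dim, seq_len, emb_dim))

# Token-level attributions: sum out the embedding axis.
token_attr = attr.sum(axis=-1)          # shape (seq_len,)

# Token-level interactions: sum out BOTH embedding axes,
# not just the last one.
token_inter = inter.sum(axis=(1, 3))    # shape (seq_len, seq_len)

# The completeness comparison should happen at token level:
gap = np.mean((token_inter.sum(axis=-1) - token_attr) ** 2)
print(token_attr.shape, token_inter.shape)
```

Summing only over the final axis of the four-axis array would compare quantities at different granularities and could produce an apparent completeness violation.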

To be clear: convergence is theoretically guaranteed for IH values in the same way it is guaranteed for IG values.
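As a sanity check of that guarantee on a function small enough to verify by hand (this is a from-scratch midpoint Riemann-sum sketch with a zero baseline, not the library's implementation): IH is IG applied to the IG attributions, so the estimated row sums of the interaction matrix should match the IG attributions. For the bilinear f(x) = x1*x2 the midpoint rule is exact, so the two agree to floating-point precision.

```python
import numpy as np

def f_grad(z):   # gradient of f(x) = x1 * x2
    return np.array([z[1], z[0]])

def f_hess(z):   # Hessian of f(x) = x1 * x2
    return np.array([[0.0, 1.0], [1.0, 0.0]])

def ig(x, n):
    """IG attributions with zero baseline via a midpoint Riemann sum."""
    alphas = (np.arange(n) + 0.5) / n
    grads = np.array([f_grad(a * x) for a in alphas])
    return x * grads.mean(axis=0)

def ih(x, n):
    """Integrated Hessians (IG applied to IG) via a double midpoint sum."""
    alphas = (np.arange(n) + 0.5) / n
    gamma = np.zeros((2, 2))
    for a in alphas:
        for b in alphas:
            z = a * b * x
            # off-diagonal + quadratic self term: x_i x_j * a*b * H_ij
            gamma += np.outer(x, x) * a * b * f_hess(z)
            # first-order self-interaction term: x_i * g_i on the diagonal
            gamma += np.diag(x * f_grad(z))
    return gamma / n**2

x = np.array([2.0, 3.0])
phi = ig(x, 1000)       # exact values: x1*x2/2 = 3.0 for each feature
gamma = ih(x, 200)
# Interaction completeness: rows of gamma sum to the IG attributions.
print(np.abs(gamma.sum(axis=-1) - phi).max())  # ≈ 0
```

For deeper, more curved models (like an LSTM) the integrand is no longer polynomial, so the number of interpolation points needed for a given error can grow substantially, but the sums should still converge.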

psturmfels commented 2 years ago

For example: could you try comparing your code to https://github.com/suinleelab/path_explain/blob/master/examples/natural_language/transformers/check_completeness_embeddings.ipynb