Open jumelet opened 2 years ago
This is odd! I didn't have this problem running on a transformer with longer sentences and far fewer interpolation points (around 500 or so). Could you share some of the code you used to generate this plot? When you generate interaction values for NLP models, you have to use an explainer tailored to embedding-based inputs, and you have to be careful about where the sums take place.
To be clear: convergence is theoretically guaranteed for IH values in the same way it is guaranteed for IG values.
For example: could you try comparing your code to https://github.com/suinleelab/path_explain/blob/master/examples/natural_language/transformers/check_completeness_embeddings.ipynb
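To illustrate the point about sum placement, here is a minimal NumPy sketch (the shapes and names are hypothetical, not the path_explain API): for embedding inputs, attributions come out per (token, embedding dim) and interactions per (token, embedding dim, token, embedding dim), so the completeness comparison only makes sense after summing out both embedding axes. The values here are random, only the shape handling matters.

```python
import numpy as np

# Hypothetical shapes for a sentence of T tokens with E-dim embeddings:
# IG attributions:  (T, E)
# IH interactions:  (T, E, T, E)
T, E = 5, 8
rng = np.random.default_rng(0)
attributions = rng.normal(size=(T, E))          # random stand-ins, not real values
interactions = rng.normal(size=(T, E, T, E))

# Token-level attributions: sum out the embedding axis.
token_attr = attributions.sum(axis=-1)          # shape (T,)

# Token-level interactions: sum out BOTH embedding axes, not just one.
token_inter = interactions.sum(axis=(1, 3))     # shape (T, T)

# The completeness check then compares sums over the second token axis
# against the token-level attributions.
mse = np.mean((token_inter.sum(axis=1) - token_attr) ** 2)
print(token_attr.shape, token_inter.shape)
```

Summing over only one of the two embedding axes (or comparing at the embedding level against token-level attributions) produces exactly the kind of non-converging MSE described below.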
Hi!
I've been working with your IH/IG implementation lately and running some experiments with it in an NLP context. What I've noticed is that increasing the length of my input has an adverse effect on the convergence of my IH interactions with respect to the attributions I'm getting from IG.
IG itself converges nicely with respect to the completeness axiom and the model output, but the interaction completeness axiom of Section 2.2.1 of your paper does not seem to hold at all in these cases.
In this plot you can see that, as the input length increases, the mean squared error between the interactions (summed over the last dimension) and the attributions no longer converges to a reasonable margin of error. The number of interpolation points for IH is on the x-axis (note the log scale on the y-axis):
I tested this on a 1-layer LSTM (very tiny, only 16 hidden units), using the TensorFlow implementation of IH + IG with a fixed zero-valued baseline (so not using the expectation).
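For reference, here is a self-contained sketch of the completeness check itself, using a toy analytic model in place of the LSTM and plain NumPy in place of the TensorFlow implementation (everything here is an illustrative assumption, not the library's code): IH is defined as IG applied to IG, and with a zero baseline the summed interactions should converge to the IG attributions as the number of interpolation points grows.

```python
import numpy as np

# Toy smooth scalar "model" standing in for the LSTM: f(x) = tanh(w @ x).
# Gradient and Hessian are written out analytically, so no autodiff is needed.
rng = np.random.default_rng(0)
dim = 4
w = rng.normal(size=dim)

def grad_f(x):
    # df/dx_i = (1 - tanh(w@x)^2) * w_i
    t = np.tanh(w @ x)
    return (1 - t**2) * w

def hess_f(x):
    # d2f/dx_i dx_j = -2 t (1 - t^2) w_i w_j
    t = np.tanh(w @ x)
    return -2 * t * (1 - t**2) * np.outer(w, w)

def ig(x, m):
    """Integrated Gradients, zero baseline, m-point midpoint rule."""
    alphas = (np.arange(m) + 0.5) / m
    return x * np.mean([grad_f(a * x) for a in alphas], axis=0)

def ih(x, m):
    """Integrated Hessians = IG applied to the IG attribution, zero baseline:
    Gamma_ij = x_i x_j * double-integral of a*b*f_ij(a*b*x)
             + [i == j] * x_i * double-integral of f_i(a*b*x)."""
    alphas = (np.arange(m) + 0.5) / m
    gamma = np.zeros((x.size, x.size))
    for a in alphas:
        for b in alphas:
            gamma += a * b * np.outer(x, x) * hess_f(a * b * x)
            gamma += np.diag(x * grad_f(a * b * x))
    return gamma / m**2

# Interaction completeness: sum_j Gamma_ij should approach phi_i as m grows.
x = rng.normal(size=dim)
errors = {}
for m in (8, 32, 128):
    errors[m] = np.mean((ih(x, m).sum(axis=1) - ig(x, m)) ** 2)
    print(m, errors[m])
```

In this toy setting the MSE drops steadily with m; if the MSE for a real model plateaus instead, the quadrature itself is probably not the culprit, and the discrepancy is more likely coming from baseline handling, where the sums are taken, or float32 precision.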
What I was wondering is whether you encountered similar issues when testing your approach on larger models. I see that Theorem 1 of the paper touches on related issues, but it only seems to cover the simple feedforward case, not more complex models like LSTMs.