pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

Only last few tokens are attributed when explaining GRU #1078

Open itsmemala opened 1 year ago

itsmemala commented 1 year ago

I'm working with a 1-layer GRU for text classification that takes BERT embeddings as input. Each input sequence has shape (sequence length, BERT embedding dimension). I'm looking for word-level attribution scores for each sequence's prediction. Currently, with the Captum Integrated Gradients and Occlusion explainers, the attribution scores are almost always concentrated on the last few tokens of each sequence. This seems to stem from the GRU's directional processing - any thoughts? Or perhaps do I need a more careful choice of baseline? Or could it be an implementation error?
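For reference, here is a minimal sketch of the kind of setup I mean, assuming a 1-layer GRU classifier over precomputed BERT embeddings; the model definition, dimensions, and the `target=1` choice are illustrative stand-ins, not my actual code:

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Hypothetical 1-layer GRU classifier over precomputed BERT embeddings.
class GRUClassifier(nn.Module):
    def __init__(self, embed_dim=768, hidden_dim=128, num_classes=2):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, embeddings):            # (batch, seq_len, embed_dim)
        _, h_n = self.gru(embeddings)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n.squeeze(0))        # logits: (batch, num_classes)

model = GRUClassifier().eval()
embeddings = torch.randn(1, 20, 768)          # stand-in for one BERT-embedded sequence

ig = IntegratedGradients(model)
# With `baselines` omitted, Captum uses an all-zero input as the baseline.
attr = ig.attribute(embeddings, target=1)     # (1, seq_len, embed_dim)
token_scores = attr.sum(dim=-1)               # collapse to one score per token
print(token_scores)
```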

aobo-y commented 1 year ago

Could you elaborate more on this:

> the attribution scores are almost always concentrated on the last few tokens of each sequence

Do you mean you get attribution scores for all tokens, but only the last few have positive/negative scores while the others are zero? What baseline are you using?

itsmemala commented 1 year ago

Hi,

Yes, I get attribution scores for all tokens, but the scores increase monotonically along the sequence: the first token's score is on the order of 1e-27 (at each embedding dimension), and it grows gradually, with the last few tokens having scores on the order of 1e-3 to 1e-2 (again at each embedding dimension). This trend is the same for all inputs. I do not specify a baseline, so I believe Captum's default all-zero input is being used. Could this be stemming from the multiplicative nature of the GRU?
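In case the all-zero default is a poor reference point for BERT embeddings, one thing I could try is passing an explicit `baselines` tensor to `attribute`. A sketch, continuing the hypothetical `model` / `embeddings` setup from the earlier comment and assuming a precomputed [PAD]-token embedding as the baseline (the variable names are illustrative):

```python
import torch
from captum.attr import IntegratedGradients

# `pad_embedding` is an assumed name for the precomputed BERT embedding of
# the [PAD] token, shape (embed_dim,); torch.zeros here is only a placeholder.
pad_embedding = torch.zeros(768)
baseline = pad_embedding.expand_as(embeddings)   # (batch, seq_len, embed_dim)

ig = IntegratedGradients(model)
attr = ig.attribute(embeddings, baselines=baseline, target=1)
token_scores = attr.sum(dim=-1)                  # per-token scores under the new baseline
```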