itsmemala opened this issue 1 year ago
I'm working with a 1-layer GRU for text classification that takes BERT embeddings as input. Each input sequence has shape (sequence length, BERT embedding dimension). I'm looking for word-level attribution scores for each sequence's prediction. Currently, with the Captum Integrated Gradients and Occlusion explainers, the attribution scores are almost always concentrated on the last few tokens of each sequence. This seems to stem from the directional processing of the GRU. Any thoughts? Or do I perhaps need a more careful selection of the baseline? Or could it be an implementation error?
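For reference, here is a minimal sketch of the setup (the `GRUClassifier` below, its dimensions, and the random embeddings are placeholders standing in for the actual pipeline):

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Stand-in for the classifier described above: a 1-layer GRU over
# precomputed BERT embeddings, classifying from the final hidden state.
class GRUClassifier(nn.Module):
    def __init__(self, embed_dim=768, hidden_dim=128, num_classes=2):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):        # x: (batch, seq_len, embed_dim)
        _, h_n = self.gru(x)     # h_n: (num_layers, batch, hidden_dim)
        return self.fc(h_n[-1])  # logits: (batch, num_classes)

model = GRUClassifier().eval()
embeddings = torch.randn(1, 12, 768)  # placeholder for real BERT embeddings

ig = IntegratedGradients(model)
pred_class = int(model(embeddings).argmax(dim=-1))
attrs = ig.attribute(embeddings, target=pred_class)  # default all-zeros baseline

# Collapse the embedding dimension to get one score per token/word.
word_scores = attrs.sum(dim=-1).squeeze(0)  # shape: (seq_len,)
```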
Could you elaborate more on this: "the attribution scores are almost always concentrated on the last few tokens of each sequence"? Do you mean you can get attribution scores for all tokens, but the last few tokens always have positive/negative scores while the others are zero? What baseline are you using?
Hi,
Yes, I get attribution scores for all tokens, but the scores are in ascending order along the sequence: the first token's scores are on the order of 1e-27 (at each embedding dimension) and they slowly increase, with the last few tokens reaching the order of 1e-3 or 1e-2 (again at each embedding dimension). This trend is the same for all inputs. I do not specify any baseline, so I believe Captum's default of an all-zeros input is being used. Could this be stemming from the multiplicative nature of the GRU?
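For what it's worth, here is how an explicit baseline could be passed instead of the default all-zeros input (the [PAD]-embedding reference is just one possible choice, and `model`/`embeddings` are from the sketch in the original post):

```python
import torch
from captum.attr import IntegratedGradients

# `model` and `embeddings` as in the sketch in the original post.
# One possible non-zero reference is the embedding of a neutral token;
# with a Hugging Face BertModel that could be, for example:
#   pad_embedding = bert.embeddings.word_embeddings.weight[tokenizer.pad_token_id]
pad_embedding = torch.randn(768)  # placeholder for the real [PAD] embedding

# Broadcast the single reference vector across the sequence so the baseline
# matches the input shape (1, seq_len, 768).
baselines = pad_embedding.view(1, 1, -1).expand_as(embeddings)

ig = IntegratedGradients(model)
pred_class = int(model(embeddings).argmax(dim=-1))
attrs = ig.attribute(embeddings, baselines=baselines, target=pred_class)
```

A baseline like this only changes the starting point of the integration path, so if the concentration on late tokens really comes from the GRU's recurrence, it may persist regardless of the reference chosen.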