lena-voita / the-story-of-heads

This is a repository with the code for the ACL 2019 paper "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" and the ACL 2021 paper "Analyzing Source and Target Contributions to NMT Predictions".

LRP Computation #9

Open huangxt39 opened 1 year ago

huangxt39 commented 1 year ago

Hi @lena-voita ,

Thanks for your great work. I'm confused about a few pieces of code. In /lib/layers/lrp.py, lines 86, 90 and 94, tf.reduce_sum() sums over axis=0, but as far as I understand it should be axis=1. As indicated in the code, the variable "flat_impact" has shape [output_size, combined_input_size], so to get the normalizer $z_j=\sum_i z_{ij}$ we should sum over the input dimension, which is indexed by $i$. If I have misunderstood something, please don't hesitate to point it out. Thanks in advance!
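To make the question concrete, here is a minimal pure-Python sketch of what the two axis choices produce. It is not the repository's code: the 2x3 matrix is a made-up stand-in for "flat_impact", assuming the [output_size, combined_input_size] layout described above, and `reduce_sum` is a hypothetical helper that only mimics the axis convention of tf.reduce_sum for a 2-D input.

```python
def reduce_sum(matrix, axis):
    """Sum a 2-D list of lists along `axis`.

    Hypothetical helper: follows the same axis convention as tf.reduce_sum
    for a 2-D tensor (axis=0 collapses rows, axis=1 collapses columns).
    """
    if axis == 0:
        return [sum(column) for column in zip(*matrix)]
    if axis == 1:
        return [sum(row) for row in matrix]
    raise ValueError("axis must be 0 or 1")


# Made-up stand-in for flat_impact, assumed shape [output_size=2, combined_input_size=3].
# Under that assumption, entry [j][i] plays the role of z_ij (input i -> output j).
flat_impact = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# axis=0 sums over the output index j: one value per input.
sum_axis0 = reduce_sum(flat_impact, axis=0)

# axis=1 sums over the input index i: one value per output, i.e. sum_i z_ij.
sum_axis1 = reduce_sum(flat_impact, axis=1)

print("axis=0:", len(sum_axis0), "values", sum_axis0)
print("axis=1:", len(sum_axis1), "values", sum_axis1)
```

With this layout, only axis=1 yields one normalizer per output $j$. Whether that matches the tensor actually passed to tf.reduce_sum() in lrp.py depends on its real shape at that point, which this sketch does not verify.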