rachtibat / LRP-eXplains-Transformers

Layer-Wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024]
https://lxt.readthedocs.io

added BERT implementation #5

Closed pkhdipraja closed 2 months ago

dvdblk commented 2 months ago

Isn't it missing the use of LinearEpsilon instead of Linear? The same applies to prune_linear_layer.

EDIT: nvm this is handled by the Composite :)

rachtibat commented 2 months ago

Hey @dvdblk, thank you for taking the time to look at @pkhdipraja's pull request!

You're right that he could use LinearEpsilon instead of nn.Linear. But there is also another way to do it: we can keep all nn.Linear layers and instead apply lxt.rules.EpsilonRule to every nn.Linear inside the Composite. If you look at line 63 in bert.py, you'll see that @pkhdipraja applies this rule to all nn.Linear layers.

In the end it comes down to taste; both approaches are correct. I actually benchmarked the two approaches once and found that @pkhdipraja's way is faster! I should mention this in the documentation.
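Roughly, the two options look like this. This is only a minimal sketch, not a verbatim excerpt from bert.py: the toy model, the exact import paths, and the `register` call are written from memory of how the Composite is used elsewhere in the library, so the details may differ.

```python
import torch
import torch.nn as nn

from lxt.core import Composite  # import path assumed, see the library's other model files
import lxt.rules as rules

# Toy stand-in for the model; in the PR this would be the BERT implementation.
model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 2))

# Option B (the approach taken in bert.py): keep plain nn.Linear layers and
# let the Composite attach the epsilon rule to every nn.Linear on registration.
composite = Composite({
    nn.Linear: rules.EpsilonRule,
    # ... rules for the remaining module types would go here
})
composite.register(model)

# Option A would instead replace each nn.Linear directly with
# lxt.modules.LinearEpsilon in the model definition.

x = torch.randn(1, 8, requires_grad=True)
y = model(x)
y[0].max().backward()   # seed the backward pass at the target logit
relevance = x.grad      # LXT propagates relevance through the backward pass
```

Both variants compute the same epsilon-LRP rule for the linear layers; they only differ in whether the rule is hard-coded in the model definition (Option A) or attached by the Composite (Option B).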

Best regards!

rachtibat commented 2 months ago

@dvdblk nice, you already edited your comment! (:

rachtibat commented 2 months ago

Hey @pkhdipraja ,

Thank you so much for adding another model to this library! The code works as expected in my tests. I'll merge it into the main branch.

Best