rachtibat / LRP-eXplains-Transformers

Layer-Wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024]
https://lxt.readthedocs.io

enforced attention class to be 'eager' in order to enable usage of torch>2.1 #18

Open effingpaul opened 1 day ago

effingpaul commented 1 day ago

Previously, torch versions > 2.1 did not work with the Llama model. Mixtral and Phi-3 could also have been affected with newer PyTorch versions. Enforcing the "eager" attention implementation prevents the KeyError.
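
For reference, a minimal sketch of what forcing the eager attention class looks like on the loading side, assuming the model is loaded through Hugging Face transformers (the checkpoint name is only an example; this is not the actual diff of this PR):

```python
# Sketch: load a model with the "eager" attention implementation so the
# plain attention modules are used instead of the SDPA path that newer
# torch versions (> 2.1) select by default. Checkpoint is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example checkpoint, substitute your own

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",  # avoid SDPA/FlashAttention attention classes
)
```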