Hi @aymeric-roucher,
I think that the TinyLlama model is also not working correctly in its quantized version. This is because BitsAndBytes replaces all `nn.Linear` layers with `Linear8bitLt` layers. This means that we must specify in the composite that we'd like to apply the `EpsilonRule` to this layer type by writing `Linear8bitLt: rules.EpsilonRule`, as specified in https://github.com/rachtibat/LRP-eXplains-Transformers/blob/66221eddb7fb932e299f906a261feab8f1b9581e/lxt/models/mixtral.py#L1245.
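If you want to double-check which layer type BitsAndBytes actually installed, something along these lines works (illustrative only; `model` is assumed to be a BitsAndBytes-quantized Hugging Face model):

```python
from bitsandbytes.nn import Linear8bitLt

# Print the first quantized linear layer found, just to confirm that
# BitsAndBytes really swapped nn.Linear for Linear8bitLt.
for name, module in model.named_modules():
    if isinstance(module, Linear8bitLt):
        print(name, type(module).__name__)
        break
```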
Unfortunately, in the `lxt/models/llama.py` file I manually replaced all `nn.Linear` layers with `lm.LinearEpsilon`, as in https://github.com/rachtibat/LRP-eXplains-Transformers/blob/66221eddb7fb932e299f906a261feab8f1b9581e/lxt/models/llama.py#L245C26-L245C42.
So, to make it work:

1. Replace all `lm.LinearEpsilon` layers in the llama.py file with `nn.Linear`.
2. Add `Linear8bitLt: rules.EpsilonRule` to the attnlrp composite at https://github.com/rachtibat/LRP-eXplains-Transformers/blob/66221eddb7fb932e299f906a261feab8f1b9581e/lxt/models/llama.py#L63.
3. Also add `nn.Linear: rules.EpsilonRule` to the attnlrp composite, to make sure that the unquantized version is still supported (see the sketch below).

Right now, I am on vacation, but when I am back in two weeks, I can push an updated version.
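For illustration, the linear-layer entries of the adjusted composite could look roughly like this (a minimal sketch; only the two linear mappings are shown, the remaining rules in `lxt/models/llama.py` stay unchanged, and the `Composite`/`rules` imports are assumed to match that file):

```python
import torch.nn as nn
from bitsandbytes.nn import Linear8bitLt

import lxt.rules as rules
from lxt.core import Composite

# Sketch: map both the standard and the 8-bit quantized linear layers to the
# epsilon rule; all other entries of the original composite stay as they are.
attnlrp = Composite({
    nn.Linear: rules.EpsilonRule,      # unquantized checkpoints
    Linear8bitLt: rules.EpsilonRule,   # BitsAndBytes 8-bit checkpoints
    # ... the remaining rules already defined in lxt/models/llama.py
})
```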
I hope it helps, Reduan
Hey @aymeric-roucher,
you can now find a quantized llama example at https://github.com/rachtibat/LRP-eXplains-Transformers/tree/main/examples as part of the new release https://github.com/rachtibat/LRP-eXplains-Transformers/releases/tag/v0.6.1.
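In short, it boils down to something like this (a minimal sketch; the checkpoint name, quantization settings, and prompt are placeholders, so please check the linked example for the official version):

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
from lxt.models.llama import LlamaForCausalLM, attnlrp

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder checkpoint

# Load the model in 8-bit so that BitsAndBytes installs Linear8bitLt layers.
model = LlamaForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach the AttnLRP rules (now covering both nn.Linear and Linear8bitLt).
attnlrp.register(model)

prompt = "Paris is the capital of"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
input_embeds = model.get_input_embeddings()(input_ids)

# Forward pass on embeddings that require grad, then backpropagate the top logit.
logits = model(inputs_embeds=input_embeds.requires_grad_(True), use_cache=False).logits
max_logit, _ = torch.max(logits[0, -1, :], dim=-1)
max_logit.backward()

# Token-level relevance scores.
relevance = input_embeds.grad.float().sum(-1)
```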
Have fun with it!
This is great, thanks @rachtibat !
Do you have examples for working with a quantized llama3? I'm trying with:

But then I get NaNs for the relevances, whereas the TinyLlama model given in the docs works just fine.