if module.key == "model.embed_tokens" and loras is not None and loras[0].embed_tokens is not None:
x = loras[0].embed_tokens(x)
elif module.key == "lm_head" and loras is not None and loras[0].lm_head is not None:
x = loras[0].lm_head(x)
else:
x = module.forward(x, cache = cache, attn_params = attn_params, past_len = past_len, loras = loras)
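For the `lora.py` side, here is a minimal sketch of what I mean (simplified; the tensor key names are assumptions based on PEFT's `modules_to_save` output and may need adjusting to whatever keys the adapter file actually contains):

```python
# Sketch only: load the adapter's extra embed_tokens / lm_head weights and
# expose them as callable modules, so model.py can call
# loras[0].embed_tokens(x) and loras[0].lm_head(x).
import torch
import torch.nn as nn
from safetensors.torch import load_file

def load_extra_modules(lora, adapter_path, device = "cuda:0", dtype = torch.half):

    tensors = load_file(adapter_path)
    lora.embed_tokens = None
    lora.lm_head = None

    for key, w in tensors.items():
        # Assumed key pattern, e.g. "...embed_tokens.modules_to_save.default.weight"
        if "embed_tokens" in key and key.endswith(".weight"):
            # Full replacement embedding table shipped inside the adapter
            lora.embed_tokens = nn.Embedding.from_pretrained(w.to(device, dtype))
        elif "lm_head" in key and key.endswith(".weight"):
            # Full replacement output head, kept unquantized
            head = nn.Linear(w.shape[1], w.shape[0], bias = False, device = device, dtype = dtype)
            head.weight = nn.Parameter(w.to(device, dtype))
            lora.lm_head = head
```

The only thing `model.py` needs from this is that `loras[0].embed_tokens` and `loras[0].lm_head` are either `None` or callables that return tensors on the model's device and dtype.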
But the generated result is not what I expected. I also tried it with the first version of exllama and it works fine (maybe because the first version of exllama did not quantize `lm_head`?).