turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

LoRA appears to not be used after the first run #278

Closed · technillogue closed this issue 10 months ago

technillogue commented 10 months ago

We get different results with peft than with exllama's LoRA support: with exllama, the fine-tuning doesn't seem to be respected after the first run. Is there anything that needs to be done differently from example_lora.py for subsequent predictions to still use the LoRA?
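For reference, the pattern in example_lora.py attaches the adapter to the generator once and then reuses it across generations. A minimal sketch of that pattern, with the model and adapter paths as placeholders (module and class names assumed from this repo's layout):

```python
# Minimal sketch following the example_lora.py pattern; all paths are placeholders.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
from lora import ExLlamaLora

config = ExLlamaConfig("model/config.json")
config.model_path = "model/model.safetensors"

model = ExLlama(config)
tokenizer = ExLlamaTokenizer("model/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Load the adapter and attach it to the generator. Once set, it should apply
# to every subsequent call, not just the first.
lora = ExLlamaLora(model, "lora/adapter_config.json", "lora/adapter_model.bin")
generator.lora = lora

for prompt in ["First prompt", "Second prompt"]:
    print(generator.generate_simple(prompt, max_new_tokens=100))
```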

technillogue commented 10 months ago

Turns out we had been setting `.lora = lora` on the wrong object.
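The comment doesn't say which object the attribute was mistakenly set on, but a plausible version of the mistake is assigning it to the model instead of the generator. Since Python allows setting arbitrary attributes, nothing errors, and the adapter is simply never consulted (hypothetical illustration):

```python
# Hypothetical reconstruction of the bug: the attribute must be set on the
# object that actually reads it during generation (the generator, per
# example_lora.py), not on some other object.
model.lora = lora       # likely a silent no-op if ExLlama never reads this attribute
generator.lora = lora   # correct: the generator applies the LoRA on each forward pass
```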