turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

An Issue with Finetuning of GPTQ-LoRA with ExllamaV2 MatMul Kernel #409

Open achew010 opened 2 months ago

achew010 commented 2 months ago

Hi,

Are there any tests/benchmarks for GPTQ-LoRA finetuning with the exllamav2 kernels?

I have been using AutoGPTQ's adaptation of exllamav2 and noticed an issue when training with adapters.

I was studying ExllamaV2 Linear and it seems similar to what I see in AutoGPTQ's adaptation: it is unclear whether there is a backward function for the CUDA-optimized matrix multiplication operation. If there isn't, gradients cannot propagate through the quantized layer, and I suspect this will affect the finetuning performance of the model.
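For reference, here is a minimal probe I would use to check whether gradients flow through a layer's forward pass. The names are illustrative, not exllamav2 or AutoGPTQ API:

```python
import torch

def grads_flow(layer: torch.nn.Module, in_features: int) -> bool:
    """Return True if the layer's output is attached to the autograd graph."""
    x = torch.randn(2, in_features, requires_grad=True)
    y = layer(x)
    # A custom CUDA matmul with no registered backward typically produces an
    # output detached from the graph (no grad_fn), so .backward() on any loss
    # built from it never reaches the inputs or the LoRA adapters.
    return y.grad_fn is not None

print(grads_flow(torch.nn.Linear(64, 32), 64))  # True for a plain nn.Linear
```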

Has anyone observed this issue before? If so, are there any plans to fix it?

turboderp commented 2 months ago

There is no backward function, no. It shouldn't be too hard to wrap the reconstruction matmul in a torch.autograd.Function, as done in alpaca_lora_4bit, for instance.
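A minimal sketch of what that wrapper could look like, assuming a hypothetical `dequantize()` helper standing in for the actual GPTQ reconstruction kernel (the real entry points differ):

```python
import torch

def dequantize(qweight: torch.Tensor) -> torch.Tensor:
    # Placeholder: in practice this would call the GPTQ weight reconstruction
    # kernel; here it passes through so the sketch runs standalone.
    return qweight

class ReconstructMatMul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, qweight):
        w = dequantize(qweight)          # [in_features, out_features]
        ctx.save_for_backward(qweight)   # keep the packed form, not the fp16 copy
        return x @ w

    @staticmethod
    def backward(ctx, grad_output):
        (qweight,) = ctx.saved_tensors
        w = dequantize(qweight)          # re-dequantize: trades compute for memory
        # Only the activations need a gradient; the quantized base weight stays
        # frozen in LoRA finetuning (the adapters get their gradients through
        # their own ordinary nn.Linear layers), so return None for it.
        return grad_output @ w.t(), None

# Usage: gradients now flow back to the input.
x = torch.randn(4, 64, requires_grad=True)
qw = torch.randn(64, 32)                 # stands in for a packed GPTQ weight
ReconstructMatMul.apply(x, qw).sum().backward()
print(x.grad is not None)                # True
```

Saving the packed weight and re-dequantizing in backward avoids holding a second fp16 copy of the weight alive between the two passes, at the cost of running the reconstruction twice.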