turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

An Issue with Finetuning of GPTQ-LoRA with ExllamaV2 MatMul Kernel #409

Open achew010 opened 2 months ago

achew010 commented 2 months ago

Hi,

Are there any tests/benchmarks for GPTQ-LoRA finetuning with the exllamav2 kernels?

I have been using AutoGPTQ's adaptation of exllamav2 and noticed an issue when training with adapters.

I was studying ExllamaV2 Linear and it seems similar to what I see in AutoGPTQ's adaptation: it is unclear whether there is a backward function for the CUDA-optimized matrix multiplication operation. If there isn't, gradients cannot propagate through the quantized layer, and I suspect this will affect the finetuning performance of the model.
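For reference, here is a minimal probe I would use to check whether gradients flow through a layer's forward pass. The names are illustrative, not exllamav2 or AutoGPTQ API:

```python
import torch

def grads_flow(layer: torch.nn.Module, in_features: int) -> bool:
    """Return True if the layer's output is attached to the autograd graph."""
    x = torch.randn(2, in_features, requires_grad=True)
    y = layer(x)
    # A custom CUDA matmul with no registered backward typically produces an
    # output detached from the graph (no grad_fn), so .backward() on any loss
    # built from it never reaches the inputs or the LoRA adapters.
    return y.grad_fn is not None

print(grads_flow(torch.nn.Linear(64, 32), 64))  # True for a plain nn.Linear
```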

Has anyone observed this issue before? If so, are there any plans to fix it?

turboderp commented 2 months ago

There is no backward function, no. It shouldn't be too hard to wrap the reconstruction matmul in a torch.autograd.Function, as done in alpaca_lora_4bit, for instance.
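A minimal sketch of what that wrapper could look like, assuming a hypothetical `dequantize()` helper standing in for the actual GPTQ reconstruction kernel (the real entry points differ):

```python
import torch

def dequantize(qweight: torch.Tensor) -> torch.Tensor:
    # Placeholder: in practice this would call the GPTQ weight reconstruction
    # kernel; here it passes through so the sketch runs standalone.
    return qweight

class ReconstructMatMul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, qweight):
        w = dequantize(qweight)          # [in_features, out_features]
        ctx.save_for_backward(qweight)   # keep the packed form, not the fp16 copy
        return x @ w

    @staticmethod
    def backward(ctx, grad_output):
        (qweight,) = ctx.saved_tensors
        w = dequantize(qweight)          # re-dequantize: trades compute for memory
        # Only the activations need a gradient; the quantized base weight stays
        # frozen in LoRA finetuning (the adapters get their gradients through
        # their own ordinary nn.Linear layers), so return None for it.
        return grad_output @ w.t(), None

# Usage: gradients now flow back to the input.
x = torch.randn(4, 64, requires_grad=True)
qw = torch.randn(64, 32)                 # stands in for a packed GPTQ weight
ReconstructMatMul.apply(x, qw).sum().backward()
print(x.grad is not None)                # True
```

Saving the packed weight and re-dequantizing in backward avoids holding a second fp16 copy of the weight alive between the two passes, at the cost of running the reconstruction twice.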