Hi,

Are there any tests/benchmarks for GPTQ-LoRA fine-tuning with the exllamav2 kernels?

I have been using AutoGPTQ's adaptation of exllamav2 and noticed an issue when training with adapters. I was studying ExllamaV2 Linear, and it appears similar to AutoGPTQ's adaptation: it is unclear whether there is a backward function for the CUDA-optimized matrix multiplication. I suspect this will affect the model's fine-tuning performance.

Has anyone observed this issue before? If so, are there any plans to fix it?

There is no backward function, no. It shouldn't be too hard to wrap the reconstruction matmul with a torch.autograd.Function, as is done in alpaca_lora_4bit, for instance.
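The suggested approach — wrapping the reconstruction matmul in a `torch.autograd.Function` so gradients flow back to the input (and therefore to LoRA adapters) — can be sketched as follows. This is a minimal illustration, not ExLlamaV2's or alpaca_lora_4bit's actual code: `QuantMatMul` is a hypothetical name, and a plain matmul on a stand-in dequantized weight replaces the fused CUDA kernel.

```python
import torch

class QuantMatMul(torch.autograd.Function):
    """Hypothetical wrapper: differentiable front for an opaque quantized matmul."""

    @staticmethod
    def forward(ctx, x, w_reconstructed):
        # In the real kernel this would call the fused CUDA op; here we
        # stand in with a plain matmul on the reconstructed (dequantized)
        # weight matrix.
        ctx.save_for_backward(w_reconstructed)
        return x @ w_reconstructed

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Gradient w.r.t. the input only; the quantized base weights stay
        # frozen, which is all that LoRA fine-tuning requires.
        return grad_output @ w.t(), None

# Usage: gradients now reach x even though the weight path is opaque.
x = torch.randn(2, 8, requires_grad=True)
w = torch.randn(8, 4)  # stand-in for the dequantized GPTQ weight
y = QuantMatMul.apply(x, w)
y.sum().backward()
print(x.grad.shape)  # torch.Size([2, 8])
```

Returning `None` as the second gradient tells autograd the weight needs no gradient; the LoRA branch, which runs alongside this op in its own (fully differentiable) path, receives its gradients through `grad_output` as usual.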