Open jaredquekjz opened 11 months ago
Hi,
There have been serious, still-unresolved issues with merging QLoRA adapters easily and accurately without losing perplexity and finetuning quality. See https://github.com/huggingface/transformers/issues/26492 for an example.
I think your technique could be really useful for improving 4-bit LoRA tuning. Would you consider contributing it to the axolotl library, which already supports GPTQ training with flash attention and is popular among OSS model trainers?
https://github.com/OpenAccess-AI-Collective/axolotl
As far as I know, the library does not support merging GPTQ adapters. If your technique goes in, it will be really helpful to the community!
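For context, the precision-loss mechanism the linked issue describes can be sketched with a toy example: merging a LoRA update `B @ A` into a full-precision base weight is exact, but merging it into a quantized base bakes the quantization error of the base into the merged weight. Everything below (the crude uniform quantizer, the shapes, the names) is illustrative only, not peft's or bitsandbytes' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.standard_normal((d, d)).astype(np.float32)  # base weight
A = rng.standard_normal((r, d)).astype(np.float32)  # LoRA down-projection
B = rng.standard_normal((d, r)).astype(np.float32)  # LoRA up-projection
scale = 1.0                                         # LoRA scaling factor

# Exact merge in full precision: W' = W + scale * B @ A
W_merged = W + scale * (B @ A)

def fake_quant(x, bits=4):
    """Crude symmetric uniform quantizer (stand-in for a real 4-bit scheme)."""
    qmax = 2 ** (bits - 1) - 1
    s = np.abs(x).max() / qmax
    return np.round(x / s) * s

# Merging into a quantized base: the adapter is added on top of a
# round-tripped W, so the quantization error of W survives the merge.
W_q_merged = fake_quant(W) + scale * (B @ A)

# The gap between the two merges is exactly the base quantization error.
err = np.abs(W_merged - W_q_merged).max()
```

This is why naive QLoRA merging degrades perplexity: the finetuned adapter was trained against the quantized base, but the merged artifact carries that base's quantization error forward.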
Thanks for your suggestion; it is quite interesting. I am not familiar with the axolotl repo, but I may integrate with it in the future.