DreamGenX opened this issue 7 months ago
Thanks for filing this issue! The torchtune team has definitely been following some of the progress around (Q)DoRA and are super interested in adding it. Stay tuned for more thoughts and discussion soon!
@DreamGenX If this is something you're interested in contributing to the library, we'd be happy to work with you on it :)
@RdoubleA I added a DoRA-based update to my fork. If you approve, I can submit a pull request. Link to the DoRA update line here: https://github.com/Prakyathkantharaju/torchtune/blob/aefb8cbb02712177d690ca65cbac480fcb8ac429/torchtune/modules/peft/lora.py#L137
Hi @Prakyathkantharaju, thanks for sharing the fork. If I understand correctly, you are returning `Wx + (m * BAx / ||BAx||)`, is that right? (Here W is the original weight matrix, x is the input, A and B the usual LoRA matrices, and m the magnitude vector.) From eq. (5) of the DoRA paper, though, don't we want to instead return `m * [(W + BA) / ||W + BA||] x`? Please let me know if I'm misunderstanding the weight update here.
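For concreteness, here's a minimal sketch of the eq. (5) formulation as a standalone function (a sketch only: names and shapes are illustrative, not torchtune's actual API; the magnitude is taken per output feature, following the common `nn.Linear` weight convention):

```python
import torch
import torch.nn.functional as F

def dora_forward_eq5(x, W, A, B, m, alpha=1.0):
    """DoRA forward per eq. (5): m * [(W + BA) / ||W + BA||] x.

    Assumed shapes (nn.Linear convention): x: (batch, in_dim),
    W: (out_dim, in_dim), A: (rank, in_dim), B: (out_dim, rank),
    m: (out_dim,) -- one magnitude per output feature.
    """
    # Merge the frozen base weight with the scaled low-rank update
    merged = W + alpha * (B @ A)                      # (out_dim, in_dim)
    # Per-output-feature L2 norm of the merged weight
    norm = merged.norm(p=2, dim=1, keepdim=True)      # (out_dim, 1)
    # Normalize to a unit-norm direction, then rescale by the learned magnitude
    directional = merged / norm
    return F.linear(x, m.unsqueeze(1) * directional)  # (batch, out_dim)
```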
Hello @ebsmothers,

Yes, that's right. The method I wrote differs slightly from the one presented in the paper; I based my code on this repo: https://github.com/rasbt/dora-from-scratch/blob/main/Using-LinearDoRA.ipynb

I then updated the code based on the author's recommendation, which apparently makes inference faster on GPU. Here is the reference for the author's comment: https://github.com/huggingface/peft/pull/1474#issuecomment-1963402710

I have added references in the code as well.
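As I understand the rearrangement (a sketch of the idea, not the exact code in my fork): since `m * [(W + BA) / ||W + BA||] x` equals `(m / ||W + BA||) * (Wx + BAx)`, the base and low-rank matmuls can stay separate and only the column norms of the merged weight need computing, instead of materializing a normalized weight on every forward. Detaching the norm, as discussed in the PEFT thread, also keeps it out of the autograd graph:

```python
import torch
import torch.nn.functional as F

def dora_forward_fast(x, W, A, B, m, alpha=1.0):
    """Algebraically equivalent rearrangement of eq. (5):
        m * [(W + BA) / ||W + BA||] x == (m / ||W + BA||) * (Wx + BAx)

    Same assumed shapes as the sketch above; bias omitted for simplicity.
    """
    # Per-output-feature norms of the merged weight, detached so gradients
    # don't flow through the norm computation (the paper's memory-saving trick)
    norm = (W + alpha * (B @ A)).norm(p=2, dim=1).detach()  # (out_dim,)
    base_out = F.linear(x, W)                       # frozen base path: Wx
    lora_out = alpha * F.linear(F.linear(x, A), B)  # low-rank path: B(Ax)
    # Scale the combined activations instead of normalizing the weight matrix
    return (m / norm) * (base_out + lora_out)
```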
I also opened a pull request (https://github.com/pytorch/torchtune/pull/936) so that any changes can be tracked more clearly.
Thanks for the clarification @Prakyathkantharaju, and thanks for opening the PR. I agree; let's consolidate the discussion on the PR. I've left a few initial comments; let's continue the conversation there.
Currently being worked on in #1115
(Q)DoRA, an alternative to (Q)LoRA, is quickly proving to be a superior technique for closing the gap between full fine-tuning (FFT) and parameter-efficient fine-tuning (PEFT).
Known existing implementations:
Field reports / comparisons to LoRA: