pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License
4.39k stars 448 forks source link

[FR] (Q)DoRA #893

Open DreamGenX opened 7 months ago

DreamGenX commented 7 months ago

(Q)DoRA, an alternative to (Q)LoRA is quickly proving to be a superior technique in terms of closing the gap between FFT and PEFT.

Known existing implementations:

Field reports / comparisons to LoRA:

rohan-varma commented 7 months ago

Thanks for filing this issue! The torchtune team has definitely been following some of the progress around (Q)DoRA and are super interested in adding it. Stay tuned for more thoughts and discussion soon!

RdoubleA commented 7 months ago

@DreamGenX If this is something you're interested in contributing to the library, we'd be happy to work with you on it :)

Prakyathkantharaju commented 7 months ago

@RdoubleA I added a Dora-based update to my fork. If you approve, I can submit a pull request. Link to the Dora update line here: https://github.com/Prakyathkantharaju/torchtune/blob/aefb8cbb02712177d690ca65cbac480fcb8ac429/torchtune/modules/peft/lora.py#L137

ebsmothers commented 7 months ago

Hi @Prakyathkantharaju, thanks for sharing the fork. If I understand correctly, you are returning Wx + (m * BAx / ||BAx||), is that right? (Here W is the original weight matrix, x is the input, A and B the usual LoRA matrices, and m the magnitude vector.) If I understand correctly from eq (5) of the DoRA paper, don't we want to instead return m * [(W + BA)/ ||W + BA||]x? However, please let me know if I'm misunderstanding the weight update here.

Prakyathkantharaju commented 7 months ago

Hello @ebsmothers ,

Yes, that is correct. The method presented in the paper and that I wrote is slightly different, I based my code on this repo: https://github.com/rasbt/dora-from-scratch/blob/main/Using-LinearDoRA.ipynb

I updated the code based on the author's recommendation here, which apparently make this inference faster on GPU.

Here is the reference for the author's comment: https://github.com/huggingface/peft/pull/1474#issuecomment-1963402710

I have added references in the code as well.

I also added a pull request https://github.com/pytorch/torchtune/pull/936 So that I can track any changes much clearer.

ebsmothers commented 6 months ago

Thanks for the clarification @Prakyathkantharaju, and thanks for opening the PR. I agree, let's consolidate the discussion on the PR. I left a few initial comments, let's continue the conversation there.

RdoubleA commented 3 months ago

Currently being worked on in #1115