Issue: I am trying to use LoRA+ with a HuggingFace trainer. (RonanKMcGovern; closed 8 months ago.)

Replication: … and this leads to the error.

Separately, I suppose I need to pass "lr" as well, rather than "learning_rate"?
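Based on the discussion below, the error comes from handing `loraplus_lr_ratio` to the stock `transformers.TrainingArguments`, which has no such field. A minimal sketch of that failure mode (argument values are placeholders):

```python
from transformers import TrainingArguments

# Stock TrainingArguments is a dataclass with no loraplus_lr_ratio field,
# so construction fails with
# TypeError: __init__() got an unexpected keyword argument 'loraplus_lr_ratio'
args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,      # placeholder value
    loraplus_lr_ratio=16.0,  # not a stock argument -> TypeError
)
```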
Running into the same issue. https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/glue/src/run_glue.py seems to suggest that we need to use their custom "arguments.py" for this to work.
https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/glue/src/arguments.py#L214 defines the ratio argument.
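For reference, a rough sketch of what that definition amounts to: a field added to a `TrainingArguments` subclass (the class name and default value here are illustrative, not copied from the file):

```python
from dataclasses import dataclass, field
from transformers import TrainingArguments

@dataclass
class CustomTrainingArguments(TrainingArguments):
    # LoRA+ ratio: the learning rate for the LoRA B matrices is
    # loraplus_lr_ratio times the learning rate for the A matrices.
    loraplus_lr_ratio: float = field(
        default=1.0,  # assumed default
        metadata={"help": "Ratio of LoRA B-matrix lr to A-matrix lr."},
    )
```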
Actually, getting all the imports to work in Google Colab is extremely nontrivial. For one thing, `arguments.py` does a bare `import data_utils`, not `from . import data_utils` or something similar. So unless you are running from the same folder, it doesn't work.
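One workaround in Colab, short of rearranging files, is to put the folder containing the bare imports on `sys.path` before importing (the clone path below is an assumption):

```python
import sys

# Assumes the repo was cloned to /content/loraplus in the Colab session.
sys.path.append("/content/loraplus/glue/src")

import arguments  # its bare `import data_utils` now resolves
```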
I made a fork https://github.com/cleong110/loraplus, with files rearranged so I can import the new TrainingArguments and such.
This Colab notebook uses it, and it seems to run, though the results are poor:
https://colab.research.google.com/drive/1V4Spi3iwY4h8yMQ_bjGPWWIzwOukrSdk?usp=sharing
[Screenshots: results with LoRA+ / results with ordinary LoRA]
Tried using the `create_loraplus_optimizer` function (usage sketch below) and got similarly poor results.
Tried again with a clean Colab notebook; still getting poor results.
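For reference, roughly how the optimizer was constructed. Treat this as a sketch: the function's exact signature may differ across repo versions, and the optimizer class and hyperparameter values are illustrative:

```python
import torch
from loraplus import create_loraplus_optimizer

# Builds an optimizer in which the LoRA B matrices get
# lr * loraplus_lr_ratio while other trainable params get the base lr.
optimizer = create_loraplus_optimizer(
    model,                          # a LoRA-wrapped model
    optimizer_cls=torch.optim.AdamW,
    optimizer_kwargs={"lr": 2e-5},  # base learning rate (illustrative)
    loraplus_lr_ratio=16.0,         # illustrative ratio
)
```

The returned optimizer can then be handed to a standard HuggingFace Trainer via its `optimizers` argument, e.g. `optimizers=(optimizer, None)`.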
Thanks @RonanKMcGovern and @cleong110 for the comments! We have updated the repo to clarify these issues.
To address @RonanKMcGovern's comment: the issue is that TrainingArguments needs to be overridden with the custom args (as in arguments.py) before being passed in to the custom LoraPlusTrainer. As @cleong110 did in his answer, we now provide these custom args in loraplus.py as LoraPlusTrainingArguments, which is what you should pass in to LoraPlusTrainer (see the sketch below). Let us know if you are still having any issues.
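A minimal usage sketch of that scheme (class names as above; the model, dataset, and hyperparameter values are placeholders):

```python
from loraplus import LoraPlusTrainer, LoraPlusTrainingArguments

# Pass the LoRA+ ratio through the subclassed arguments,
# not through the stock transformers.TrainingArguments.
args = LoraPlusTrainingArguments(
    output_dir="out",
    learning_rate=2e-5,      # base LoRA learning rate (placeholder)
    loraplus_lr_ratio=16.0,  # B-matrix lr / A-matrix lr (placeholder)
)

trainer = LoraPlusTrainer(
    model=peft_model,            # a LoRA-wrapped model
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```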
To address @cleong110's comments: the issue here is that the choice of loraplus_lr_ratio is model- and task-dependent. Generally, a large ratio is helpful only if the task is complicated for the model and it needs to significantly update its features. In the example you gave, the task is very easy for the model; base LoRA gets around 95% accuracy after just a few epochs. Therefore, a large loraplus_lr_ratio will not help here.
Please see the paper for more about this issue, and feel free to ask any questions. I put a copy of the Colab notebook in the repo as well.
Many thanks.
Btw, what do you consider a complex task?
I've tried function calling and chat fine tuning and both seem "simple" in the sense that optimizations don't change much.
Roughly speaking, I would consider tasks where the performance does not quickly shoot up to its maximum to be more "complex". Intuitively, on such tasks the model cannot just make minor adjustments to its features to do well.
In our experiments we found that for GPT-2 / RoBERTa, the MNLI, QQP, and QNLI datasets were more "complex", but the SST-2 task was easier (94% accuracy achievable very quickly). The optimal loraplus_lr_ratio was larger for the former.
This is somewhat in line with findings in another paper, "A Kernel-Based View of Language Model Fine-Tuning", which shows that for tasks like "TREC, MNLI, SNLI, QNLI, and MPQA", model behavior cannot be captured by a kernel, meaning the network features must evolve significantly to achieve good performance.
Appreciate the responses and discussion! Thanks for the patient explanations; I think I understand better now.
Also, nice work updating the repo so quickly to clarify things!
A bit off-topic (maybe this should be a separate issue), but do you think it would be useful to have a Colab notebook that demonstrates one of the tasks where a large loraplus_lr_ratio is clearly helpful? If so, is there an example you know of that we could modify? I mostly went through the PEFT repo's example notebooks trying to find one I could get running, then modified it to use LoRA+. But if you know of a good MNLI notebook, for example, that we could use as a starting point, I'd be interested to give it a try.
Of course, glad I could help clarify! It was useful for "finetuning" the repo :)
Regarding the notebook idea: there are of course the scripts in the `glue/` folder, which can be used to finetune on any GLUE task, including MNLI. For example, `run_gpt2_lora.sh` or `run_roberta_lora.sh` should be pretty easy to run.
If you would still like something in notebook form, there is a notebook in the PEFT repo that does finetuning on the MRPC task, but it would need some modifications for MNLI or other GLUE tasks (like in our glue/src code). If you want to discuss more, you can open a separate issue.
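For anyone attempting that modification, a rough sketch of the MNLI-specific changes (dataset column names and label count come from GLUE/MNLI; the `tokenizer` variable is a placeholder):

```python
from datasets import load_dataset

# MRPC -> MNLI: different text columns, 3 labels instead of 2,
# and validation comes as matched/mismatched splits.
raw = load_dataset("glue", "mnli")

def preprocess(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

encoded = raw.map(preprocess, batched=True)
train_ds = encoded["train"]
eval_ds = encoded["validation_matched"]  # MNLI has no plain "validation" split
# ...and use num_labels=3 when constructing the classification head.
```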