nikhil-ghosh-berkeley / loraplus

MIT License

TrainingArguments.__init__() got an unexpected keyword argument 'loraplus_lr_ratio' #3

Closed RonanKMcGovern closed 8 months ago

RonanKMcGovern commented 8 months ago

Issue:

I am trying to use LoRA+ with a HuggingFace trainer.

Replication

  1. I have copied all of the code in loraplus.py so that LoraPlusTrainer is defined.
  2. I then instantiate a trainer as follows:
    trainer = LoraPlusTrainer(  # for LoRA+
        model=model,
        train_dataset=train_dataset,
        eval_dataset=validation_dataset,
        args=transformers.TrainingArguments(
            num_train_epochs=1,
            per_device_train_batch_size=1,
            per_device_eval_batch_size=1,
            gradient_accumulation_steps=1,
            evaluation_strategy="steps",
            max_grad_norm=1,
            warmup_ratio=0.1,
            eval_steps=0.2,
            bf16=True,
            logging_steps=1,
            output_dir="outputs",
            optim="adamw_torch",  # for training in full fp16/bf16 precision
            learning_rate=1e-4,
            lr_scheduler_type="constant",
            hub_private_repo=True,
            loraplus_lr_ratio=20,  # example value, adjust as necessary
        ),
        data_collator=data_collator,
    )

    and this leads to the error.

Separately, I suppose I need to pass "lr" as well, rather than "learning_rate"?

cleong110 commented 8 months ago

Running into the same issue. https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/glue/src/run_glue.py seems to suggest that we need to use their custom "arguments.py" for this to work.

cleong110 commented 8 months ago

https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/glue/src/arguments.py#L214 defines the ratio argument.
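For reference, the pattern in arguments.py is a dataclass that extends HF's TrainingArguments with the extra field. A minimal sketch of the idea (class name hypothetical, not the exact code from that file):

    from dataclasses import dataclass, field
    from typing import Optional

    from transformers import TrainingArguments as HfTrainingArguments


    @dataclass
    class TrainingArguments(HfTrainingArguments):
        # Extra LoRA+ field on top of the stock HF arguments:
        # ratio of the B-matrix learning rate to the A-matrix learning rate.
        loraplus_lr_ratio: Optional[float] = field(
            default=None,
            metadata={"help": "loraplus learning rate ratio lr_B / lr_A"},
        )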

cleong110 commented 8 months ago

Actually, getting all the imports to work in Google Colab is extremely nontrivial. For one thing, arguments.py imports data_utils with a plain import data_utils, not from . import data_utils or similar. So unless you are running from the same folder, it doesn't work.
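For anyone else hitting this, the workaround is to put glue/src itself on the path before importing (a sketch; the clone path below is just where my Colab happened to put it):

    # Make the plain "import data_utils" inside arguments.py resolvable
    # by adding the directory that contains both files to sys.path.
    import sys

    sys.path.insert(0, "/content/loraplus/glue/src")  # hypothetical clone location

    import arguments  # now its internal "import data_utils" works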

cleong110 commented 8 months ago

I made a fork https://github.com/cleong110/loraplus, with files rearranged so I can import the new TrainingArguments and such.

This Colab notebook uses it, and it seems to run, though with poor results:

https://colab.research.google.com/drive/1V4Spi3iwY4h8yMQ_bjGPWWIzwOukrSdk?usp=sharing

cleong110 commented 8 months ago

Results with LoRA+: [image]. Results with ordinary LoRA: [image].

cleong110 commented 8 months ago

Tried using the create_loraplus_optimizer function and got similarly poor results.
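For reference, the call looked roughly like this (a sketch from memory, so check loraplus.py for the exact signature; model, args, and train_dataset are assumed to be defined earlier in the notebook):

    import torch
    from transformers import Trainer

    # Build an optimizer whose LoRA B matrices get loraplus_lr_ratio times
    # the learning rate of the A matrices.
    optimizer = create_loraplus_optimizer(
        model,                          # a PEFT model with LoRA adapters attached
        optimizer_cls=torch.optim.AdamW,
        optimizer_kwargs={"lr": 1e-4},  # base lr, used for the A matrices
        loraplus_lr_ratio=20.0,
    )

    # Hand the prebuilt optimizer to a stock Trainer (no scheduler override).
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        optimizers=(optimizer, None),
    )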


cleong110 commented 8 months ago

Tried again with a clean Colab notebook, still getting poor results.


nikhil-ghosh-berkeley commented 8 months ago

Thanks @RonanKMcGovern and @cleong110 for the comments! We have updated the repo to clarify these issues.

To address @RonanKMcGovern's comment: the issue is that TrainingArguments needs to be subclassed with the custom arguments (as in arguments.py) before being passed to the custom LoraPlusTrainer. As @cleong110 did in his fork, these custom args are now provided in loraplus.py as LoraPlusTrainingArguments, which is what you should pass to LoraPlusTrainer. Let us know if you are still having any issues.
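Concretely, the snippet from the issue becomes something like this (a minimal sketch trimmed to the relevant arguments; model, datasets, and data_collator as in your original code):

    from loraplus import LoraPlusTrainer, LoraPlusTrainingArguments

    args = LoraPlusTrainingArguments(
        output_dir="outputs",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        learning_rate=1e-4,    # base learning rate, used for the LoRA A matrices
        loraplus_lr_ratio=20,  # lr_B / lr_A; recognized here, unlike in TrainingArguments
    )

    trainer = LoraPlusTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=validation_dataset,
        data_collator=data_collator,
    )

This also answers the learning_rate question: you keep learning_rate as usual; it serves as the base lr, and loraplus_lr_ratio scales the lr used for the B matrices.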

To address @cleong110's comments: the issue here is that the choice of loraplus_lr_ratio is model- and task-dependent. Generally, a large ratio is helpful only if the task is complicated for the model, so that it needs to significantly update its features. In the example you gave, the task is very easy for the model; base LoRA reaches around 95% accuracy after just a few epochs. Therefore:

  1. We don't expect the optimal ratio to be much greater than 1
  2. There really is not much statistically significant performance to be gained on this task anyway.
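For intuition, here is a schematic of what the ratio controls mechanically (not the repo's exact implementation; model is assumed to be a PEFT LoRA model, whose LoRA parameters are named "lora_A"/"lora_B"):

    import torch

    # Two optimizer parameter groups: the LoRA B matrices are trained at
    # loraplus_lr_ratio times the A-matrix learning rate.
    # With loraplus_lr_ratio = 1 this reduces to ordinary LoRA.
    lr_A = 1e-4
    loraplus_lr_ratio = 20
    param_groups = [
        {"params": [p for n, p in model.named_parameters() if "lora_A" in n],
         "lr": lr_A},
        {"params": [p for n, p in model.named_parameters() if "lora_B" in n],
         "lr": lr_A * loraplus_lr_ratio},
    ]
    optimizer = torch.optim.AdamW(param_groups)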

Please see the paper for more about this issue and feel free to ask any questions. I put a copy of the colab notebook in the repo as well.

RonanKMcGovern commented 8 months ago

Many thanks.

Btw what do you consider a complex task?

I've tried function calling and chat fine-tuning, and both seem "simple" in the sense that optimizations don't change much.

nikhil-ghosh-berkeley commented 8 months ago

Roughly speaking, I would consider tasks where performance does not quickly shoot up to its maximum to be more "complex". Intuitively, on such tasks the model cannot do well just by making minor adjustments to its features.

In our experiments we found that for GPT-2 / RoBERTa, the MNLI, QQP, and QNLI datasets were more "complex", while the SST-2 task was easier (94% accuracy achievable very quickly). The optimal loraplus_lr_ratio was larger for the former.

This is somewhat in line with findings in another paper, "A Kernel-Based View of Language Model Fine-Tuning", which shows that for tasks like TREC, MNLI, SNLI, QNLI, and MPQA, model behavior cannot be captured by a kernel, meaning the network features must evolve significantly to achieve good performance.

cleong110 commented 8 months ago

Appreciate the responses and discussion! Thanks for the patient explanations, I think I understand better now.

Also, nice work updating the repo so quickly to clarify things!

A bit off-topic (maybe this should be a separate issue), but do you think it would be useful to have a Colab notebook demonstrating one of the tasks where a large loraplus_lr_ratio is clearly helpful? And if so, is there an example you know of that we could modify? I mostly went through the PEFT repo's example notebooks trying to find one I could get running, then modified it to use LoRA+. But if you know of a good MNLI notebook, for example, that we could use as a starting point, I'd be interested to give it a try.

nikhil-ghosh-berkeley commented 8 months ago

Of course, glad I could help clarify! It was useful for "finetuning" the repo :)

Regarding the notebook idea: there are of course the scripts in the glue/ folder, which can be used to finetune on any GLUE task, including MNLI. For example, run_gpt2_lora.sh or run_roberta_lora.sh should be pretty easy to run.

If you would still like something in notebook form, there is this notebook in the PEFT repo which does finetuning on the MRPC task, but some modifications would be needed for MNLI or other GLUE tasks (like in our glue/src code); a rough sketch is below. If you want to discuss further, feel free to open a separate issue.
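For reference, the kind of change involved is roughly this (a hypothetical, untested sketch; tokenizer comes from the notebook):

    from datasets import load_dataset

    # MNLI differs from MRPC in three ways relevant to the notebook:
    # three labels instead of two, "premise"/"hypothesis" columns instead of
    # "sentence1"/"sentence2", and matched/mismatched validation splits.
    raw = load_dataset("glue", "mnli")
    num_labels = 3  # entailment / neutral / contradiction

    def preprocess(examples):
        # tokenizer is assumed to be defined elsewhere in the notebook
        return tokenizer(examples["premise"], examples["hypothesis"], truncation=True)

    encoded = raw.map(preprocess, batched=True)
    eval_split = encoded["validation_matched"]  # use "_matched" rather than "validation"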