huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Feature Request: Integrate LoRA+ / different learning rates for adapter matrices A and B #1504

Closed cleong110 closed 5 months ago

cleong110 commented 7 months ago

Feature request

LoRA+: Efficient Low Rank Adaptation of Large Models builds on LoRA "by setting different learning rates for the LoRA adapter matrices A and B with a well-chosen ratio", which the authors argue yields performance improvements and speedups with no increase in computational cost.

Code is available at https://github.com/nikhil-ghosh-berkeley/loraplus.

Motivation

If it is true that using a ratio between the learning rates provides improvements at no cost, then having this as a new default could be broadly helpful.

Your contribution

Just wanted to point to https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/loraplus.py#L31 and https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/loraplus.py#L131, which seem to provide pretty much drop-in replacements for 🤗 Trainer.

They also explain usage in the README, at https://github.com/nikhil-ghosh-berkeley/loraplus?tab=readme-ov-file#usage, showing how to create a Trainer or an optimizer, and describing the new hyperparameters introduced.
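
For readers skimming the thread, the core of LoRA+ amounts to putting the lora_B matrices in their own optimizer parameter group with a scaled-up learning rate. Below is a minimal sketch of that idea, not the loraplus repo's actual code: it assumes PEFT's usual lora_A/lora_B parameter naming, and the helper name make_loraplus_param_groups is hypothetical.

```python
import torch

def make_loraplus_param_groups(model, lr, loraplus_lr_ratio=16.0):
    # Hypothetical helper: split trainable parameters into LoRA A, LoRA B,
    # and everything else, giving B a learning rate scaled by the ratio.
    group_a, group_b, other = [], [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "lora_B" in name:
            group_b.append(param)
        elif "lora_A" in name:
            group_a.append(param)
        else:
            other.append(param)
    return [
        {"params": group_a, "lr": lr},
        {"params": group_b, "lr": lr * loraplus_lr_ratio},  # B learns faster
        {"params": other, "lr": lr},
    ]

# Usage (assuming `peft_model` is a LoRA-wrapped model):
# optimizer = torch.optim.AdamW(make_loraplus_param_groups(peft_model, lr=5e-5))
```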

BenjaminBossan commented 7 months ago

Hi @cleong110, did you have the opportunity to test this out and see if it works for your use cases?

In general, PEFT is agnostic towards the training process itself. I could still see create_loraplus_optimizer being added as a convenience function for users, though I wouldn't add the Trainer subclass. Note, however, that as a user, I can already just copy and paste the function and use it as I wish, so adding it to PEFT would add little benefit beyond saving a few seconds of work (and promoting the existence of this method).

I'll ping @nikhil-ghosh-berkeley in case they want to add something to the discussion.

cleong110 commented 7 months ago

I'd be happy to give it a test. I'm still new to PEFT. I saw this new LoRA+, noticed that the original LoRA was already integrated into PEFT, and figured it would make sense to integrate LoRA+ as well, given how simple a change it seems to be.

The trouble is, I can't seem to find a notebook in the examples folder that I can get to run all the way through without error. I tried the following and ran into various issues.

Gonna give the following two a try, with and without LoraPlus

cleong110 commented 7 months ago

The Semantic Segmentation notebook also crashed with Hugging Face Hub errors:

---------------------------------------------------------------------------

HTTPError                                 Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    285     try:
--> 286         response.raise_for_status()
    287     except HTTPError as e:

[... 14 frames omitted ...]

HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create

The above exception was the direct cause of the following exception:

HfHubHTTPError                            Traceback (most recent call last)

HfHubHTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create (Request ID: Root=1-65d8cd99-62347fb141b525fe4aefaf4e;fbb51710-aebf-4b04-ade7-d558b343f33c)

You don't have the rights to create a model under this namespace

During handling of the above exception, another exception occurred:

HTTPError                                 Traceback (most recent call last)

HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/mit-b0-scene-parse-150-lora

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    321                 " make sure you are authenticated."
    322             )
--> 323             raise RepositoryNotFoundError(message, response) from e
    324 
    325         elif response.status_code == 400:

RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-65d8cd9a-665e1a4642c1d9a57fa897cf;6a27c3d1-da8e-4591-990c-00bcbad42372)

Repository Not Found for url: https://huggingface.co/api/models/mit-b0-scene-parse-150-lora.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.

cleong110 commented 7 months ago

I am authenticated with a read-access token for my Hugging Face account. I'm really not sure why these errors are happening, as I have no desire to push any models or create any repos.

cleong110 commented 7 months ago

Ah, just change `push_to_hub` to `False`. Of course.
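
For anyone hitting the same 403 with a read-only token: the error comes from the Trainer trying to create a Hub repo during training. Disabling the push in TrainingArguments avoids the Hub entirely; a minimal sketch (the output_dir value is arbitrary):

```python
from transformers import TrainingArguments

# With push_to_hub=False the Trainer never calls the Hub's create-repo API,
# so a read-only token (or no token at all) is enough to train locally.
args = TrainingArguments(
    output_dir="mit-b0-scene-parse-150-lora",  # local directory only
    push_to_hub=False,
)
```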

cleong110 commented 7 months ago

Getting NaNs and extremely low accuracies when training the semantic segmentation model, with no LoraPlus involved. I guess I'll make a separate issue for that.

cleong110 commented 7 months ago

All right, finally managed to get LoraPlus running... and got significantly worse accuracy on image classification. -_-

Not sure if LoraPlus is bugged or what, but I think my original reasoning for wanting to integrate it into PEFT doesn't make sense. I thought it would (1) give better results for the same compute, and (2) be easy to integrate, but neither one seems true.

See https://github.com/nikhil-ghosh-berkeley/loraplus/issues/3 for more details.

BenjaminBossan commented 7 months ago

I don't think a single example invalidates the whole approach, but let's see what comes out of the discussion on the LoRA+ repo before proceeding with PEFT.

Regarding the notebooks, thanks for investigating and creating an issue.

cleong110 commented 7 months ago

Agreed that it doesn't invalidate the approach. Mostly just thinking that it wasn't as "plug and play" as I originally thought, so my apologies for sort of "jumping the gun" based on incomplete understanding.

So yes, waiting to see what comes of that discussion seems wise. Apparently they've already made a few updates in response!

nikhil-ghosh-berkeley commented 7 months ago

Hi @BenjaminBossan and @cleong110, thanks for initiating this discussion! I think it would be really great for LoRA+ to be integrated into PEFT to reduce friction for users and improve visibility. Let me know if there is anything I can do on my end to help make this possible.

BenjaminBossan commented 7 months ago

Thanks @nikhil-ghosh-berkeley for your offer. There is a PR #1509 to add LoRA+ to PEFT, but it's not in a reviewable state yet. Maybe I can ping you there once it's ready or if I have questions.

One question: Design-wise, my idea would be not to add the LoraPlusTrainer, first because it feels out of scope for PEFT and second because it clashes if users want to use other Trainer subclasses (say SFTTrainer). Would it be enough to add create_loraplus_optimizer and let users pass the resulting optimizer to Trainer instances?
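
For concreteness, the proposed design would look roughly like this from the user's side: build the optimizer yourself, then hand it to any Trainer (or subclass) through the standard optimizers argument. In the sketch below, optimizer stands in for the output of a create_loraplus_optimizer-style helper, and peft_model/train_dataset are assumed to exist:

```python
from transformers import Trainer, TrainingArguments

# `optimizer` is assumed to come from a create_loraplus_optimizer-style
# helper; everything else is stock transformers.
trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,
    optimizers=(optimizer, None),  # None: Trainer creates its default scheduler
)
trainer.train()
```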

sayakpaul commented 7 months ago

> If it is true that using a ratio between the learning rates provides improvements at no cost, then having this as a new default could be broadly helpful.

I'd say we don't default to that because that might be a breaking change. Rather, we provide a way for users to configure it themselves.

BenjaminBossan commented 7 months ago

> If it is true that using a ratio between the learning rates provides improvements at no cost, then having this as a new default could be broadly helpful.

> I'd say we don't default to that because that might be a breaking change. Rather, we provide a way for users to configure it themselves.

Since PEFT does not contain any training code (there are enough options out there), we cannot make this a default anyway :) We can provide the feature but it's up to the user to decide if they want to use it or not. That said, we can certainly give a recommendation in the docs on whether to use it by default or not.

nikhil-ghosh-berkeley commented 7 months ago

> Thanks @nikhil-ghosh-berkeley for your offer. There is a PR #1509 to add LoRA+ to PEFT, but it's not in a reviewable state yet. Maybe I can ping you there once it's ready or if I have questions.

Sounds good, please feel free to ping.

> One question: Design-wise, my idea would be not to add the LoraPlusTrainer, first because it feels out of scope for PEFT and second because it clashes if users want to use other Trainer subclasses (say SFTTrainer). Would it be enough to add create_loraplus_optimizer and let users pass the resulting optimizer to Trainer instances?

Yes, this makes sense. I think adding create_loraplus_optimizer would be a great option.

AngledLuffa commented 7 months ago

> Since PEFT does not contain any training code (there are enough options out there), we cannot make this a default anyway :)

One simple interface change might be to add two parameter lists, alpha_parameters and beta_parameters, which could take the place of calling parameters() when building an optimizer.

BenjaminBossan commented 7 months ago

> One simple interface change might be to add two parameter lists, alpha_parameters and beta_parameters, which could take the place of calling parameters() when building an optimizer.

Could you please elaborate?

AngledLuffa commented 7 months ago

My current attempt at LoRA+ experiments is to do this before creating an optimizer:

    param_groups = [
        {'param_group_name': 'peft_A', 'params': [param for name, param in model.bert_model.named_parameters() if name.find("lora_A") >= 0], 'lr': bert_learning_rate},
        {'param_group_name': 'peft_B', 'params': [param for name, param in model.bert_model.named_parameters() if name.find("lora_B") >= 0], 'lr': bert_learning_rate * split_peft_beta},
    ]

I'm just thinking that a utility method on the PEFT-injected transformer, such as alpha_parameters and beta_parameters, would avoid needing to split those parameters into two lists ourselves.
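
To complete that snippet under the same assumptions (model.bert_model, bert_learning_rate, and split_peft_beta are the commenter's own variables), the groups can be passed straight to any torch optimizer:

```python
import torch

# Ordinary torch param groups; extra keys such as 'param_group_name' are
# simply kept on the group and ignored by the optimizer's update rule.
optimizer = torch.optim.AdamW(param_groups)
```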

BenjaminBossan commented 7 months ago

I still don't get it. The create_loraplus_optimizer function already takes care of creating different parameter groups for LoRA A and B; what are you missing? And what are alpha and beta referring to?

AngledLuffa commented 7 months ago

> what are you missing?

I was looking for a way for users to create their own optimizers from the LoRA weights. Perhaps this is unnecessary if loraplus is eventually integrated into peft anyway. (Thanks for taking that on!)

> And what are alpha and beta referring to?

Err... that was a kludge on my side for not wanting single-letter names in the code. It was meant to be A and B.

BenjaminBossan commented 7 months ago

> I was looking for a way for users to create their own optimizers from the LoRA weights. Perhaps this is unnecessary if loraplus is eventually integrated into peft anyway. (Thanks for taking that on!)

Well, LoRA+ would be a way, but of course there could be other ways of achieving this. If you have some more general function in mind, feel free to propose.

> Err... that was a kludge on my side for not wanting single-letter names in the code. It was meant to be A and B.

Okay, I was not sure, as alpha and beta can also be arguments for certain optimizers.

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

BenjaminBossan commented 2 months ago

Implemented in #1915.
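
For reference, usage after that PR looks roughly like the sketch below, based on the PEFT docs at the time (the docs use bitsandbytes' Adam8bit as the optimizer class; torch.optim.AdamW is substituted here to keep the example self-contained, and peft_model/train_dataset are assumed to exist):

```python
import torch
from peft.optimizers import create_loraplus_optimizer
from transformers import Trainer, TrainingArguments

# Build an optimizer whose lora_B parameter group uses lr * loraplus_lr_ratio.
optimizer = create_loraplus_optimizer(
    model=peft_model,                 # a model already wrapped by get_peft_model
    optimizer_cls=torch.optim.AdamW,
    lr=5e-5,
    loraplus_lr_ratio=16,
)
trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,
    optimizers=(optimizer, None),     # None: Trainer creates its default scheduler
)
trainer.train()
```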