Closed cleong110 closed 5 months ago
Hi @cleong110, did you have the opportunity to test this out and see if it works for your use cases?
In general, PEFT is agnostic towards the training process itself. I could still see `create_loraplus_optimizer` being added as a convenience function for users, though I wouldn't add the `Trainer` subclass. Note, however, that as a user, I can already just copy & paste the function and use it as I wish, so adding it to PEFT would add no benefit except for maybe saving a few seconds of work (and promoting the existence of this method).
I'll ping @nikhil-ghosh-berkeley in case they want to add something to the discussion.
I'd be happy to give it a test. I'm still new to PEFT. I saw this new LoRA+, noticed that the original LoRA was already integrated into PEFT, and figured it would make sense to integrate LoRA+ as well, given how simple a change it seems to be.
The trouble is, I can't seem to find a notebook in the examples folder that runs all the way through without error. I tried a few and ran into various issues, e.g.:

`Repository Not Found for url: https://huggingface.co/api/models/vit-base-patch16-224-in21k-finetuned-lora-food101`

and

`You don't have the rights to create a model under this namespace`

which is confusing, because why do I need such things just to train a model in Colab? Going to give the following two a try with and without LoRA+.
The Semantic Segmentation notebook also crashed with Hugging Face Hub errors:
```
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    285     try:
--> 286         response.raise_for_status()
    287     except HTTPError as e:

14 frames

HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create

The above exception was the direct cause of the following exception:

HfHubHTTPError                            Traceback (most recent call last)
HfHubHTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create (Request ID: Root=1-65d8cd99-62347fb141b525fe4aefaf4e;fbb51710-aebf-4b04-ade7-d558b343f33c)

You don't have the rights to create a model under this namespace

During handling of the above exception, another exception occurred:

HTTPError                                 Traceback (most recent call last)
HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/mit-b0-scene-parse-150-lora

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    321                 " make sure you are authenticated."
    322             )
--> 323             raise RepositoryNotFoundError(message, response) from e
    324
    325         elif response.status_code == 400:

RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-65d8cd9a-665e1a4642c1d9a57fa897cf;6a27c3d1-da8e-4591-990c-00bcbad42372)

Repository Not Found for url: https://huggingface.co/api/models/mit-b0-scene-parse-150-lora.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
```
I am authenticated with a read-access token to my Hugging Face account. I'm really not sure why these errors are happening, as I have no desire to push any models or create any repos.

Ah, just change `push_to_hub` to `False`. Of course.
Getting NaNs and extremely low accuracies when training the semantic segmentation model. No LoRA+ involved, so I'll file a separate issue for that.
All right, finally managed to get LoRA+ running... and got significantly worse accuracy on image classification. -_-
Not sure if LoraPlus is bugged or what, but I think my original reasoning for wanting to integrate it into PEFT doesn't make sense. I thought it would (1) give better results for the same compute, and (2) be easy to integrate, but neither one seems true.
See https://github.com/nikhil-ghosh-berkeley/loraplus/issues/3 for more details.
I don't think a single example invalidates the whole approach, but let's see what comes out of the discussion on the LoRA+ repo before proceeding with PEFT.
Regarding the notebooks, thanks for investigating and creating an issue.
Agreed that it doesn't invalidate the approach. Mostly just thinking that it wasn't as "plug and play" as I originally thought, so my apologies for sort of "jumping the gun" based on incomplete understanding.
So yes, waiting to see what comes of the discussion seems wise. Apparently they've already made a few updates in response!
Hi @BenjaminBossan and @cleong110 thanks for initiating this discussion! I think it would be really great for LoRA+ to be integrated into PEFT to reduce friction for users and for improving visibility. Let me know if there is anything on my end I can do to help make this possible.
Thanks @nikhil-ghosh-berkeley for your offer. There is a PR #1509 to add LoRA+ to PEFT, but it's not in a reviewable state yet. Maybe I can ping you there once it's ready or if I have questions.
One question: design-wise, my idea would be not to add the `LoraPlusTrainer`, first because it feels out of scope for PEFT, and second because it clashes if users want to use other `Trainer` subclasses (say `SFTTrainer`). Would it be enough to add `create_loraplus_optimizer` and let users pass the resulting optimizer to `Trainer` instances?
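For illustration, a rough sketch of the grouping such a helper would perform might look like this. The function name, signature, and the `"lora_B"` name-matching convention below are assumptions for the sketch, not the actual loraplus API:

```python
def build_loraplus_optimizer(named_params, optimizer_cls, lr, loraplus_lr_ratio, **optim_kwargs):
    """Sketch of a LoRA+-style optimizer factory (names/signature are assumptions).

    LoRA B matrices go into their own parameter group with learning rate
    lr * loraplus_lr_ratio; everything else (including LoRA A) keeps the base lr.
    """
    lora_b, rest = [], []
    for name, param in named_params:
        (lora_b if "lora_B" in name else rest).append(param)
    param_groups = [
        {"params": rest, "lr": lr},
        {"params": lora_b, "lr": lr * loraplus_lr_ratio},
    ]
    return optimizer_cls(param_groups, **optim_kwargs)
```

An optimizer built along these lines (e.g. with `optimizer_cls=torch.optim.AdamW`) could then be passed to a `Trainer` through its `optimizers=(optimizer, None)` argument, so no `Trainer` subclass is needed.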
> If it is true that using a ratio between the learning rates provides improvements at no cost, then having this as a new default could be broadly helpful.

I'd say we don't default to that, because that might be a breaking change. Rather, we provide a way for users to configure it themselves.
> If it is true that using a ratio between the learning rates provides improvements at no cost, then having this as a new default could be broadly helpful.
>
> I'd say we don't default to that, because that might be a breaking change. Rather, we provide a way for users to configure it themselves.

Since PEFT does not contain any training code (there are enough options out there), we cannot make this a default anyway :) We can provide the feature, but it's up to the user to decide whether they want to use it. That said, we can certainly give a recommendation in the docs on whether to use it by default or not.
> Thanks @nikhil-ghosh-berkeley for your offer. There is a PR #1509 to add LoRA+ to PEFT, but it's not in a reviewable state yet. Maybe I can ping you there once it's ready or if I have questions.

Sounds good, please feel free to ping.
> One question: design-wise, my idea would be not to add the `LoraPlusTrainer`, first because it feels out of scope for PEFT, and second because it clashes if users want to use other `Trainer` subclasses (say `SFTTrainer`). Would it be enough to add `create_loraplus_optimizer` and let users pass the resulting optimizer to `Trainer` instances?

Yes, this makes sense. I think adding `create_loraplus_optimizer` would be a great option.
> Since PEFT does not contain any training code (there are enough options out there), we cannot make this a default anyway :)

One simple interface change might be to have two `parameters` lists, `alpha_parameters` and `beta_parameters`, which could take the place of calling `parameters()` when building an optimizer.
> One simple interface change might be to have two `parameters` lists, `alpha_parameters` and `beta_parameters`, which could take the place of calling `parameters()` when building an optimizer

Could you please elaborate?
My current attempt at LoRA+ experiments is to build the parameter groups like this before creating an optimizer:

```python
param_groups = [
    {'param_group_name': 'peft_A',
     'params': [param for name, param in model.bert_model.named_parameters() if name.find("lora_A") >= 0],
     'lr': bert_learning_rate},
    {'param_group_name': 'peft_B',
     'params': [param for name, param in model.bert_model.named_parameters() if name.find("lora_B") >= 0],
     'lr': bert_learning_rate * split_peft_beta},
]
```
I'm just thinking that utility methods on the PEFT-injected transformer, such as `alpha_parameters` and `beta_parameters`, would avoid needing to split those parameters into two lists ourselves.
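A minimal sketch of those hypothetical helpers (the names `alpha_parameters`/`beta_parameters` follow the suggestion above and are not an existing PEFT API; `model` is anything exposing `named_parameters()`):

```python
def alpha_parameters(model):
    """Hypothetical helper: the parameters of the LoRA A matrices."""
    return [param for name, param in model.named_parameters() if "lora_A" in name]

def beta_parameters(model):
    """Hypothetical helper: the parameters of the LoRA B matrices."""
    return [param for name, param in model.named_parameters() if "lora_B" in name]

# Building the optimizer then no longer requires hand-splitting the lists, e.g.:
# optimizer = torch.optim.AdamW([
#     {"params": alpha_parameters(model), "lr": lr},
#     {"params": beta_parameters(model), "lr": lr * ratio},
# ])
```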
I still don't get it. The `create_loraplus_optimizer` function is already taking care of creating different parameter groups for LoRA A and B, so what are you missing? And what are `alpha` and `beta` referring to?
> what are you missing?

I was looking for a way for users to create their own optimizers from the LoRA weights. Perhaps this is unnecessary if LoRA+ is eventually integrated into PEFT anyway. (Thanks for taking that on!)
> And what are alpha and beta referring to?

Err... that was a kludge on my side for not wanting single-letter names in the code; it was meant to be A and B.
> I was looking for a way for users to create their own optimizers from the LoRA weights. Perhaps this is unnecessary if LoRA+ is eventually integrated into PEFT anyway. (Thanks for taking that on!)

Well, LoRA+ would be one way, but of course there could be other ways of achieving this. If you have a more general function in mind, feel free to propose it.
> err... that was a kludge on my side for not wanting single-letter names in the code. it was meant to be A and B

Okay, I was not sure, as alpha and beta can also be arguments for certain optimizers.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Implemented in #1915.
### Feature request
LoRA+: Efficient Low Rank Adaptation of Large Models builds on LoRA "by setting different learning rates for the LoRA adapter matrices A and B with a well-chosen ratio", which the authors argue provides performance improvements and speedups at no extra computational cost.
Code is available at https://github.com/nikhil-ghosh-berkeley/loraplus.
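The core change can be summarized in one line: with LoRA's low-rank update $\Delta W = BA$, LoRA+ ties the two learning rates together by a fixed ratio (the exact value of that ratio is a tunable hyperparameter in their code):

```latex
% LoRA fine-tunes W via a low-rank correction \Delta W = B A.
% LoRA+ keeps the base learning rate for A and scales B's by a fixed ratio \lambda:
\eta_A = \eta, \qquad \eta_B = \lambda \, \eta, \qquad \lambda \gg 1
```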
### Motivation
If it is true that using a ratio between the learning rates provides improvements at no cost, then having this as a new default could be broadly helpful.
### Your contribution
Just wanted to point to https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/loraplus.py#L31 and https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/loraplus.py#L131, which seem to provide pretty much drop-in replacements for 🤗 Trainer.
They also explain usage in the README, at https://github.com/nikhil-ghosh-berkeley/loraplus?tab=readme-ov-file#usage, showing how to create a Trainer or an Optimizer, along with the new hyperparameters introduced.