huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Add support for LoRA based models #110

Open mmcclean-aws opened 1 year ago

mmcclean-aws commented 1 year ago

As there is growing demand for LoRA-based fine-tuning, we would like to support it in Optimum Neuron to improve the user experience. We need to make sure the flow uses the cache effectively: download a cached NEFF and the weights of the main model, then fine-tune only the much smaller LoRA adapter on Neuron. The compilation step should then only need to compile the LoRA adapter, which should be fast given that it contains only a fraction of the weights of the main model.

From speaking with the Neuron engineering team, we just need to make sure there are xm.mark_step() calls in the right places in the Optimum Neuron library.
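To make the mark_step() placement concrete, here is a minimal, hypothetical sketch of a LoRA fine-tuning loop on a Trainium device using torch_xla and peft. The model name, target modules, and data are placeholders (not from this issue); the point is that xm.mark_step() (or xm.optimizer_step(), which cuts the graph for you) runs once per training step so XLA traces one stable graph per step rather than an ever-growing one.

```python
# Hypothetical sketch only: assumes torch_xla and peft are installed and a
# Neuron/XLA device is available. Names like "base-model" are placeholders.
import torch
import torch_xla.core.xla_model as xm
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

device = xm.xla_device()

base = AutoModelForCausalLM.from_pretrained("base-model")  # frozen weights
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config).to(device)  # only adapters are trainable

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

for batch in dataloader:  # placeholder dataloader
    optimizer.zero_grad()
    loss = model(**{k: v.to(device) for k, v in batch.items()}).loss
    loss.backward()
    # xm.optimizer_step() steps the optimizer and marks the XLA step,
    # cutting the lazy graph here so each iteration compiles identically.
    xm.optimizer_step(optimizer)
    xm.mark_step()
```

Because the base weights are frozen, only the adapter graph should need recompilation when the base model's NEFF is already cached.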

We also need to ensure the inference flow is optimized. We should avoid a full trace of the model by downloading the compiled NEFF from the HF Hub, merging the weights of the frozen model and the LoRA adapter locally, and only then loading the result onto the XLA device. It will soon be possible to decouple the weights from the NEFF in the trace flow, which should make this feasible.
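The local merge described above is just the standard LoRA arithmetic: the adapted weight is W' = W + (alpha / r) * B @ A, where A is (r, in_features) and B is (out_features, r). A minimal NumPy sketch (the function name merge_lora is hypothetical, not an Optimum Neuron API):

```python
import numpy as np

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Fold a LoRA update into a frozen weight matrix.

    w:       (out_features, in_features) frozen base weight
    lora_a:  (r, in_features) down-projection
    lora_b:  (out_features, r) up-projection
    Returns W' = W + (alpha / r) * B @ A.
    """
    return w + (alpha / r) * (lora_b @ lora_a)

rng = np.random.default_rng(0)
out_f, in_f, r, alpha = 4, 6, 2, 4.0
w = rng.standard_normal((out_f, in_f))
a = rng.standard_normal((r, in_f))
b = np.zeros((out_f, r))  # LoRA initializes B to zero, so the delta starts at 0

merged = merge_lora(w, a, b, alpha, r)
assert merged.shape == w.shape
assert np.allclose(merged, w)  # zero B => merged weights equal the base weights
```

Because the merged matrix has the same shape as the base weight, it can replace the frozen weight in the decoupled-weights flow without retracing, which is what makes per-style adapter swapping (as requested below for diffusion) cheap.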

MrA0505 commented 8 months ago

Can anyone help me with how to compile a fine-tuned LoRA adapter on Inferentia and how to use it? I am working on Stable Diffusion. Thank you for your support.

HuggingFaceDocBuilderDev commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!

yahavb commented 3 months ago

Same here - I wish to use LoRA for diffusion processes. Currently we fuse them into UNet before inference, but in the future we want to switch them on-the-fly depending on required image style.

yahavb commented 2 months ago

I wish to use Control-LoRAs to fine-tune ControlNet. Optimum-neuron is going to make my life simpler!

yahavb commented 2 months ago

Referencing https://github.com/huggingface/optimum-neuron/issues/575

njbrake commented 1 month ago

Would love to see support and documentation for LoRA finetuning in the optimum-neuron package.
