mmcclean-aws opened this issue 1 year ago
Can anyone help me with how to compile a fine-tuned LoRA adapter on Inferentia and how to use it? I am working on Stable Diffusion. Thank you.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!
Same here: I wish to use LoRA in diffusion pipelines. Currently we fuse the adapters into the UNet before inference, but in the future we want to switch them on the fly depending on the required image style.
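To illustrate the difference between the two approaches mentioned above, here is a minimal NumPy sketch (a single linear layer stands in for a UNet projection; all names and sizes are made up for illustration). Fusing bakes the adapter into the weight once; on-the-fly application keeps the base weight frozen and adds the low-rank contribution per call, so swapping styles is just picking a different adapter pair:

```python
import numpy as np

rng = np.random.default_rng(0)

# Base weight of one linear layer, standing in for a UNet projection.
d, r = 8, 2
W = rng.standard_normal((d, d))

# Two hypothetical style adapters, each a low-rank pair (B @ A has rank r).
adapters = {
    "anime": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
    "photo": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
}
scale = 1.0  # alpha / r scaling, fixed for simplicity

def forward_fused(x, style):
    """Fuse the adapter into the weight, as done before inference today."""
    B, A = adapters[style]
    W_fused = W + scale * (B @ A)
    return x @ W_fused.T

def forward_on_the_fly(x, style):
    """Keep W frozen and add the adapter's contribution per call."""
    B, A = adapters[style]
    return x @ W.T + scale * (x @ A.T) @ B.T
```

Both paths produce the same outputs; the on-the-fly form is what makes per-request style switching cheap, since no weight rewrite is needed.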
I wish to use Control-LoRAs to fine-tune ControlNet. Optimum-neuron is going to make my life simpler!
Would love to see support and documentation for LoRA fine-tuning in the optimum-neuron package.
As there is growing demand for LoRA-based fine-tuning, we would like to support it in Optimum Neuron to optimize the user experience. We need to make sure the flow can use the cache effectively: download the cached NEFF and the weights of the main (frozen) model, then fine-tune only the much smaller LoRA adapter on Neuron. The compilation step should then only need to compile the LoRA part, which should be fast given that it is only a fraction of the size of the main model's weights.
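A quick back-of-the-envelope sketch of why compiling only the LoRA part is cheap (the layer sizes here are hypothetical, loosely in the range of a large attention projection):

```python
# Hypothetical sizes for a single square projection layer.
d_out, d_in, r = 4096, 4096, 16

base_params = d_out * d_in        # frozen, served from the cached model + NEFF
lora_params = r * (d_out + d_in)  # B (d_out x r) and A (r x d_in), trainable

fraction = lora_params / base_params
# For these sizes the adapter is under 1% of the layer's weights,
# which is what makes compiling and training only the LoRA part fast.
```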
From speaking with the Neuron engineering team, we just need to make sure there are `xm.mark_step()` calls in the right places in the Optimum Neuron library. We also need to ensure the inference flow is optimized: we should avoid a full trace of the model by downloading the compiled NEFF from the HF Hub, merging the weights of the frozen model and the LoRA adapter locally, and then loading the merged weights onto the XLA device. It will soon be possible to decouple the weights from the NEFF in the trace flow, which should make this feasible.