showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0

Lora weights #66

Closed (rakesh-reddy95 closed this issue 1 year ago)

rakesh-reddy95 commented 1 year ago

Can I load and fine-tune LoRA weights instead of the complete UNet?

zhangjiewu commented 1 year ago

Yes, you can use LoRA weights.

rakesh-reddy95 commented 1 year ago

May I know how I can do that?

rakesh-reddy95 commented 1 year ago

@zhangjiewu I see that LoRA weights can be loaded into the pre-trained SD UNet through UNet2DConditionLoadersMixin, but I don't see equivalent support for Unet3D. Tune-A-Video loads the pre-trained SD and adds temporal attention. If I have LoRA weights and want to run Tune-A-Video with them, I have to load the attn_procs into Unet3D, and that loading support is missing.

zhangjiewu commented 1 year ago

You can first load the LoRA weights into the pretrained SD, and then use that model for Tune-A-Video. You can also upgrade the attention file to the latest one in diffusers, which supports loading LoRA weights through attention processors.
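To illustrate the first suggestion: one way to read "load the LoRA weights into the pretrained SD" is to fold each low-rank update into the corresponding base weight before the model is handed to Tune-A-Video. The sketch below shows only that arithmetic on a toy matrix in plain Python. It is not Tune-A-Video or diffusers code; the names `merge_lora`, `lora_down`, `lora_up`, and `scale` are hypothetical and chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base_weight, lora_down, lora_up, scale=1.0):
    """Return base_weight + scale * (lora_up @ lora_down).

    base_weight: out_features x in_features
    lora_down:   rank x in_features
    lora_up:     out_features x rank
    (hypothetical helper, not part of any library)
    """
    delta = matmul(lora_up, lora_down)
    # The low-rank product must have exactly the shape of the base weight.
    if len(delta) != len(base_weight) or len(delta[0]) != len(base_weight[0]):
        raise ValueError("LoRA update does not match the base weight shape")
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]


# Toy 2x3 projection weight with a rank-1 LoRA update.
base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
down = [[1.0, 2.0, 3.0]]   # rank x in_features = 1 x 3
up = [[0.5], [-1.0]]       # out_features x rank = 2 x 1

merged = merge_lora(base, down, up, scale=2.0)
print(merged)
```

Because the merged matrix has the same shape as the original weight, a model built from merged weights would need no LoRA-specific loading code downstream. Whether that matches the workflow the maintainer had in mind is an assumption.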

rakesh-reddy95 commented 1 year ago

@zhangjiewu I have loaded the LoRA weights, but the issue now appears during inference. There seems to be a shape mismatch in the cross-attention matrices, since the LoRA was trained on 2D images.

rakesh-reddy95 commented 1 year ago

The issue is resolved.