showlab / MotionDirector

[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
https://showlab.github.io/MotionDirector/
Apache License 2.0
793 stars 46 forks

About the training parameters of Spatial Lora: Recursive ones? #38

Open Kenneth-Wong opened 3 months ago

Kenneth-Wong commented 3 months ago

I found an interesting implementation detail in your code: in lines 644-674 of MotionDirector_train.py, the spatial_lora is injected once for each video. As a result, the Linear layer inside a LoraInjectedLayer is itself recursively transformed into another LoraInjectedLayer. With two training videos, this leads to a computation like:

$$ \mathrm{Linear}_2\left[\mathrm{Linear}_1(x) + l_1^u l_1^d(x)\right] + l_2^u l_2^d\left[\mathrm{Linear}_1(x) + l_1^u l_1^d(x)\right] \dots $$

What is the motivation for this? Can I inject the LoRA only once? If the number of videos is 100, 1000, or more, won't this cause problems?

Thanks.

ruizhaocv commented 4 weeks ago

Sorry for the late reply; I was busy with another project. This is just a simple implementation that injects one spatial LoRA per video, and the multiple spatial LoRAs act in parallel. Since our customization setting uses only a small number of reference videos, this implementation does not cause problems. Of course, you can implement a more efficient injection method to handle a large number of videos.
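
For readers following the thread, here is a toy sketch of how nested injection can still behave as parallel branches. It is not the repository's code: it assumes each wrapper computes `base(x) + up(down(x))` and that re-injection replaces only the wrapped base layer, so every LoRA branch sees the same input `x`. The names `Linear`, `LoraInjected`, `matvec`, and `lora_depth` are hypothetical, and plain Python lists stand in for tensors.

```python
# Toy illustration only; every name here is hypothetical, not from MotionDirector.

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class Linear:
    """A bias-free linear layer: y = W x."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return matvec(self.weight, x)


class LoraInjected:
    """Wraps a base layer and adds a low-rank branch: base(x) + up(down(x))."""

    def __init__(self, base, down, up):
        self.base = base  # may itself be a LoraInjected wrapper
        self.down = down
        self.up = up

    def __call__(self, x):
        base_out = self.base(x)
        lora_out = matvec(self.up, matvec(self.down, x))
        return [b + l for b, l in zip(base_out, lora_out)]


def lora_depth(layer):
    """Count how many wrappers sit on top of the original layer."""
    depth = 0
    while isinstance(layer, LoraInjected):
        depth += 1
        layer = layer.base
    return depth


W = [[1, 0], [0, 1]]             # frozen base weight
down1, up1 = [[1, 1]], [[1], [2]]    # rank-1 LoRA for video 1
down2, up2 = [[1, -1]], [[3], [0]]   # rank-1 LoRA for video 2
x = [1, 2]

# Inject once per video: the second wrapper's base is the first wrapper.
nested = LoraInjected(LoraInjected(Linear(W), down1, up1), down2, up2)

# Explicit parallel form: W x + up1(down1(x)) + up2(down2(x)).
branches = [matvec(up1, matvec(down1, x)), matvec(up2, matvec(down2, x))]
parallel = [sum(vals) for vals in zip(Linear(W)(x), *branches)]

print(nested(x), parallel, lora_depth(nested))
```

Under these assumptions the nested and parallel forms give the same output, and the only growth is the wrapper depth, which increases by one per video. That is cheap for a few reference videos, while a flat list of branches on a single wrapper would avoid the deep call chain for many videos.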