Open Kenneth-Wong opened 6 months ago
Sorry for the late reply; I was busy with another project. This is just a simple implementation that injects one spatial LoRA per video, with the spatial LoRAs applied in parallel. In our customization setting there are only a small number of reference videos, so this implementation does not cause problems. Of course, you can implement a more efficient injection method to handle a large number of videos.
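To make "in parallel" concrete, here is a minimal pure-Python sketch. It is not the code from MotionDirector_train.py: the `Linear` and `LoraInjectedLinear` classes, the helper functions, and the sizes are all hypothetical stand-ins, and it assumes each wrapper feeds the original input `x` to both its wrapped layer and its own low-rank branch. Under that assumption, wrapping twice gives the same output as adding both LoRA branches to the base layer side by side.

```python
import random

def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

class Linear:
    # Hypothetical stand-in for a bias-free linear layer.
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return matvec(self.weight, x)

class LoraInjectedLinear:
    # Hypothetical wrapper: base(x) + up(down(x)).
    # Assumption: the wrapped layer and the LoRA branch both see the same x.
    def __init__(self, base, down, up):
        self.base = base
        self.down = down
        self.up = up

    def __call__(self, x):
        base_out = self.base(x)
        lora_out = matvec(self.up, matvec(self.down, x))
        return [b + l for b, l in zip(base_out, lora_out)]

rng = random.Random(0)
d_in, d_out, rank = 4, 3, 2
x = [rng.uniform(-1.0, 1.0) for _ in range(d_in)]
linear = Linear(rand_matrix(d_out, d_in, rng))

# Inject one LoRA per "video", each time wrapping the current layer.
layer = linear
adapters = []
for _ in range(2):
    down = rand_matrix(rank, d_in, rng)
    up = rand_matrix(d_out, rank, rng)
    adapters.append((down, up))
    layer = LoraInjectedLinear(layer, down, up)

nested = layer(x)

# Explicit parallel form: Linear(x) + sum_i up_i(down_i(x)).
parallel = linear(x)
for down, up in adapters:
    delta = matvec(up, matvec(down, x))
    parallel = [p + d for p, d in zip(parallel, delta)]

max_diff = max(abs(a - b) for a, b in zip(nested, parallel))
print("max difference between nested and parallel outputs:", max_diff)
```

In this sketch every additional video adds one more wrapper level and one more low-rank branch to each forward pass, so cost grows linearly with the number of videos. That is acceptable for a handful of reference videos and is the part a more efficient injection scheme would need to address.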
I found an interesting implementation in your code: in lines 644-674 of MotionDirector_train.py, a spatial_lora is added for each video. As a result, the Linear layer inside a LoraInjectedLayer is itself recursively transformed into a LoraInjectedLayer. With two training videos, this leads to a computation like:
$$ \mathrm{Linear}_2\left[\mathrm{Linear}_1(x) + l_1^u l_1^d(x)\right] + l_2^u l_2^d\left[\mathrm{Linear}_1(x) + l_1^u l_1^d(x)\right] \dots $$
What is the motivation? Can I inject only once? If the number of videos is 100, 1000, ..., won't this cause problems?
Thanks.