open-mmlab / PIA

[CVPR 2024] PIA, your Personalized Image Animator. Animate your images with a text prompt, combined with DreamBooth, to achieve stunning videos.
https://pi-animator.github.io/
Apache License 2.0
888 stars 70 forks

Why not train conv_in.bias? #34

Closed ryancll closed 5 months ago

ryancll commented 9 months ago

According to your paper and code, you only update conv_in.weight and the temporal layers. Is there a solid reason, or are there ablation experiments, showing that keeping conv_in.bias frozen achieves better performance?
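For context, a minimal sketch of the parameter selection as this question reads it (conv_in.bias left frozen); the `motion_modules` substring for the temporal layers is an assumption, not necessarily the repository's actual naming:

```python
# Hypothetical sketch, not the repository's training script.
import torch

def select_trainable_params(unet: torch.nn.Module):
    # The reading in this question: only conv_in.weight and the temporal layers
    # receive gradients, while conv_in.bias stays frozen.
    for name, param in unet.named_parameters():
        param.requires_grad = name == "conv_in.weight" or "motion_modules" in name
    return [p for p in unet.parameters() if p.requires_grad]
```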

ymzhang0319 commented 5 months ago

Hi @ryancll,

There might be some misunderstanding. The conv_in.bias is also trainable. You can print the weights in pretrained stable-diffusion-v1-5 and PIA and compare them.
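For example, a minimal sketch of that comparison; the file paths and state-dict keys below are assumptions and may need adjusting to the checkpoints you actually downloaded:

```python
# Compare conv_in.bias between the base SD-1.5 UNet and the PIA checkpoint.
# Paths and key names are assumptions; the PIA key may carry a prefix such as "unet.".
import torch

sd_unet = torch.load("stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin", map_location="cpu")
pia_ckpt = torch.load("pia.ckpt", map_location="cpu")

sd_bias = sd_unet["conv_in.bias"]
pia_bias = pia_ckpt["conv_in.bias"]  # adjust the key if it is prefixed

# A non-zero difference indicates conv_in.bias was indeed updated during PIA training.
print("max |delta conv_in.bias|:", (sd_bias - pia_bias).abs().max().item())
```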

ernestchu commented 4 months ago

@ymzhang0319 When loading DreamBooth weights, you use the conv_in.bias from DreamBooth weights instead of the PIA weights, right?

https://github.com/open-mmlab/PIA/blob/152c90cee11218b32cf7f99526ed9888e228ebdf/animatediff/pipelines/i2v_pipeline.py#L216-L223
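For illustration only, a rough sketch of the kind of conv_in handling being asked about, assuming PIA's conv_in.weight has extra input channels (e.g. 9 vs. the base model's 4) while conv_in.bias has the same shape in both checkpoints; this is not the actual pipeline code linked above:

```python
# Hypothetical merge of DreamBooth and PIA conv_in parameters (shapes are assumptions).
import torch

def merge_conv_in(pia_state: dict, dreambooth_state: dict):
    merged_weight = pia_state["conv_in.weight"].clone()   # e.g. [320, 9, 3, 3]
    db_weight = dreambooth_state["conv_in.weight"]         # e.g. [320, 4, 3, 3]
    # Only the image-latent input channels can come from the DreamBooth checkpoint.
    merged_weight[:, : db_weight.shape[1]] = db_weight
    # conv_in.bias has the same shape in both, so either source would fit here;
    # the question above is which one the pipeline actually keeps.
    merged_bias = dreambooth_state["conv_in.bias"]         # e.g. [320]
    return merged_weight, merged_bias
```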


As for the difference between the weights of pretrained stable-diffusion-v1-5 and PIA, the training procedure quoted below can also be one of the reasons.

@Tianhao-Qi, we introduced our training method in Section 3.3. Following the training strategy of AnimateDiff, we first train a domain adapter on WebVid. As AnimateDiff has not released the weights for the LoRA version of its domain adapter, we directly fine-tune the entire UNet, turning it into a 'domain adapter' for WebVid. Originally posted by @LeoXing1996 in https://github.com/open-mmlab/PIA/issues/32#issuecomment-1877016187
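As a rough sketch of that first stage (full-UNet fine-tuning standing in for AnimateDiff's unreleased LoRA domain adapter); the hyperparameters and function name are illustrative assumptions, not the repository's actual training configuration:

```python
# Stage 1 (assumed setup): fine-tune the entire UNet on WebVid as a domain adapter.
import torch

def make_domain_adapter_optimizer(unet: torch.nn.Module, lr: float = 1e-5):
    for param in unet.parameters():
        param.requires_grad = True  # every UNet parameter is trainable in this stage
    return torch.optim.AdamW(unet.parameters(), lr=lr)
```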