open-mmlab / PIA

[CVPR 2024] PIA, your Personalized Image Animator. Animate your images with text prompts, combined with DreamBooth, to produce stunning videos.
https://pi-animator.github.io/
Apache License 2.0
808 stars, 67 forks

Training Details #19

Closed (ryancll closed this 2 months ago)

ryancll commented 6 months ago

Thx for the amazing work!

I tried to reproduce the training process described in your paper. My results (pasted below) are much blurrier than your demo, and sometimes frames change abruptly. Because I have limited GPU resources, I trained the model at 256×256 resolution with batch size 4 (and ran inference at 512×512). I'm not sure whether resolution or batch size has a decisive impact on training performance. Could you please share more details about training?

Input: majic_girl

Results generated by my model:

https://github.com/open-mmlab/PIA/assets/45676975/0f561d5f-8563-4685-ad25-cd31c28f6211

https://github.com/open-mmlab/PIA/assets/45676975/2ad3e1d2-8f57-434a-983e-7275148cc9c5

https://github.com/open-mmlab/PIA/assets/45676975/8389b56d-3f90-494a-993a-87de2c4c4d5c

https://github.com/open-mmlab/PIA/assets/45676975/b7d0bd66-7d05-43bb-a46c-871725ce0ad6

ymzhang0319 commented 6 months ago

Hi @ryancll, thx for your interest in our work!

The batch size we used for training is 1024, and in our experiments, we found that the batch size does indeed have an impact on the results.

tgxs002 commented 6 months ago

Hi @ymzhang0319, may I know the training resolution? A batch size of 1024 seems very demanding at resolutions above 256. Or did you reach that batch size via gradient accumulation?

ymzhang0319 commented 6 months ago

Hey @tgxs002, the training resolution in our experiments is 256×256. If you are training with limited resources, you can try gradient accumulation.
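Gradient accumulation trades wall-clock time for memory: several small micro-batches are processed one after another and their gradients are summed before a single optimizer update. The sketch below is a hypothetical, dependency-free illustration on a toy 1-D linear model. `mse_grad`, `full_batch_step`, `accumulated_step`, and the data are made up for illustration and are not PIA's training code. It shows that, for equal-size micro-batches and a loss averaged over each batch, scaling every micro-batch gradient by `1 / accum_steps` reproduces the full-batch gradient.

```python
"""Gradient accumulation on a toy 1-D linear model y ~ w * x.

Hypothetical sketch: the model, data, and learning rate are illustrative only.
"""


def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one (micro-)batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def full_batch_step(w, xs, ys, lr):
    """One SGD step on the whole batch at once (all samples must fit in memory)."""
    return w - lr * mse_grad(w, xs, ys)


def accumulated_step(w, xs, ys, lr, micro_batch_size):
    """One SGD step assembled from several equal-size micro-batches.

    Each micro-batch gradient is divided by accum_steps and summed, so the
    result equals the full-batch mean gradient. The weight is updated only
    once, after all micro-batches have been processed.
    """
    if len(xs) % micro_batch_size != 0:
        raise ValueError("batch must split evenly into micro-batches")
    accum_steps = len(xs) // micro_batch_size
    grad = 0.0
    for start in range(0, len(xs), micro_batch_size):
        mb_x = xs[start:start + micro_batch_size]
        mb_y = ys[start:start + micro_batch_size]
        grad += mse_grad(w, mb_x, mb_y) / accum_steps  # accumulate, no update yet
    return w - lr * grad  # single optimizer step per accumulation cycle


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 17)]  # 16 samples
    ys = [3.0 * x for x in xs]             # true slope is 3
    w_full = full_batch_step(0.0, xs, ys, lr=0.001)
    w_accum = accumulated_step(0.0, xs, ys, lr=0.001, micro_batch_size=4)
    print(w_full, w_accum)  # equal up to floating-point rounding
```

In a PyTorch loop the same idea is usually written by dividing each micro-batch loss by the number of accumulation steps, calling `loss.backward()` on every micro-batch, and calling `optimizer.step()` and `optimizer.zero_grad()` only once per accumulation cycle. Layers whose statistics depend on the batch (e.g. BatchNorm) are the usual case where accumulation is not exactly equivalent to a larger batch.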

Tianhao-Qi commented 6 months ago

> Hi @ryancll, thx for your interest in our work!
>
> The batch size we used for training is 1024, and in our experiments, we found that the batch size does indeed have an impact on the results.

Do you mean that you trained with a batch size of 4 videos (16 frames each) per GPU on 16 GPUs?

ymzhang0319 commented 6 months ago

Hi @Tianhao-Qi, if you train with a batch size of 4 per GPU on 16 GPUs, you need to set gradient accumulation to 16 steps to reach a total batch size of 1024 (4 × 16 × 16).
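The arithmetic above generalizes: the effective batch per optimizer step is the per-GPU batch times the number of GPUs times the accumulation steps. The helper below is a small hypothetical sketch (the function names are illustrative, not part of PIA) for working out how many accumulation steps a given setup needs to reach the reported total of 1024. The single-GPU case is an assumed example, since the original question does not state how many GPUs were used.

```python
"""Effective (global) batch size under data parallelism plus gradient accumulation.

Hypothetical helpers for illustration; not part of the PIA codebase.
"""


def effective_batch_size(per_gpu_batch, num_gpus, accum_steps):
    """Number of samples that contribute to each optimizer step."""
    return per_gpu_batch * num_gpus * accum_steps


def required_accum_steps(target_batch, per_gpu_batch, num_gpus):
    """Accumulation steps needed to reach target_batch exactly."""
    per_step = per_gpu_batch * num_gpus  # samples per forward/backward pass across all GPUs
    if target_batch % per_step != 0:
        raise ValueError(f"{target_batch} is not a multiple of {per_step}")
    return target_batch // per_step


if __name__ == "__main__":
    # Setting discussed in this thread: 4 videos per GPU on 16 GPUs.
    print(required_accum_steps(1024, 4, 16))  # 16
    print(effective_batch_size(4, 16, 16))    # 1024
    # Assumed single-GPU run with batch size 4:
    print(required_accum_steps(1024, 4, 1))   # 256
```

Larger accumulation counts keep the effective batch fixed but make each optimizer step proportionally slower, so fewer GPUs mean longer training for the same number of updates.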