showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0

DDIM inversion or random noise? #73

Open tszFung-gz opened 1 year ago

tszFung-gz commented 1 year ago

Excellent work! I have a question: is DDIM inversion necessary when running the diffusion process on video data? I tried DPM inversion and got seriously wrong results. With DDIM inversion the temporal consistency was better, but the style was seriously wrong. What other samplers are available?

zhangjiewu commented 1 year ago

Yes, DDIM inversion is necessary to preserve the motion in the source video. DPM would fail as it is not deterministic. For the style change, reducing the number of fine-tuning steps may help.
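
To illustrate why a deterministic inversion matters, here is a minimal toy sketch in plain Python. It is not the Tune-A-Video code: the noise schedule `ALPHAS_BAR`, the constants `MU` and `SIGMA`, and the closed-form `eps_model` (standing in for the fine-tuned UNet) are all assumptions made up for the example. It inverts a small "video" to a noise latent with DDIM (eta = 0), samples back from it, and compares the result with sampling from fresh random noise.

```python
import math
import random

T = 50
# Assumed schedule: cumulative alpha from ~1 (clean) down to ~0 (pure noise).
ALPHAS_BAR = [math.cos(0.02 + (1.5 - 0.02) * i / T) ** 2 for i in range(T + 1)]
MU, SIGMA = 0.3, 0.5  # assumed mean / std of the toy data distribution


def eps_model(x, a):
    """Closed-form noise prediction for data ~ N(MU, SIGMA^2); stand-in for the UNet."""
    var = a * SIGMA ** 2 + (1.0 - a)
    return math.sqrt(1.0 - a) * (x - math.sqrt(a) * MU) / var


def ddim_step(xs, a_from, a_to):
    """One deterministic DDIM update (eta = 0) from noise level a_from to a_to."""
    out = []
    for x in xs:
        eps = eps_model(x, a_from)
        x0_pred = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
        out.append(math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps)
    return out


def ddim_invert(x0):
    """Walk the schedule forward: clean latents -> noise latent."""
    xs = list(x0)
    for i in range(T):
        xs = ddim_step(xs, ALPHAS_BAR[i], ALPHAS_BAR[i + 1])
    return xs


def ddim_sample(x_T):
    """Walk the schedule backward: noise latent -> clean latents."""
    xs = list(x_T)
    for i in range(T, 0, -1):
        xs = ddim_step(xs, ALPHAS_BAR[i], ALPHAS_BAR[i - 1])
    return xs


def round_trip_errors(seed=0):
    rng = random.Random(seed)
    # Flattened toy "video": 8 frames x 16 values each.
    video = [MU + SIGMA * rng.gauss(0.0, 1.0) for _ in range(8 * 16)]
    from_inverted = ddim_sample(ddim_invert(video))
    from_random = ddim_sample([rng.gauss(0.0, 1.0) for _ in video])
    return math.dist(from_inverted, video), math.dist(from_random, video)


if __name__ == "__main__":
    err_inv, err_rand = round_trip_errors()
    print(f"error from inverted latent: {err_inv:.3f}")
    print(f"error from random noise:    {err_rand:.3f}")
```

In this toy setting the latent obtained by inversion leads back close to the original input, while a random latent produces an unrelated sample from the same distribution. That is the sense in which starting from the inverted latent keeps the structure of the source; a sampler that injects fresh noise at each step would not retrace the same trajectory.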

tszFung-gz commented 1 year ago

Thanks for the answer~