showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0

Improved Consistency using DDIM inversion (?) #26

Closed HyeonHo99 closed 1 year ago

HyeonHo99 commented 1 year ago

Hi, thank you so much for the impressive work!

I noticed the 'News' section was updated with 'Improved consistency through DDIM inversion'. Could you explain this update a bit more? My understanding is: before, DDPM inversion (DDPM forward and reverse); after, DDIM inversion (DDIM forward and reverse). Is that right? Also, is the DDIM sampler then used in both fine-tuning and inference?

Thank you again for the nice work.

HyeonHo99 commented 1 year ago

Plus: in train_tuneavideo.py, the noise scheduler is still `DDPMScheduler`, e.g. `noise_scheduler = DDPMScheduler.from_pretrained(pretrained_model_path, subfolder="scheduler")`. Is this Python file just not updated yet?

Thank you
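
For reference, my reading (a minimal sketch, assuming a standard diffusers-style setup; the model path below is illustrative, not necessarily what the repo uses) is that the two schedulers can coexist: DDPMScheduler only supplies the noise schedule for the fine-tuning loss, while DDIM would only matter for sampling/inversion at inference.

```python
# Minimal sketch, assuming a standard diffusers-style setup;
# the model path is illustrative, not necessarily what the repo uses.
from diffusers import DDIMScheduler, DDPMScheduler

pretrained_model_path = "CompVis/stable-diffusion-v1-4"  # illustrative

# Fine-tuning: DDPMScheduler defines the noise schedule used to corrupt
# latents for the denoising loss.
noise_scheduler = DDPMScheduler.from_pretrained(pretrained_model_path, subfolder="scheduler")

# Inference: deterministic DDIM, used for sampling and for DDIM inversion.
ddim_scheduler = DDIMScheduler.from_pretrained(pretrained_model_path, subfolder="scheduler")
```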

Randle-Github commented 1 year ago

I've checked, and the problem is the attention_mask, which is not actually used in the training process. Just delete it and it should work. However, there are other version-related problems, so it may be better to switch to the correct version.
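
For illustration only, a hypothetical sketch of the kind of change described here (the actual call site and variable names in train_tuneavideo.py may differ):

```python
# Hypothetical illustration only: the call site and variable names are
# assumptions, not the repo's exact code.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

pretrained_model_path = "CompVis/stable-diffusion-v1-4"  # illustrative
tokenizer = CLIPTokenizer.from_pretrained(pretrained_model_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(pretrained_model_path, subfolder="text_encoder")

inputs = tokenizer("a man is skiing", padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")

# before (reported to break on some library versions):
# encoder_hidden_states = text_encoder(inputs.input_ids,
#                                      attention_mask=inputs.attention_mask)[0]

# after: the mask is not needed for the training objective, so drop it
with torch.no_grad():
    encoder_hidden_states = text_encoder(inputs.input_ids)[0]
```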

zhangjiewu commented 1 year ago

The DDIM inversion is only used at inference.

- before: DDIM sampling from random noise
- after: DDIM sampling from the inverted noise of the source video
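
In other words, the source video is first mapped to noise with the deterministic DDIM update, and sampling then starts from that inverted noise instead of a random draw. A minimal sketch of the idea (assuming a diffusers-style UNet and DDIM scheduler; argument names and the schematic pipeline call are illustrative, not the repo's exact ddim_inversion utility):

```python
import torch

@torch.no_grad()
def ddim_invert(latents, prompt_embeds, unet, scheduler, num_steps=50):
    """Map clean source-video latents to DDIM-inverted noise (sketch only).

    The unet call signature and argument names are assumptions based on a
    typical diffusers-style setup, not the repo's exact utility.
    """
    scheduler.set_timesteps(num_steps)
    step = scheduler.config.num_train_timesteps // num_steps
    # Walk the timesteps from least noisy to most noisy.
    for t in scheduler.timesteps.flip(0):
        # Usual DDIM-inversion approximation: query the model at the target
        # timestep t even though the latents are still one step less noisy.
        eps = unet(latents, t, encoder_hidden_states=prompt_embeds).sample
        t_cur = max(int(t) - step, 0)                 # current noise level
        alpha_cur = scheduler.alphas_cumprod[t_cur]
        alpha_next = scheduler.alphas_cumprod[int(t)]
        # Invert the deterministic DDIM update (eta = 0): predict x0, then
        # re-noise it to the next (noisier) level with the same epsilon.
        x0_pred = (latents - (1 - alpha_cur) ** 0.5 * eps) / alpha_cur ** 0.5
        latents = alpha_next ** 0.5 * x0_pred + (1 - alpha_next) ** 0.5 * eps
    return latents

# The inverted latents then replace the random initial noise at sampling time,
# e.g. (shown schematically): pipe(prompt, latents=inverted_latents, ...).videos
```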