omerbt / TokenFlow

Official PyTorch implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing", presenting "TokenFlow" (ICLR 2024)
https://diffusion-tokenflow.github.io
MIT License

My test output (reconstructed video lacks consistency)? #12

Open zhanghongyong123456 opened 1 year ago

zhanghongyong123456 commented 1 year ago

Test command: `python preprocess.py`

https://github.com/omerbt/TokenFlow/assets/48466610/3fee547d-f65c-4af0-bee7-5712229c582d
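For context, the preprocessing step runs DDIM inversion on the video frames (as discussed below). A minimal sketch of a single inversion step, assuming the standard DDIM formulation rather than the repo's exact code (all names here are illustrative):

```python
import torch

@torch.no_grad()
def ddim_inversion_step(latent, eps, alpha_t, alpha_next):
    """One reversed DDIM step: deterministically map x_t -> x_{t+1}.

    eps is the UNet's noise prediction at timestep t; alpha_t and
    alpha_next are the cumulative alpha-bar values for the current
    and next (noisier) timestep.
    """
    # Predict the clean sample x_0 from x_t and the noise estimate.
    pred_x0 = (latent - (1 - alpha_t) ** 0.5 * eps) / alpha_t ** 0.5
    # Re-noise x_0 toward the next, noisier timestep.
    return alpha_next ** 0.5 * pred_x0 + (1 - alpha_next) ** 0.5 * eps
```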

puppynull commented 1 year ago

Same result as yours.

omerbt commented 1 year ago

Inaccurate reconstruction has two sources: (i) inaccurate DDIM inversion, and (ii) the imperfect VAE autoencoder of the latent space.

Interestingly, our method can still overcome issues with the DDIM inversion thanks to our TokenFlow injection. For example, the editing result for this video does not exhibit the artifacts that occur in the DDIM inversion process.
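To see how much error comes from cause (ii) alone, one can round-trip a frame through the VAE with no diffusion at all. A minimal sketch using diffusers (the model id is just an example; any SD-compatible VAE checkpoint works):

```python
import torch
from diffusers import AutoencoderKL

# Standalone SD VAE checkpoint (example choice, not mandated by the repo).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to("cuda")

# A dummy RGB frame in [-1, 1]; replace with a real video frame.
frame = torch.rand(1, 3, 512, 512, device="cuda") * 2 - 1

with torch.no_grad():
    latent = vae.encode(frame).latent_dist.mean  # encode to latent space
    recon = vae.decode(latent).sample            # decode back to pixels

# Any residual here is pure autoencoder error, independent of inversion.
print("VAE round-trip L1 error:", (recon - frame).abs().mean().item())
```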

anime26398 commented 1 year ago

Yes, I also experienced this issue. In my experience, it happens because each frame is inverted independently (and it becomes severe when fewer DDIM steps are used). However, if you use Cross-Frame attention and TokenFlow propagation during DDIM inversion and reconstruction as well, the issue is resolved even for the reconstructed video.
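A minimal sketch of the cross-frame attention idea in plain PyTorch (not the repo's implementation): every frame's queries attend to the keys and values of one pivot frame, which ties the frames' features together during inversion and denoising.

```python
import torch
import torch.nn.functional as F

def cross_frame_attention(q, k, v, pivot_idx=0):
    """Cross-frame attention over a stack of frames.

    q, k, v: [frames, tokens, dim]. Instead of per-frame self-attention,
    all frames attend to the keys/values of a single pivot frame.
    """
    frames = q.shape[0]
    # Broadcast the pivot frame's keys and values to every frame.
    k_pivot = k[pivot_idx : pivot_idx + 1].expand(frames, -1, -1)
    v_pivot = v[pivot_idx : pivot_idx + 1].expand(frames, -1, -1)
    return F.scaled_dot_product_attention(q, k_pivot, v_pivot)
```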

G-U-N commented 1 year ago

> Yes, I also experienced this issue. In my experience, it happens because each frame is inverted independently (and it becomes severe when fewer DDIM steps are used). However, if you use Cross-Frame attention and TokenFlow propagation during DDIM inversion and reconstruction as well, the issue is resolved even for the reconstructed video.

I am in total agreement.

hyoseok1223 commented 9 months ago

@anime26398 So is TokenFlow propagation implemented in this repo? I can't find the 'compute nn fields' and 'tokenflow propagation' steps; it looks like it just uses PnP injection instead.
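For reference, the propagation being asked about can be sketched roughly as follows: compute a nearest-neighbor field from a frame's source diffusion features to the keyframes' source features, then pull the corresponding *edited* keyframe tokens through that field. A rough, hypothetical sketch (illustrative names, not the repo's code):

```python
import torch
import torch.nn.functional as F

def tokenflow_propagate(src_frame, src_keyframes, edited_keyframes):
    """TokenFlow-style propagation for a single frame.

    src_frame:        [tokens, dim]      source features of this frame
    src_keyframes:    [k, tokens, dim]   source features of the keyframes
    edited_keyframes: [k, tokens, dim]   edited features of the keyframes
    """
    k, t, d = src_keyframes.shape
    # Flatten keyframe tokens into one bank; cosine similarity via normalization.
    bank_src = F.normalize(src_keyframes.reshape(k * t, d), dim=-1)
    bank_edit = edited_keyframes.reshape(k * t, d)
    sim = F.normalize(src_frame, dim=-1) @ bank_src.T  # [tokens, k*t]
    nn_idx = sim.argmax(dim=-1)                        # nearest-neighbor field
    # Replace each token with the edited token of its nearest source neighbor.
    return bank_edit[nn_idx]
```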