nihaomiao / CVPR23_LFDM

The PyTorch implementation of our CVPR 2023 paper "Conditional Image-to-Video Generation with Latent Flow Diffusion Models"
BSD 2-Clause "Simplified" License

About demo scenario #11

Closed emjay73 closed 1 year ago

emjay73 commented 1 year ago

Hi, Thank you for sharing the demo scenarios.

If I want to apply this LFDM demo code to a custom image, what do I have to be aware of?

For instance, do I have to align the human pose or facial feature position in advance?

Any other tips would be welcome.

And is there code for fine-tuning the decoder on custom images?

Also, what is the difference between the 2nd image and the 3rd image in the output gif? (Occlusion aware and Occlusion agnostic?)

Thank you.

nihaomiao commented 1 year ago

Hi, @emjay73, thanks a lot for your interest in our work! For applying LFDM to new-domain images, I have only tried the FaceForensics dataset (a face video dataset, Section 4.3, page 8 in our paper). You can also refer to the final example video in the README. I found that the original LFDM can still generate reasonable results for unseen faces under different poses, and if you fine-tune the decoder with unlabeled facial videos, it can generate better videos for unseen faces from the same domain. I don't maintain dedicated code for fine-tuning the decoder. You can simply freeze the encoder and flow-predictor parts and train the decoder with a small learning rate using the released code of the stage-one LFAE.
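The freeze-and-fine-tune recipe described above can be sketched in PyTorch. This is a minimal illustration, not the repo's actual training script: the module names (`encoder`, `flow_predictor`, `decoder`), the toy layer shapes, and the reconstruction loss are all assumptions standing in for the real stage-one LFAE code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the stage-one LFAE model; the real module
# names and architectures differ and live in the released training code.
class ToyLFAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 3, padding=1)
        self.flow_predictor = nn.Conv2d(8, 8, 3, padding=1)
        self.decoder = nn.Conv2d(8, 3, 3, padding=1)

    def forward(self, x):
        z = self.encoder(x)
        z = self.flow_predictor(z)
        return self.decoder(z)

model = ToyLFAE()

# Freeze the encoder and flow predictor so only the decoder updates.
for module in (model.encoder, model.flow_predictor):
    for p in module.parameters():
        p.requires_grad = False

# Optimize only the decoder's parameters with a small learning rate
# (1e-5 is an illustrative value, not the paper's setting).
optimizer = torch.optim.Adam(model.decoder.parameters(), lr=1e-5)

# One fine-tuning step on a dummy batch of unlabeled video frames.
frames = torch.randn(2, 3, 64, 64)
loss = F.l1_loss(model(frames), frames)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the frozen modules have `requires_grad=False`, gradients flow through them to nothing, and the optimizer only ever touches decoder weights, so the learned flow representation is preserved.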