nihaomiao / CVPR23_LFDM

The PyTorch implementation of our CVPR 2023 paper "Conditional Image-to-Video Generation with Latent Flow Diffusion Models"
BSD 2-Clause "Simplified" License

Some doubts about the paper and code #25

Closed · XiaoHaoPan closed this issue 10 months ago

XiaoHaoPan commented 11 months ago

1. What are the three losses in the first stage?

[screenshot]

2. The generator in this configuration seems to contain three trainable networks in one stage. Is that correct?

[screenshot]

3. What is the purpose of adding x0 in the second stage? I did not find it in the paper.

[screenshot]

4. I am not clear about how the datasets are split into train and test sets.

nihaomiao commented 11 months ago

Hi, @XiaoHaoPan,

  1. They are the perceptual loss and the equivariance loss. You may refer to Section 4.2 Model Implementation in our paper. (A minimal sketch of the perceptual term follows this list.)
  2. Correct.
  3. We use $x_0$ to provide the shape information for the flow and occlusion map (a toy conditioning sketch also follows below).
  4. The split of the train/test sets can be found in our preprocessing code.
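
For point 1, here is a minimal, single-scale sketch of a VGG-based perceptual loss in the common frozen-features-plus-L1 style; the layer cutoff and single scale are simplifications, not our exact implementation, which may use several layers and image scales:

```python
import torch
import torchvision.models as tvm

# minimal single-scale sketch of a VGG perceptual loss; the actual
# implementation may compare several layers at several image scales
vgg = tvm.vgg19(weights=tvm.VGG19_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(pred, target):
    # pred, target: (B, 3, H, W) images, ImageNet-normalized
    return torch.mean(torch.abs(vgg(pred) - vgg(target)))
```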
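For point 3, a toy illustration of how $x_0$ can be injected as conditioning; the channel counts and the single-conv stand-in for the denoising UNet are hypothetical, just to show the concatenation:

```python
import torch
import torch.nn as nn

# toy illustration: concatenate the (encoded) reference frame x0 with
# the noisy latent so the denoiser sees the subject's shape when
# predicting the flow/occlusion latents; all shapes are hypothetical
z_t = torch.randn(2, 4, 32, 32)               # noisy latent at step t
x0 = torch.randn(2, 3, 32, 32)                # encoded reference frame
denoiser = nn.Conv2d(4 + 3, 4, 3, padding=1)  # stand-in for the UNet
eps_pred = denoiser(torch.cat([z_t, x0], dim=1))
```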
XiaoHaoPan commented 11 months ago

Thank you for answering! I have another question: when testing the first stage, when generating the second frame and onward, is the reference frame the first real driving frame, or a fake driving frame generated by the model?

XiaoHaoPan commented 11 months ago

May I ask how the settings in your paper differ from those in the code? Do I need to change the epoch settings to match the pre-trained models you provide?

[two screenshots]

nihaomiao commented 11 months ago

Hi, @XiaoHaoPan, during testing we use the DM to generate a sequence of flow maps and apply them to the given real reference image to produce the output frames. We always warp the real reference frame to maintain good quality.
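
For concreteness, a minimal sketch of that warping step built on `torch.nn.functional.grid_sample`; the function signature, the normalized-flow convention, and the optional occlusion mask are illustrative rather than the exact code in this repo:

```python
import torch
import torch.nn.functional as F

def warp(ref, flow, occlusion=None):
    """Warp the real reference frame with a predicted backward flow.

    ref:       (B, C, H, W) reference image
    flow:      (B, H, W, 2) offsets in normalized [-1, 1] coordinates
    occlusion: (B, 1, H, W) optional soft mask for occluded pixels
    """
    B, _, H, W = ref.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    warped = F.grid_sample(ref, grid + flow, align_corners=True)
    if occlusion is not None:
        warped = warped * occlusion  # down-weight occluded regions
    return warped
```

Since every frame is warped from the same real reference frame, errors do not accumulate across generated frames.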

XiaoHaoPan commented 11 months ago

Thanks! I looked at your DM training code and found that these two settings differ across datasets. Is that because of differences between the datasets?

MUG: [screenshot]

MHAD: [screenshot]

NATOPS: [screenshot]

Were the pre-trained models you provided trained with these settings?

nihaomiao commented 11 months ago

Hi, @XiaoHaoPan, sorry for misunderstanding your second question! I have updated my previous comment. Yes, NATOPS is much larger than MUG and MHAD, so I set a smaller number of epochs. The pre-trained models I provided should be consistent with these settings, if I didn't edit the code later, but that was several months ago and I am not completely sure.
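
As a back-of-the-envelope check (the clip counts and batch size below are made up, not the datasets' real sizes), fewer epochs on a larger dataset still give a comparable number of gradient updates:

```python
# hypothetical numbers, only to illustrate why a larger dataset
# (e.g., NATOPS) can be trained for fewer epochs while seeing a
# comparable number of gradient updates
def total_updates(num_clips: int, epochs: int, batch_size: int) -> int:
    return epochs * (num_clips // batch_size)

print(total_updates(num_clips=1_000, epochs=100, batch_size=8))  # 12500
print(total_updates(num_clips=5_000, epochs=20, batch_size=8))   # 12500
```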