universome / stylegan-v

[CVPR 2022] StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
https://universome.github.io/stylegan-v

Projection of real video with multiple frames #35

Open hse1032 opened 1 year ago

hse1032 commented 1 year ago

Hello,

I have some questions about inverting the StyleGAN-V generator on the FaceForensics dataset.

In the case of an image (a single frame), projection works well. However, when I tried to project a video (multiple frames at once), I found that the projected video contains almost identical frames across all time steps.

Is this a normal phenomenon?

To project a video (16 frames in my case), I changed some code in `src/scripts/project.py` as follows:

  1. Adjust the time steps (0 to 16 frames): in line 59, change `ts = torch.zeros(num_videos, 1, device=device)` to `ts = torch.arange(num_videos, 16, device=device)`.

  2. Make the motion code trainable (comment out line 110 and uncomment line 109).

  3. Extract `target_features` of the real video per frame, and measure the distance between videos rather than between frames. For example, line 140 becomes `dist = (target_features - synth_features).square().sum()`, where the batch dimension of `target_features` and `synth_features` now holds the 16 frames of a single video instead of different images, as in the original code.
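To make the three changes above concrete, here is a minimal, self-contained sketch. The tensor shapes, the motion-code dimensionality, and the feature tensors are stand-ins I chose for illustration, not the actual `project.py` API:

```python
import torch

num_frames = 16
device = "cpu"

# Step 1: one timestep per frame instead of a single zero timestep.
# (Original code used ts = torch.zeros(num_videos, 1, device=device).)
ts = torch.arange(num_frames, device=device).unsqueeze(0).float()  # shape [1, 16]

# Step 2: a trainable motion code (512-dim is an assumption here).
motion_z = torch.randn(1, 512, device=device, requires_grad=True)

# Step 3: per-frame features of one real video vs. one synthesized video.
# Random tensors stand in for the real feature extractor's output; the
# batch dimension indexes the 16 frames of a single video.
target_features = torch.randn(num_frames, 256, device=device)
synth_features = torch.randn(num_frames, 256, device=device, requires_grad=True)

# Distance measured over the whole clip at once, not frame by frame.
dist = (target_features - synth_features).square().sum()
dist.backward()  # gradients flow to synth_features (and, in practice, motion_z)
```

With this setup, optimizing `motion_z` alongside the latent code should at least produce frame-dependent gradients; if the projected frames are still nearly identical, the generator may simply be collapsing the motion input during inversion.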

Thanks,