universome / stylegan-v

[CVPR 2022] StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
https://universome.github.io/stylegan-v

Projection of real video with multiple frames #35

Open hse1032 opened 1 year ago

hse1032 commented 1 year ago

Hello,

I have some questions about inverting the StyleGAN-V generator on the FaceForensics dataset.

In the case of an image (a single frame), projection works well. However, when I tried to project a video (multiple frames at once), I found that the projected video contains almost identical frames across all time steps.

Is this a normal phenomenon?

To project a video (16 frames in my case), I changed some code in `src/scripts/project.py` as follows:

  1. Adjust the time steps (0 to 16 frames): in line 59, change `ts = torch.zeros(num_videos, 1, device=device)` to `ts = torch.arange(num_videos, 16, device=device)`.

  2. Make the motion code trainable (comment out line 110 and uncomment line 109).

  3. Extract `target_features` of the real video per frame, and measure the distance between videos rather than between frames. For example, line 140 becomes `dist = (target_features - synth_features).square().sum()`, where the batch dimension of `target_features` and `synth_features` now holds the 16 frames of a single video instead of different images, as in the original code.
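To make the three changes above concrete, here is a minimal, self-contained sketch. The tensor shapes, the motion-code dimensionality, and the feature tensors are stand-ins I chose for illustration, not the actual `project.py` API:

```python
import torch

num_frames = 16
device = "cpu"

# Step 1: one timestep per frame instead of a single zero timestep.
# (Original code used ts = torch.zeros(num_videos, 1, device=device).)
ts = torch.arange(num_frames, device=device).unsqueeze(0).float()  # shape [1, 16]

# Step 2: a trainable motion code (512-dim is an assumption here).
motion_z = torch.randn(1, 512, device=device, requires_grad=True)

# Step 3: per-frame features of one real video vs. one synthesized video.
# Random tensors stand in for the real feature extractor's output; the
# batch dimension indexes the 16 frames of a single video.
target_features = torch.randn(num_frames, 256, device=device)
synth_features = torch.randn(num_frames, 256, device=device, requires_grad=True)

# Distance measured over the whole clip at once, not frame by frame.
dist = (target_features - synth_features).square().sum()
dist.backward()  # gradients flow to synth_features (and, in practice, motion_z)
```

With this setup, optimizing `motion_z` alongside the latent code should at least produce frame-dependent gradients; if the projected frames are still nearly identical, the generator may simply be collapsing the motion input during inversion.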

Thanks,