wyhsirius / g3an-project

[CVPR 2020] G3AN: Disentangling Appearance and Motion for Video Generation
https://wyhsirius.github.io/G3AN/
MIT License

A problem with 'evaluation/dataset.py' #5

Closed. Zonsor closed this issue 3 years ago.

Zonsor commented 3 years ago

Thank you for continuing to update this source code. I just tried the evaluation code and got an FID of around 83 with your pre-trained UvA-NEMO model, as you mentioned. However, I think there is a bug in 'evaluation/dataset.py': https://github.com/wyhsirius/g3an-project/blob/775f8204179f03fac9e5a45b98bb8cb342a04b34/evaluation/dataset.py#L29-L30 Videos with fewer than 32 frames are loaded incorrectly. For example, the videos we generate have 16 frames, so the slice becomes:

video = video[ -8 : 24 : 2]

Each loaded video then has only 4 frames. Meanwhile, stat.npz was computed from 16-frame videos, so the FID is calculated between videos of different lengths.
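A minimal sketch of the failure mode, assuming the loader takes a centered 32-frame window subsampled with stride 2. This is a hypothetical reconstruction consistent with the slice above, not the repository's actual code, and both function names are hypothetical. The "fixed" variant simply keeps short videos as they are, which is one possible fix and not necessarily the one applied upstream.

```python
def center_clip_buggy(frames, window=32, stride=2):
    # Hypothetical reconstruction: take a centered window of `window` frames
    # and subsample it with `stride`. For videos shorter than `window`, the
    # start index goes negative and Python wraps it around to the end.
    mid = len(frames) // 2
    return frames[mid - window // 2 : mid + window // 2 : stride]


def center_clip_fixed(frames, target=16, window=32, stride=2):
    # Only apply the centered, strided window when the video is long enough;
    # otherwise keep (at most) the first `target` frames unchanged.
    if len(frames) >= window:
        mid = len(frames) // 2
        return frames[mid - window // 2 : mid + window // 2 : stride]
    return frames[:target]


frames = list(range(16))            # a 16-frame video, frame indices only
print(center_clip_buggy(frames))    # slice is frames[-8:24:2]
print(len(center_clip_fixed(frames)))
```

For a 16-frame input, `mid - 16` is `-8` and `mid + 16` is `24`, so the buggy version reproduces exactly the `video[-8:24:2]` slice from the report and keeps only frames 8, 10, 12 and 14.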

wyhsirius commented 3 years ago

@Zonsor Hi, thanks for pointing out this bug. Yes, you are right. I have fixed it; now both the generated videos and uva.npz use 16-frame videos. I also noticed that the input data normalization was not consistent with the original implementations in evan and 3D-ResNet, which I used to obtain the numbers in the paper. I have fixed this bug as well and updated uva.npz. To run the code, please use the updated uva.npz. You should now get an FID of around 86.

Zonsor commented 3 years ago

@wyhsirius Thanks for your prompt response. With the new source code and uva.npz, I now get an FID of around 86. However, I also computed uva.npz myself from the preprocessed version (this link) you provided, and with that I only get an FID of around 132. Was your uva.npz computed from a different version of the UvA dataset?
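Recomputing a stats file like uva.npz amounts to fitting a Gaussian (mean and covariance) to feature activations of the real videos, and the FID is the Fréchet distance between two such Gaussians: ||mu1 - mu2||^2 + Tr(S1) + Tr(S2) - 2 Tr((S1 S2)^{1/2}). A minimal numpy sketch follows, assuming features have already been extracted by the evaluation network as an (N, D) array. The function names, and any `mu`/`sigma` key names you might use when saving with `np.savez`, are hypothetical and not necessarily what the repository's uva.npz contains.

```python
import numpy as np


def feature_stats(features):
    # features: (N, D) array, one row of activations per video (assumed layout).
    features = np.asarray(features, dtype=np.float64)
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def _sqrt_psd(mat):
    # Symmetric square root of a positive semi-definite matrix via eigh.
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(S1) + Tr(S2) - 2 Tr((S1 S2)^{1/2}).
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix
    # is symmetric PSD, so its eigenvalues can be taken with eigvalsh.
    diff = np.asarray(mu1, dtype=np.float64) - np.asarray(mu2, dtype=np.float64)
    root1 = _sqrt_psd(sigma1)
    inner = root1 @ sigma2 @ root1
    inner = (inner + inner.T) / 2.0  # symmetrize against round-off
    eig = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    tr_covmean = np.sqrt(eig).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)
```

Usage would look like `frechet_distance(*feature_stats(gen_feats), *feature_stats(real_feats))`. Because the score depends only on these statistics, any difference in how the real videos were cropped or preprocessed before feature extraction shifts the reference Gaussian and therefore the FID.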

wyhsirius commented 3 years ago

@Zonsor You are right, the version I used was cropped using slightly different bounding boxes. I am sorry about this; I thought they were the same. I recomputed the FID using the preprocessed version and got an FID of around 125 for 128x128 videos and around 58 for 64x64 videos. I have already updated uva.npz. I think your number was obtained at a resolution of 128x128, right?

Zonsor commented 3 years ago

@wyhsirius Yes, you are right. The preprocessed version you provide is 128x128, and the videos are resized to 112x112 before being fed to the evaluation model. I also tried the latest stats.npz and got an FID of around 126 for 128x128 videos and around 58 for 64x64 videos.
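Zonsor notes that videos are resized to 112x112 before being fed to the evaluation model, whether they start at 128x128 or 64x64. A minimal sketch of that preprocessing step, assuming (T, H, W, C) uint8 input, nearest-neighbour interpolation, and channels-first (C, T, H, W) output as is usual for 3D convolutional networks. `to_eval_input` and its `mean`/`std` parameters are hypothetical. The maintainer notes that the normalization had to be corrected to match evan and 3D-ResNet, so the actual values and interpolation should be taken from the repository's code, not from these placeholder defaults.

```python
import numpy as np


def to_eval_input(video, size=112, mean=0.0, std=1.0):
    # video: (T, H, W, C) uint8 array, e.g. (16, 128, 128, 3) or (16, 64, 64, 3).
    t, h, w, c = video.shape
    # Nearest-neighbour resize to (size, size); the real pipeline may use a
    # different interpolation (assumption).
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    resized = video[:, rows][:, :, cols]  # (T, size, size, C)
    # Placeholder normalization: subtract `mean` and divide by `std`. The
    # correct values depend on the evaluation network (not specified here).
    clip = (resized.astype(np.float32) - mean) / std
    # Reorder to channels-first (C, T, H, W).
    return np.transpose(clip, (3, 0, 1, 2))
```

Both 128x128 and 64x64 inputs end up with the same (3, 16, 112, 112) shape, so at 64x64 the frames are upsampled. This is one reason the FID values at the two resolutions are not directly comparable.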