universome / stylegan-v

[CVPR 2022] StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
https://universome.github.io/stylegan-v

FVD calculation is not deterministic #23

Open waitingcheung opened 2 years ago

waitingcheung commented 2 years ago

I was looking for a PyTorch implementation of FVD and came across your work. I really appreciate your effort in standardizing FVD calculation.

I ran calc_metrics_for_dataset.py on my own datasets and noticed that the results are not deterministic, i.e. the FVD scores differ slightly between runs. This is probably due to frame sampling.

What is the recommended way to run your script? Shall I fix the random seed and run only once or run it multiple times and take the average?
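To illustrate the frame-sampling hypothesis, here is a minimal sketch of how random clip selection makes a result depend on the seed. It is not the repository's code: `sample_clip_start`, `sample_clips`, and the frame counts are hypothetical names and values chosen for illustration only.

```python
import random

def sample_clip_start(num_frames, clip_len, rng):
    """Pick a random start index for a clip of clip_len consecutive frames."""
    if clip_len > num_frames:
        raise ValueError("clip_len exceeds the number of frames in the video")
    # Valid starts are 0 .. num_frames - clip_len (inclusive).
    return rng.randrange(num_frames - clip_len + 1)

def sample_clips(video_lengths, clip_len, seed):
    """Sample one clip start per video, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, global RNG state is untouched
    return [sample_clip_start(n, clip_len, rng) for n in video_lengths]

video_lengths = [120, 64, 300, 45]  # hypothetical frame counts
run_a = sample_clips(video_lengths, clip_len=16, seed=0)
run_b = sample_clips(video_lengths, clip_len=16, seed=0)
run_c = sample_clips(video_lengths, clip_len=16, seed=1)

print("same seed, same clips:", run_a == run_b)
print("seed 0:", run_a)
print("seed 1:", run_c)
```

With the same seed the selected clips are identical, so any metric computed from them is repeatable. With a different seed (or none), the clips, and therefore the score, can change from run to run.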

universome commented 2 years ago

Hi, thank you! It's true that FVD is not deterministic, but the same holds for FID, because the fake statistics are computed from random samples (though FID is much more stable). Ideally, one would compute the metric several times and report the mean and standard deviation (the metric computation script has an option for this). But to be honest, in our case it was unnecessary because 1) the variations were small compared to the FVD magnitudes and 2) the performance gaps between the methods were large enough to make the variations negligible. So I would say it makes sense to report FVD as mean±std only if the variations are large or the performance gaps between the baselines are small. I also do not remember people reporting FVD as mean±std (note that previous FVD implementations have similar variations).
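For anyone who wants to report mean±std, a minimal sketch of the bookkeeping could look like the following. It does not use the repository's script or its options: `toy_metric` is a hypothetical stand-in for one full metric run, and `mean_and_std` is an illustrative helper.

```python
import random
import statistics

def toy_metric(seed, num_samples=2048):
    """Hypothetical stand-in for one metric run: a noisy estimate from random samples."""
    rng = random.Random(seed)
    samples = [rng.gauss(100.0, 5.0) for _ in range(num_samples)]
    return statistics.fmean(samples)

def mean_and_std(metric_fn, seeds):
    """Run metric_fn once per seed and summarize the scores."""
    scores = [metric_fn(seed) for seed in seeds]
    mean = statistics.fmean(scores)
    # Sample standard deviation needs at least two runs.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std, scores

mean, std, scores = mean_and_std(toy_metric, seeds=range(5))
print(f"score: {mean:.2f} ± {std:.2f} over {len(scores)} runs")
```

In practice `toy_metric` would be replaced by a call that runs the real metric with the given seed. If the resulting std is small relative to the gap between the methods being compared, a single seeded run is probably enough.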