wilson1yan / VideoGPT

MIT License

FVD score does not match what's reported in the paper #15

Status: Open. Gabriel-Huang opened this issue 3 years ago.

Gabriel-Huang commented 3 years ago

Hi, I tested the VideoGPT model pre-trained on BAIR. Your evaluation script reported an FVD of 1000+, whereas FVD* was around 100. Could there be a mistake in the evaluation script?

wilson1yan commented 3 years ago

That's odd, it runs fine on my end (around 104 FVD). Are you running the command below?

python scripts/compute_fvd.py --ckpt bair_gpt

Gabriel-Huang commented 3 years ago

Yes, but I modified the script so that it works with BAIR stored in MP4 format. I assume this wouldn't make much difference?

P.S. I pointed args.data_path at the folder that contains all the MP4 clips, so your code should automatically switch from HDF5Dataset to VideoDataset. The code runs without errors, but FVD jumps to 1000+.
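
The automatic switch mentioned above presumably depends on whether args.data_path points to a directory or to a single file. The following is a minimal sketch of that kind of dispatch under that assumption, not the repository's actual code: the helper name choose_dataset_class is hypothetical, and it only returns the class name instead of constructing HDF5Dataset or VideoDataset.

```python
import os
import tempfile


def choose_dataset_class(data_path: str) -> str:
    """Hypothetical helper: pick a dataset class name from the data path.

    Assumption: a directory of video clips selects VideoDataset, while any
    other path (e.g. a single .hdf5 file) selects HDF5Dataset.
    """
    return "VideoDataset" if os.path.isdir(data_path) else "HDF5Dataset"


# Usage: a temporary folder stands in for a directory of .mp4 clips.
with tempfile.TemporaryDirectory() as clip_dir:
    print(choose_dataset_class(clip_dir))                             # VideoDataset
    print(choose_dataset_class(os.path.join(clip_dir, "bair.hdf5")))  # HDF5Dataset
```

If the two code paths differ in anything besides file format (for example preprocessing or which clips count as the test split), that difference would also feed into the FVD number.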

wilson1yan commented 3 years ago

It might make FVD a little worse: FVD tends to be sensitive to noise, and saving clips as .mp4 files can introduce small compression artifacts.
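
For context, FVD is the Fréchet distance between Gaussians fitted to I3D features of real and generated videos: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The sketch below is a simplification that assumes diagonal covariances so it can be written with the standard library alone; real FVD uses full covariance matrices of I3D embeddings. The numbers in the usage lines are toy values, only meant to show that a small systematic shift in the feature statistics (such as compression artifacts might cause) adds to the score.

```python
import math
from typing import Sequence


def frechet_distance_diag(mu1: Sequence[float], var1: Sequence[float],
                          mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Squared Fréchet distance between two Gaussians with diagonal covariances.

    With diagonal covariances the matrices commute, so the trace term
    Tr(S1 + S2 - 2 (S1 S2)^(1/2)) reduces to a per-dimension sum of
    (sqrt(v1) - sqrt(v2)) ** 2.
    """
    if not (len(mu1) == len(var1) == len(mu2) == len(var2)):
        raise ValueError("all statistics must have the same dimension")
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term


real_mu, real_var = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
print(frechet_distance_diag(real_mu, real_var, real_mu, real_var))  # 0.0
# Toy "slightly degraded" statistics: a small shift in every dimension.
print(frechet_distance_diag(real_mu, real_var, [0.1, 0.1, 0.1], [1.21, 1.21, 1.21]))
```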

But it shouldn't be that much worse. An FVD of 1000+ suggests to me that either the samples or the real test examples are incorrect. Does sampling produce good-quality samples? It might also help to visualize some instances of the test set to double-check that the data is correct.
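
Alongside looking at clips by eye, a quick way to check "samples or real examples are incorrect" is to compare simple statistics of both sets: frame count, frame size, and pixel value range. A mismatch such as one set in [0, 255] and the other in [0, 1] could plausibly distort the features FVD compares. This sketch assumes each clip is a nested list indexed as clip[t][y][x] with grayscale values; clip_summary and compare_batches are hypothetical helpers, not part of VideoGPT.

```python
from typing import List, Tuple

Clip = List[List[List[float]]]  # assumed layout: clip[t][y][x], grayscale


def clip_summary(clip: Clip) -> Tuple[int, int, int, float, float]:
    """Return (frames, height, width, min pixel, max pixel) for one clip."""
    pixels = [p for frame in clip for row in frame for p in row]
    return len(clip), len(clip[0]), len(clip[0][0]), min(pixels), max(pixels)


def compare_batches(real: List[Clip], fake: List[Clip]) -> List[str]:
    """List human-readable mismatches between real and generated clips."""
    problems = []
    real_shapes = {clip_summary(c)[:3] for c in real}
    fake_shapes = {clip_summary(c)[:3] for c in fake}
    if real_shapes != fake_shapes:
        problems.append(f"shape mismatch: real {real_shapes} vs fake {fake_shapes}")
    real_max = max(clip_summary(c)[4] for c in real)
    fake_max = max(clip_summary(c)[4] for c in fake)
    # Heuristic: a max at or below 1 on one side and above 1 on the other
    # suggests the two sets use different pixel scales.
    if (real_max <= 1.0) != (fake_max <= 1.0):
        problems.append(f"value-range mismatch: real max {real_max} vs fake max {fake_max}")
    return problems


frame_255 = [[0, 128], [255, 64]]     # toy frame in [0, 255]
frame_01 = [[0.0, 0.5], [1.0, 0.25]]  # same content scaled to [0, 1]
print(compare_batches([[frame_255, frame_255]], [[frame_01, frame_01]]))
# ['value-range mismatch: real max 255 vs fake max 1.0']
```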

Gabriel-Huang commented 3 years ago

I also tried first sampling MP4s with the pre-trained model and then computing FVD with the original implementation; that gave an FVD of around 500.

I visualized the generated clips. Their first frames (the conditioning frame) match those of the corresponding ground-truth clips, but the motion of the robot arm does not match the ground truth. This is normal, right? The model is conditioned only on the first frame, not on the motion.
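
This observation can also be checked numerically: with only the first frame as conditioning, the per-frame error against the ground truth should be close to zero at t = 0 and is free to grow afterwards, because the sampled arm motion is one plausible continuation rather than the recorded one. The sketch below computes per-frame mean squared error on an assumed frame[y][x] grayscale layout; per_frame_mse is a hypothetical helper, not VideoGPT code.

```python
from typing import List

Frame = List[List[float]]  # assumed layout: frame[y][x], grayscale


def per_frame_mse(generated: List[Frame], ground_truth: List[Frame]) -> List[float]:
    """Mean squared error between corresponding frames of two equal-length clips."""
    if len(generated) != len(ground_truth):
        raise ValueError("clips must have the same number of frames")
    errors = []
    for gen_frame, gt_frame in zip(generated, ground_truth):
        diffs = [
            (g - t) ** 2
            for gen_row, gt_row in zip(gen_frame, gt_frame)
            for g, t in zip(gen_row, gt_row)
        ]
        errors.append(sum(diffs) / len(diffs))
    return errors


# Toy example: identical conditioning frame, diverging second frame.
gt = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
gen = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]]
print(per_frame_mse(gen, gt))  # [0.0, 0.25]
```

A growing error after the first frame is therefore expected and, on its own, does not point to the cause of the high FVD.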

btw, thanks for the quick reply :)

wilson1yan commented 3 years ago

Yes, the motion most likely would not match the ground truth. I'm not sure what the cause of this issue is. Could you share what some samples look like?