voletiv / mcvd-pytorch

Official implementation of MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (https://arxiv.org/abs/2205.09853)
MIT License

Why set KTH/Cityscapes test dataset length to 256? #20

Closed JunyaoHu closed 1 year ago

JunyaoHu commented 1 year ago
  1. I see that you set the test dataset length to 256. Is this to make calculating FVD easier, or is there another reason? Are you following another researcher's setting? This setting differs from other models such as SimVP (for SMMNIST, the train dataset length is 10000 and the test dataset length is 10000; for KTH, the train and test dataset lengths are also different).

https://github.com/voletiv/mcvd-pytorch/blob/451da2eb635bad50da6a7c03b443a34c6eb08b3a/datasets/__init__.py#L172-L173

https://github.com/voletiv/mcvd-pytorch/blob/451da2eb635bad50da6a7c03b443a34c6eb08b3a/datasets/__init__.py#L193-L194

https://github.com/voletiv/mcvd-pytorch/blob/451da2eb635bad50da6a7c03b443a34c6eb08b3a/datasets/__init__.py#L205-L206


  2. In SimVP, the KTH videos are cut into short clips offline. I notice that your code instead samples a clip at random, online, each time an item is fetched by index. Your test dataset is also actually smaller than 256 videos (5 people × 4 scenes × 6 actions = 120 videos), and you rescale the index to draw more samples from it (for example, index 200 => 200*(119/255) = 93.3 => 93, i.e. a random clip from the video at index 93 of the 120 original test videos). Has this method been used in other models?
video_index = round(index / (self.__len__() - 1) * (self.max_index() - 1))  # = round(index * (119/255))
# a projection from [0, 255] to [0, 119]
shard_idx, idx_in_shard = self.videos_ds.get_indices(video_index)
# get the video from the 120-video dataset
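
To make the mapping above concrete, here is a minimal standalone sketch of the index rescaling. The function name `video_index_for` and its default arguments are my own illustration, not part of the MCVD code; it only assumes that `self.__len__()` returns 256 and `self.max_index()` returns 120, as described in this issue.

```python
from collections import Counter


def video_index_for(index, dataset_len=256, num_videos=120):
    """Rescale a dataset index in [0, dataset_len - 1] to a video index
    in [0, num_videos - 1], mirroring the expression quoted above.

    Hypothetical helper for illustration only.
    """
    return round(index / (dataset_len - 1) * (num_videos - 1))


# The worked example from the question: index 200 lands on video 93.
print(video_index_for(200))

# Count how often each of the 120 videos is selected over all 256 indices.
counts = Counter(video_index_for(i) for i in range(256))
print(len(counts), min(counts.values()), max(counts.values()))
```

Under these assumptions every one of the 120 videos is reached, and each is selected two or three times, so the 256 test items differ only through the random clip that is cut from each video.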

https://github.com/voletiv/mcvd-pytorch/blob/451da2eb635bad50da6a7c03b443a34c6eb08b3a/datasets/bair.py#L48-L75

JPerAsperaadAstra commented 8 months ago

I have the same confusion. Do you understand it now?

JunyaoHu commented 8 months ago

In the end, we followed the MCVD config, @JPerAsperaadAstra.

The number of videos can have a big influence on FVD: more videos usually give a lower FVD, so a suitable fixed number (256) should be selected for evaluation.
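
A toy illustration of that sample-size effect, as a rough sketch rather than a real FVD computation: it draws "real" and "generated" features from the same Gaussian, so the true distance is zero, and uses a simplified Fréchet distance that assumes diagonal covariances. Real FVD uses features from a pretrained video network and full covariance matrices; the helper names and the 16-dimensional features here are assumptions made only for this example.

```python
import random
import statistics


def diagonal_frechet_distance(a, b):
    """Fréchet distance between Gaussians fitted to a and b,
    simplified by assuming diagonal covariances (not the full FVD formula)."""
    total = 0.0
    for d in range(len(a[0])):
        col_a = [row[d] for row in a]
        col_b = [row[d] for row in b]
        mean_gap = statistics.mean(col_a) - statistics.mean(col_b)
        std_gap = statistics.stdev(col_a) - statistics.stdev(col_b)
        total += mean_gap ** 2 + std_gap ** 2
    return total


def estimate_fd(num_videos, dims=16, seed=0):
    """Estimate the distance between two sample sets drawn from the SAME
    distribution; any nonzero value is purely finite-sample error."""
    rng = random.Random(seed)
    real = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(num_videos)]
    fake = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(num_videos)]
    return diagonal_frechet_distance(real, fake)


for n in (64, 256, 2048):
    print(n, estimate_fd(n))
```

The estimate shrinks toward zero as the number of samples grows, which is why scores computed with different numbers of test videos are not directly comparable.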