laura-wang / video-pace

code for our ECCV-2020 paper: Self-supervised Video Representation Learning by Pace Prediction

About the epoch number #8

Closed: BestJuly closed this issue 3 years ago

BestJuly commented 3 years ago

Hi, thank you for your work. I have a question about the epoch number in your paper.

> While when pretraining on UCF101 dataset, as it only contains around 9k videos in the training split, we set epoch size to be around 90k for temporal jittering following [1].

In [1], I found a description that you might be referring to:

> For inference on the downstream tasks, we uniformly sample 10 clips per testing example and average their predictions to make a video-level prediction.

It seems strange that ucf101.py uses `self.rgb_lines = list(lines) * 10` while the stated total epoch number is 18. If video clips are sampled at random positions along the temporal axis, training for 180 epochs would have the same effect.

Therefore, my question is: why not just train for 180 epochs and apply temporal jittering in each sampling step? Using `self.rgb_lines = list(lines)` and setting the epoch number to 180 would make the code clearer. There might be some reason or trick that I have not noticed. Thank you in advance.
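
For concreteness, here is a minimal sketch of the equivalence being discussed, assuming (as the paper describes) that a clip's start frame is re-drawn at random on every access. The `videos` list, `clip_len`, and `sample_clip` below are hypothetical stand-ins, not the actual code in ucf101.py:

```python
import random

# Hypothetical stand-ins for the real data in ucf101.py: ~9k training
# videos, each with some number of frames, and a fixed clip length.
videos = [("video_%04d" % i, 300) for i in range(9000)]
clip_len = 16

def sample_clip(video):
    """Temporal jittering: draw a fresh random start frame on every call."""
    name, n_frames = video
    start = random.randint(0, n_frames - clip_len)
    return name, start

# Option A (current code): replicate the video list 10x, train 18 epochs.
lines_a = list(videos) * 10
clips_a = 18 * len(lines_a)

# Option B (suggested): keep the plain list, train 180 epochs.
lines_b = list(videos)
clips_b = 180 * len(lines_b)

# Each video is visited the same number of times either way, and
# sample_clip jitters independently on every visit, so the sampled
# clip distributions are identical.
assert clips_a == clips_b  # 1,620,000 clips under both schedules

name, start = sample_clip(lines_b[0])  # a different random clip each call
```

Under this assumption, any practical difference between the two schedules would come from bookkeeping rather than the data itself, e.g. how often per-epoch learning-rate steps, checkpoints, or logging fire.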

Tramac commented 3 years ago

Same doubt.

laura-wang commented 3 years ago

Sorry for the late reply. Regarding the epoch number, we follow the work "Self-Supervised Learning by Cross-Modal Audio-Video Clustering". In practice, we found it works pretty well, so we kept this practice.
However, I didn't check the other option you suggest ("using 180 epochs to train and conduct temporal jittering during each sampling procedure"). These two practices can probably achieve comparable results.