BestJuly closed 3 years ago
Same doubt.
Sorry for the late reply. Regarding the epoch number, we follow the work "Self-Supervised Learning by Cross-Modal Audio-Video Clustering". In practice, we find it works pretty well so we keep this practice.
However, I have not tested the alternative you suggest: "using 180 epochs to train and conduct temporal jittering during each sampling procedure". The two practices can probably achieve comparable results.
Hi, thank you for your work. I have a question about the epoch number in your paper.
I found in [1], there are some descriptions which you might refer to:
It is strange that you use `self.rgb_lines = list(lines) * 10` in `ucf101.py` and mention that the total epoch number is 18. If video clips are sampled randomly along the temporal axis, using 180 epochs will have the same effect. Therefore, my question is: why not just train for 180 epochs and conduct temporal jittering during each sampling procedure? Using `self.rgb_lines = list(lines)` and setting the epoch number to 180 would make the code clearer. There might be some reasons or tricks that I have not noticed. Thank you in advance.
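To make the comparison concrete, here is a minimal sketch of the two setups, not the repository's actual code. `ClipList`, `count_draws`, `num_frames`, and `clip_len` are hypothetical names introduced for illustration. Only `self.rgb_lines = list(lines) * 10` and the 18 vs. 180 epoch counts come from the discussion above.

```python
import random


class ClipList:
    """Hypothetical stand-in for the dataset class in ucf101.py."""

    def __init__(self, lines, repeat=1, num_frames=100, clip_len=16, seed=0):
        # repeat=10 mirrors `list(lines) * 10`; repeat=1 mirrors `list(lines)`.
        self.rgb_lines = list(lines) * repeat
        self.num_frames = num_frames  # assumed video length in frames
        self.clip_len = clip_len      # assumed clip length in frames
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.rgb_lines)

    def __getitem__(self, idx):
        # Temporal jittering: a fresh random start frame on every access.
        start = self.rng.randint(0, self.num_frames - self.clip_len)
        return self.rgb_lines[idx], start


def count_draws(dataset, epochs):
    """Count how many jittered clips are sampled over the whole run."""
    draws = 0
    for _ in range(epochs):
        order = list(range(len(dataset)))
        random.shuffle(order)
        for idx in order:
            _video, _start = dataset[idx]
            draws += 1
    return draws


videos = [f"video_{i}" for i in range(5)]
repeated = count_draws(ClipList(videos, repeat=10), epochs=18)
plain = count_draws(ClipList(videos, repeat=1), epochs=180)
print(repeated, plain)  # 900 900
```

Under these assumptions both setups draw the same number of randomly jittered clips. They can still differ in anything tied to epoch boundaries, such as learning-rate schedules, checkpoint frequency, or how often the data is reshuffled.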