[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
In Section 4.2, "Analysis of Sparse Sampling", the paper says: "If not otherwise stated, we randomly sample a single frame (N_train=1 and T=1) from full-length videos for training, and use the middle frame (N_test=1) for inference, with input image size L=448."
I am confused: unless otherwise stated in the analysis that follows, does T at training time equal T at test time, or is T at test time always 1? I ask because the paper does not define a separate T_train or T_test.
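To make the question concrete, here is a minimal sketch of the default setup as I read it from the quoted sentence. The function names are hypothetical and not taken from the ClipBERT code, and the test-time function assumes T=1 at inference, which is exactly the point I am unsure about.

```python
import random


def sample_train_frame(num_frames, rng=random):
    """Training (N_train=1, T=1): pick one frame index uniformly at random.

    Hypothetical helper, not part of the ClipBERT repository.
    """
    return rng.randrange(num_frames)


def sample_test_frame(num_frames):
    """Inference (N_test=1): pick the middle frame index.

    Assumes T=1 at test time, which the quoted sentence does not state explicitly.
    """
    return num_frames // 2
```

If T at test time instead follows T at training time, then `sample_test_frame` would need to return T consecutive frames around the middle rather than a single index.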