@happy-lifi, here we use the pretrained models from image supervised pretraining, image contrastive learning, BEVT image-only pretraining, and BEVT pretraining as the initialization, and we report the finetuning results on the corresponding video datasets.
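As a rough illustration of the "pretrained checkpoint as initialization" step described above, here is a minimal sketch. It is not the repository's actual loading code: the function `init_from_pretrained`, the key names (`backbone.w`, `head.w`, `decoder.w`), and the `skip_prefixes` argument are all hypothetical, and plain floats stand in for real weight tensors.

```python
def init_from_pretrained(model_state, pretrained_state, skip_prefixes=("head.",)):
    """Copy matching pretrained weights into a fresh model state.

    Keys that start with a skipped prefix, or that the target model does not
    have, are left out, so those parts keep their fresh initialization.
    """
    new_state = dict(model_state)  # do not mutate the caller's dict
    loaded, skipped = [], []
    for name, value in pretrained_state.items():
        if name.startswith(tuple(skip_prefixes)) or name not in model_state:
            skipped.append(name)
            continue
        new_state[name] = value
        loaded.append(name)
    return new_state, loaded, skipped


# Toy stand-ins for real checkpoints (hypothetical key names).
video_model = {"backbone.w": 0.0, "head.w": 0.0}
pretrain_ckpt = {"backbone.w": 1.5, "head.w": 9.9, "decoder.w": 3.0}

state, loaded, skipped = init_from_pretrained(video_model, pretrain_ckpt)
print(loaded)   # weights taken from the pretraining checkpoint
print(skipped)  # weights ignored; the head is trained during finetuning
```

In this sketch only the backbone comes from the pretraining checkpoint; the classification head starts fresh and is learned during finetuning on the video dataset.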
Hi, thanks for this nice work. I am a bit confused about the Image Sup, Image CL, BEVT-I, and BEVT experiments. In these experiments, are you using the pretraining checkpoint or the finetuned checkpoint (finetuned after self-supervised pretraining)?