Closed HJYao00 closed 6 days ago
Hi @HJYao00,
Thank you for your interest in our work. The video encoder we used (InternVideo2) works for both images and videos. During the pretraining stage, we feed image data through the video encoder to warm up the projector. The corresponding pretraining script is available at pretrain_projector_video_encoder.sh.
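For illustration only, here is a minimal sketch of one common way to pass an image through a video encoder: wrap it as a clip along a new time axis. This is an assumption about the general technique, not code from pretrain_projector_video_encoder.sh, and it does not confirm how InternVideo2 handles images internally. The names `image_to_clip` and `shape`, the `num_frames` parameter, and the (T, H, W, C) layout are all hypothetical.

```python
from typing import List

# Hypothetical layout: one frame is H x W x C nested lists of floats.
Frame = List[List[List[float]]]


def image_to_clip(image: Frame, num_frames: int = 1) -> List[Frame]:
    """Wrap one image as a clip of shape (T, H, W, C).

    With num_frames=1 the image becomes a single-frame video; larger
    values repeat the same frame along the time axis. Each frame is a
    separate copy, so editing one frame does not change the others.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be >= 1")
    return [
        [[list(pixel) for pixel in row] for row in image]
        for _ in range(num_frames)
    ]


def shape(x) -> tuple:
    """Return the nested-list dimensions, outermost first."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


# A dummy 2 x 4 RGB image (H=2, W=4, C=3).
image = [[[0.0, 0.0, 0.0] for _ in range(4)] for _ in range(2)]

single_frame_clip = image_to_clip(image)       # time axis of length 1
repeated_clip = image_to_clip(image, 4)        # same image repeated 4 times

print(shape(single_frame_clip))
print(shape(repeated_clip))
```

Whether the real pipeline uses a single frame or repeats the image to match the encoder's expected clip length is best checked in the script itself.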
I hope this helps. Please let me know if you have any questions.
Thank you!
Thank you for sharing your work!
During the pre-training stage, the paper mentions that you used image data to train the video branch. How did you use image data to train the video part? Did you treat each image as a single frame of a video?