mbzuai-oryx / VideoGPT-plus

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

About pre-training stage. #10

Closed HJYao00 closed 6 days ago

HJYao00 commented 1 week ago

Thank you for sharing your work!

During the pre-training stage, the paper mentions that you used image data to train the video branch. How did you use image data to train the video part? Did you treat each image as a single frame of a video?

mmaaz60 commented 6 days ago

Hi @HJYao00,

Thank you for your interest in our work. The video encoder we use (InternVideo2) works for both images and videos. During the pretraining stage, we use image data with the video encoder to warm up the projector. The corresponding pretraining script is available at pretrain_projector_video_encoder.sh.

https://github.com/mbzuai-oryx/VideoGPT-plus/blob/4fb4457aee53745c851feff41bf4bdf7cbfb3098/videogpt_plus/model/internvideo/utils.py#L195
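In practice this amounts to treating each image as a clip with a temporal dimension. As a rough illustration (not the repository's actual code; the helper name and shapes below are assumptions), an image tensor can be given a frame axis so the video encoder processes it like any other clip, and the resulting features are passed through the projector being warmed up:

```python
import torch

def image_as_clip(image: torch.Tensor, num_frames: int = 1) -> torch.Tensor:
    """Turn a (C, H, W) image into a (T, C, H, W) pseudo-video clip
    by adding a temporal axis and optionally repeating the frame."""
    return image.unsqueeze(0).repeat(num_frames, 1, 1, 1)

# Example: a 224x224 RGB image becomes a single-frame clip that a
# video encoder such as InternVideo2 can consume.
image = torch.randn(3, 224, 224)
clip = image_as_clip(image, num_frames=1)  # shape: (1, 3, 224, 224)
```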

I hope this helps. Please let me know if you have any further questions.

HJYao00 commented 6 days ago

Thank you!