showlab / all-in-one

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
https://arxiv.org/abs/2203.07303

zero-shot on custom dataset #18

Open bibisbar opened 1 year ago

bibisbar commented 1 year ago

Hi, is it possible to evaluate this model on a custom dataset without fine-tuning? If so, could you show me how to run inference from the pre-trained checkpoint? My tasks are VideoQA and video-text retrieval, and my dataset format is quite similar to MSRVTT. Thanks a lot!