Hi,
I want to know whether we can evaluate this model on our custom dataset without fine-tuning. Or could you show me how to run inference with the pre-trained checkpoint? My tasks are VideoQA and video-text retrieval, and the dataset format is quite similar to MSRVTT. Thanks a lot!