Closed avinashsai closed 2 years ago
Hi,
We do all finetuning on video only (without images). You can look at configs/msrvtt_4f_i21k.json
as an example. You just need to write your own TextVideoDataset class following the same structure, and it will work fine :).
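To give a rough idea of what such a class might look like, here is a minimal, standalone sketch. It does not import anything from the repo, and every name in it is an assumption for illustration: the class name `CustomTextVideoDataset`, the hook methods `_load_metadata`, `_get_video_path` and `_get_caption`, the CSV columns `video_id` and `caption`, the `.mp4` extension, and the default of 4 frames. Check the existing dataset classes in the repo for the real base class and the hooks it expects before adapting this.

```python
import csv
import os
import tempfile


class CustomTextVideoDataset:
    """Hypothetical video+text dataset; not the repo's actual base class."""

    def __init__(self, data_dir, metadata_csv, num_frames=4):
        self.data_dir = data_dir
        self.num_frames = num_frames  # assumed default, adjust to your config
        self.metadata = self._load_metadata(metadata_csv)

    def _load_metadata(self, metadata_csv):
        # Assumed layout: a CSV with "video_id" and "caption" columns.
        with open(metadata_csv, newline="") as f:
            return list(csv.DictReader(f))

    def _get_video_path(self, sample):
        # Assumed naming scheme: <data_dir>/<video_id>.mp4
        return os.path.join(self.data_dir, sample["video_id"] + ".mp4")

    def _get_caption(self, sample):
        return sample["caption"]

    def sample_frame_indices(self, total_frames):
        # Uniform sampling: take the middle frame of each equal-length segment.
        if total_frames <= 0:
            return []
        n = min(self.num_frames, total_frames)
        seg = total_frames / n
        return [int(seg * i + seg / 2) for i in range(n)]

    def __len__(self):
        return len(self.metadata)

    def __getitem__(self, idx):
        sample = self.metadata[idx]
        # Only video and text are returned; no image input is needed.
        # Frame decoding is left out; plug in your video reader here.
        return {
            "video_path": self._get_video_path(sample),
            "text": self._get_caption(sample),
        }
```

Each item carries only a video path and its caption, so no image is involved at any point. Actual frame decoding and transforms would go in `__getitem__`, using whichever video reader your setup provides.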
Hi,
Congrats on the amazing work!! I want to fine-tune this model on a custom video dataset. Each sample has a video and text as inputs, but no image. How can I fine-tune without an image in the input?
Thank you.