m-bain / frozen-in-time

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
https://arxiv.org/abs/2104.00650
MIT License

Fine tune on a custom dataset #31

Closed: avinashsai closed this issue 2 years ago

avinashsai commented 2 years ago

Hi,

Congrats on the amazing work! I want to fine-tune this model on a custom video dataset. Each sample consists of a video and its text, but no images are provided. How can I fine-tune without image inputs?

Thank you.

m-bain commented 2 years ago

Hi, we do all fine-tuning on video only (without images). You can look at configs/msrvtt_4f_i21k.json as an example. You just need to write your own TextVideoDataset class following the same structure, and it will work fine :).
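For anyone landing here later, a minimal sketch of what such a dataset class might look like, assuming each row of a hypothetical metadata CSV holds a `video_id` and a `caption`, with videos stored as `<data_dir>/<video_id>.mp4`. This is a standalone illustration, not the repo's code: the class name `CustomTextVideoDataset` and the hook names `_load_metadata`, `_get_video_path` and `_get_caption` are assumptions, so check the actual `TextVideoDataset` base class in the repo for the exact methods to override. Frame decoding and transforms are deliberately left out.

```python
import csv
import os
from typing import Dict, List


class CustomTextVideoDataset:
    """Hypothetical video-text dataset: one (video file, caption) pair per row.

    Hook names are ASSUMPTIONS modelled on a TextVideoDataset-style subclass;
    no image inputs are involved anywhere.
    """

    def __init__(self, metadata_csv: str, data_dir: str):
        self.metadata_csv = metadata_csv
        self.data_dir = data_dir
        self.metadata: List[Dict[str, str]] = []
        self._load_metadata()

    def _load_metadata(self) -> None:
        # Assumed CSV layout: header "video_id,caption"; no image column needed.
        with open(self.metadata_csv, newline="", encoding="utf-8") as f:
            self.metadata = list(csv.DictReader(f))

    def _get_video_path(self, sample: Dict[str, str]) -> str:
        # Assumed naming scheme: <data_dir>/<video_id>.mp4
        return os.path.join(self.data_dir, sample["video_id"] + ".mp4")

    def _get_caption(self, sample: Dict[str, str]) -> str:
        return sample["caption"]

    def __len__(self) -> int:
        return len(self.metadata)

    def __getitem__(self, idx: int) -> Dict[str, str]:
        sample = self.metadata[idx]
        # A real subclass would decode and sample frames from this path here.
        return {
            "video_path": self._get_video_path(sample),
            "text": self._get_caption(sample),
        }
```

In practice you would subclass the repo's `TextVideoDataset` rather than write a standalone class, and then point the data loader section of a config modelled on configs/msrvtt_4f_i21k.json at your new dataset.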