Closed awkrail closed 2 years ago
Hi @misogil0116, Thanks for your interest in our work! You can start with one of our retrieval tasks such as msrvtt retrieval and then adapt it to other datasets of interest.
@jayleicn Hi, thank you for your quick response! I have another question about train.jsonl. When loading this file, each element in the array has three keys: caption, clip_name, and sen_id. I guess that clip_name should be the same as the key in the lmdb file, which stores the video binary files. Is this correct?
In [3]: with open("/mnt/LSTA6/data/nishimura/misc/clipbert/txt_db/msrvtt_retrieval/train.jsonl") as f:
...: train_data = [json.loads(l.strip("\n")) for l in f.readlines()]
...:
In [4]: train_data[0]
Out[4]:
{'caption': 'a cartoon animals runs through an ice cave in a video game',
'clip_name': 'video2960',
'sen_id': 0}
Yes, you are correct!
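For reference, here is a minimal sketch of how one might sanity-check that every clip_name in train.jsonl has a matching key in the video LMDB. The sample record and the lmdb_keys set below are placeholders for illustration; in a real setup the keys would be read from the LMDB environment (e.g. with the lmdb package) instead of hard-coded.

```python
import json

# Hypothetical lines mirroring the train.jsonl format shown above.
sample_lines = [
    '{"caption": "a cartoon animals runs through an ice cave in a video game",'
    ' "clip_name": "video2960", "sen_id": 0}',
]

train_data = [json.loads(line) for line in sample_lines]

# Placeholder key set. In practice you would collect the keys from the LMDB, e.g.:
#   with lmdb.open(lmdb_path, readonly=True) as env:
#       with env.begin() as txn:
#           lmdb_keys = {k.decode() for k, _ in txn.cursor()}
lmdb_keys = {"video2960", "video2961"}

# Every annotation's clip_name should appear among the LMDB keys.
missing = [d["clip_name"] for d in train_data if d["clip_name"] not in lmdb_keys]
print(missing)  # an empty list means all clips are present
```

Running a check like this before training makes missing or misnamed videos fail loudly instead of surfacing mid-epoch.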
Thank you!
Hi, thank you for sharing this interesting work!
I would like to try fine-tuning ClipBERT on other video-and-language datasets, such as YouCook2. My target downstream task is sentence-level cross-modal retrieval, rather than paragraph-level.
Do you have any recommendations for training ClipBERT on custom datasets? In particular, I am curious about how to set the hyper-parameters described in the config files for other datasets. Thank you.