data_root: ${oc.env:SL_DATA_DIR}/videos_images
anno_root_downstream: ${oc.env:SL_DATA_DIR}/anno_downstream
train_file: ['${anno_root_downstream}/msrvtt_ret_train7k.json', '${data_root}/msrvtt_2fps_224', video]
test_types: [mc_test, ]
test_file:
  mc_test: ['${anno_root_downstream}/msrvtt_mc_test.json', '${data_root}/msrvtt_2fps_224', video]
stop_key: None # used to choose the best ckpt. If None, save the last.
The formats of 'msrvtt_ret_train7k.json' and 'msrvtt_mc_test.json' are completely different.
Each entry in 'msrvtt_ret_train7k.json' has a single caption:
{"video": "video2960.mp4", "caption": "a cartoon animals runs through an ice cave in a video game", "duration": 12.32},
In contrast, each entry in 'msrvtt_mc_test.json' contains 5 captions and 1 answer:
{"video": "video9770.mp4", "caption": ["the boy is trying to fix the problem", "a movie trailer shows various scenes from a movie", "asian man discusses technology in the younger generations", "two men on wave runner in ocean rescuing a surfer", "a group is dancing"], "answer": 0},
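To make the difference concrete, the two sample entries above can be parsed and compared. This is only a minimal sketch: the helper name `describe_entry` is hypothetical and not part of the repository, and the trailing commas from the samples are dropped so each entry parses on its own.

```python
import json

# The two sample entries quoted above, each parsed as a standalone JSON object.
ret_entry = json.loads(
    '{"video": "video2960.mp4", '
    '"caption": "a cartoon animals runs through an ice cave in a video game", '
    '"duration": 12.32}'
)
mc_entry = json.loads(
    '{"video": "video9770.mp4", "caption": ['
    '"the boy is trying to fix the problem", '
    '"a movie trailer shows various scenes from a movie", '
    '"asian man discusses technology in the younger generations", '
    '"two men on wave runner in ocean rescuing a surfer", '
    '"a group is dancing"], "answer": 0}'
)


def describe_entry(entry):
    """Hypothetical helper: classify an annotation entry by its shape."""
    if isinstance(entry["caption"], list) and "answer" in entry:
        return "multiple_choice"
    if isinstance(entry["caption"], str):
        return "retrieval"
    return "unknown"


print(describe_entry(ret_entry))  # one caption string per video
print(describe_entry(mc_entry))   # list of candidate captions plus an answer
```

The retrieval entry pairs one video with one caption string, while the multiple-choice entry pairs one video with a list of candidates and an `answer` field.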
Why do you train the "msrvtt_mc" task with "msrvtt_ret_train7k.json"?
The msrvtt_mc task is essentially a simplified version of the retrieval task msrvtt_ret with a much smaller retrieval pool: for each video, the model only has to rank the 5 candidate captions. Both tasks can therefore use the same trained checkpoint for inference.
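The idea can be sketched as follows. This is an illustration under stated assumptions, not the repository's evaluation code: it assumes `answer` is the index of the correct caption, uses a plain dot product as a stand-in for the model's video-text similarity, and the function names and toy features are made up.

```python
def pick_caption(video_feat, caption_feats):
    """Return the index of the caption most similar to the video (dot product)."""
    scores = [sum(v * c for v, c in zip(video_feat, feat)) for feat in caption_feats]
    return max(range(len(scores)), key=scores.__getitem__)


def mc_accuracy(examples):
    """examples: list of (video_feat, caption_feats, answer_index) tuples."""
    correct = sum(pick_caption(v, caps) == ans for v, caps, ans in examples)
    return correct / len(examples)


# Toy 2-d features standing in for real model embeddings (made up for illustration).
examples = [
    ([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [0.2, 0.8], [0.3, 0.3]], 0),
    ([0.0, 1.0], [[0.9, 0.1], [0.2, 0.2], [0.1, 0.95], [0.5, 0.5], [0.8, 0.0]], 2),
]
print(mc_accuracy(examples))  # 1.0
```

Multiple choice here is retrieval with a pool of five captions per video, which is why a model trained only on video-caption pairs can be scored on it without a separate training file.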
Hi.
This concerns configs/ret_msrvtt_mc.yaml.
Isn't there a train file for "msrvtt_mc"?