Closed ruiyan1995 closed 1 year ago
Thanks for your reply. Yes, but some of the details are different. For example, each video in MSRVTT-retrieval has more than one caption. Do you use the first one or choose one at random in this work?
Thanks for pointing this out!
Yes. MSRVTT-Retrieval has 10 captions for each video, and we treat them as independent items. Therefore, the JSON file should be like:
{
  "train": [
    {
      "video": "vid1",
      "caption": "cap1"
    },
    {
      "video": "vid1",
      "caption": "cap2"
    },
    {
      "video": "vid1",
      "caption": "cap3"
    },
    {
      "video": "vid2",
      "caption": "cap1"
    },
    {
      "video": "vid2",
      "caption": "cap2"
    },
    {
      "video": "vid2",
      "caption": "cap3"
    }
  ],
  "val": [],
  "test": []
}
For example, there are two captions for video5029 in txt_msrvtt-retrieval.json.
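To illustrate the "independent items" layout described above, here is a minimal sketch that flattens a per-video caption list into one entry per (video, caption) pair. The function name `build_retrieval_json` and the input structure (`{split: {video_id: [captions]}}`) are assumptions made for this example; they are not taken from the repository, and the original MSRVTT annotation file would first need to be parsed into this shape.

```python
import json


def build_retrieval_json(captions_by_split):
    """Flatten {split: {video_id: [captions]}} into one item per (video, caption) pair.

    The input structure is hypothetical; adapt the parsing to the dataset's own JSON.
    """
    out = {}
    for split in ("train", "val", "test"):
        items = []
        for video_id, captions in captions_by_split.get(split, {}).items():
            for caption in captions:
                # Each caption becomes an independent item that repeats the video id.
                items.append({"video": video_id, "caption": caption})
        out[split] = items
    return out


example = {
    "train": {
        "vid1": ["cap1", "cap2", "cap3"],
        "vid2": ["cap1", "cap2", "cap3"],
    }
}
data = build_retrieval_json(example)
print(json.dumps(data, indent=2))
```

Splits that are missing from the input come out as empty lists, matching the empty "val" and "test" entries in the example format.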
Thanks. But I cannot reproduce the zero-shot retrieval results on MSRVTT. I use the following settings: (1) the JSfusion split (9000 videos for training and 1000 for testing); (2) for testing, I choose the first caption, the same as "Frozen in Time" (details here); (3) I load the best ckpt provided by you.
For zero-shot retrieval on MSRVTT, I can only get:
test {'r@1': 0.22200000000000017, 'r@5': 0.5130000000000003, 'r@10': 0.6680000000000005, 'median': 5}
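For readers comparing numbers, the values above are fractions (0.222 corresponds to R@1 = 22.2). The sketch below shows one common way such metrics are computed from a text-to-video similarity matrix. It is not the repository's evaluation code: the function name `retrieval_metrics`, the matrix layout, and the assumption that query i matches video i are all made up for this illustration.

```python
from statistics import median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute recall@k and median rank.

    sim[i][j] is the similarity of text query i to video j.
    Assumption for this sketch: the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank = 1 + number of videos scored strictly higher than the correct one.
        rank = 1 + sum(1 for score in row if score > row[i])
        ranks.append(rank)
    metrics = {f"r@{k}": sum(r <= k for r in ranks) / len(ranks) for k in ks}
    metrics["median"] = median(ranks)
    return metrics


# Toy 3x3 example: queries 0 and 2 rank their video first, query 1 ranks it second.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
metrics = retrieval_metrics(sim, ks=(1, 2))
print(metrics)
```

Because each test video contributes exactly one query in this setup, the choice of caption per video directly changes the scores, which is why the caption selection question in this thread matters.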
Let me check it. The result is interesting: R@5 and R@10 are even higher than the reported ones 😂
Please first make sure that you use 5 sampled video frames and a frame size of 224 for downstream tasks.
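As a rough sanity check for the frame setting, the sketch below picks 5 frame indices from a video of a given length. The thread does not say how the frames are sampled, so uniform sampling (the middle frame of each equal-length segment) is purely an assumption here, and `sample_frame_indices` and `FRAME_SIZE` are hypothetical names. Resizing each frame to 224 would be done with whatever image library the pipeline already uses.

```python
FRAME_SIZE = 224  # target height and width mentioned in the thread


def sample_frame_indices(num_frames, num_samples=5):
    """Return num_samples indices, one from the middle of each equal-length segment.

    Uniform sampling is an assumption of this sketch, not a confirmed detail.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    seg = num_frames / num_samples
    # Clamp to the last valid index so very short videos still work.
    return [min(num_frames - 1, int(seg * i + seg / 2)) for i in range(num_samples)]


indices = sample_frame_indices(100)
print(indices)  # [10, 30, 50, 70, 90]
```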
Yes, I confirm.
@tsujuifu Hi, I have checked again. I used the specific caption indices for JSfusion provided by "jsfusion_val_caption_idx.pkl". So I want to know how you select the caption for each video during testing on MSRVTT.
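The two test-time options discussed so far (first caption versus a fixed per-video index) can be sketched as follows. The structure of "jsfusion_val_caption_idx.pkl" is not shown in this thread, so the mapping from video id to caption index used here is an assumption, and `select_test_captions` is a hypothetical helper. Neither option is confirmed here as the one the authors used.

```python
def select_test_captions(captions_by_video, caption_idx=None):
    """Pick one caption per test video.

    caption_idx: optional {video_id: index} mapping (assumed structure).
    If it is None, the first caption of each video is used.
    """
    items = []
    for video_id, captions in captions_by_video.items():
        idx = 0 if caption_idx is None else caption_idx[video_id]
        items.append({"video": video_id, "caption": captions[idx]})
    return items


captions_by_video = {
    "vid1": ["cap1", "cap2", "cap3"],
    "vid2": ["cap1", "cap2", "cap3"],
}

# Option A: always take the first caption.
first = select_test_captions(captions_by_video)

# Option B: take the caption at a given index for each video.
indexed = select_test_captions(captions_by_video, {"vid1": 2, "vid2": 1})
print(first)
print(indexed)
```

Running the evaluation on both variants is a quick way to see how much of a score gap comes from caption selection alone.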
Thanks for your kind help. I have reproduced the MSRVTT-retrieval results (R@1: 33.7) with the finetuned ckpt you provided, but I still cannot get promising results in the zero-shot setting.
@tsujuifu Can you provide the whole txt_msrvtt-retrieval.json you use? I can't reproduce the downstream MSRVTT-retrieval results using your best pretrained ckpt. I want to make sure I am using the right train and test data.
Since each dataset has its own original JSON file, the conversion scripts will differ (but they should not be too difficult to write, I believe). The _txtxxx.json is provided as an example of the format to follow.