youngfly11 / LCMCG-PyTorch

AAAI2020-The official implementation of "Learning Cross-modal Context Graph for Visual Grounding"
57 stars 12 forks

train/test/val.txt #11

Closed: FYY799 closed this issue 2 years ago

FYY799 commented 2 years ago

Hi, thank you for your great work; your code is excellent! May I ask a question about train.txt, test.txt, and val.txt? Could you provide these files?

youngfly11 commented 2 years ago

You can simply pair each image with its captions sequentially yourself, like this: xxxx1.jpg 0; xxxx1.jpg 1; xxxx1.jpg 2; xxxx2.jpg 0; xxxx2.jpg 1; xxxx2.jpg 2.

FYY799 commented 2 years ago

I'm sorry for bothering you again. I downloaded train/test/val.txt, but the files contain only img_id and no sent_id. How can I obtain the sent_id?

youngfly11 commented 2 years ago

The sent_id is simply a value we generate ourselves: 0, 1, 2, 3, or 4. This id indexes the caption in sent_anno.json.

FYY799 commented 2 years ago

This is my train.txt: 5517849007 0 5221111799 3 3363750526 2 7014591581 1 2137836768 2
299265726 1 3323952123 0 4108989020 4956585720 4235671794 2158925258 4259870068 4510049538 2405325546 384465575 4791487303

Is the sent_id after each img_id a random value among 0, 1, 2, 3, or does this index have to meet some requirement?

youngfly11 commented 2 years ago

Like this: 5517849007 0 5517849007 1 5517849007 2 5517849007 3 5517849007 4 5221111799 0 5221111799 1 5221111799 2 5221111799 3 5221111799 4 ... We generate train.txt this way because each image in Flickr30k has 5 captions.
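The pairing described above could be generated with a short script along these lines. This is only a sketch, not the repository's own tooling: the function names `expand_split`, `read_img_ids`, and `rewrite_split` are hypothetical, and it assumes the downloaded split file has one entry per line with the img_id as the first whitespace-separated token (any existing sent_id on that line is ignored).

```python
CAPTIONS_PER_IMAGE = 5  # Flickr30k has 5 captions per image, as stated above


def expand_split(img_ids, captions_per_image=CAPTIONS_PER_IMAGE):
    """Return one "img_id sent_id" line per (image, caption) pair.

    Hypothetical helper: repeated img_ids are emitted only once, so a
    partially annotated file does not produce duplicate pairs.
    """
    lines = []
    seen = set()
    for img_id in img_ids:
        if img_id in seen:
            continue
        seen.add(img_id)
        for sent_id in range(captions_per_image):
            lines.append(f"{img_id} {sent_id}")
    return lines


def read_img_ids(path):
    """Read img_ids from a split file.

    Assumption: one entry per line, img_id is the first token.
    """
    ids = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if tokens:  # skip blank lines
                ids.append(tokens[0])
    return ids


def rewrite_split(src, dst):
    """Write the expanded "img_id sent_id" pairs for src into dst."""
    lines = expand_split(read_img_ids(src))
    with open(dst, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `rewrite_split("train_ids.txt", "train.txt")` (both file names are placeholders) would turn a list of image ids into the sequential pairs shown above. Whether the repository's data loader expects one pair per line or another separator is not confirmed in this thread, so check it against the loader before training.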

FYY799 commented 2 years ago

Thank you!

FYY799 commented 2 years ago

flickr_datasets/flickr30k_feat_nms/flickr30k_res50_nms1e3_feat_pascal/4398362068.pkl How are these pkl files obtained?