Some little errors in preparation of conceptual-captions dataset

weiyx16 commented 4 years ago

While extracting image features from Conceptual Captions by following the instructions, I ran into several errors when running inference with this file. They took me some time to debug, so I'd like to share the fixes:

First, the command should be:

python ./tools/generate_tsv_v2.py --gpu 0,1,2,3,4,5,6,7 --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --def models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel --split conceptual_captions_train --data_root {Conceptual_Captions_Root} --out {Conceptual_Captions_Root}/train_frcnn/

(There was a small mistake in the --def parameter.)

Second, lines 46 and 54 should be:

with open(os.path.join(data_root, 'utils/train.json')) as f:  #Line46
with open(os.path.join(data_root, 'utils/val.json')) as f:  #Line54
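
The corrected paths can be checked in isolation. Here is a minimal sketch, assuming data_root contains a utils/ directory holding train.json and val.json as in the two lines above. The helper name load_split_json and the throwaway directory are hypothetical, and the JSON contents are placeholders rather than the real annotation schema:

```python
import json
import os
import tempfile


def load_split_json(data_root, split):
    """Load utils/<split>.json from under data_root (hypothetical helper)."""
    path = os.path.join(data_root, "utils", f"{split}.json")
    with open(path) as f:
        return json.load(f)


# Build a throwaway data_root so the sketch runs without the real dataset.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "utils"))
for split in ("train", "val"):
    with open(os.path.join(demo_root, "utils", f"{split}.json"), "w") as f:
        json.dump([{"split": split}], f)  # placeholder contents, not the real schema

train_entries = load_split_json(demo_root, "train")
val_entries = load_split_json(demo_root, "val")
print(len(train_entries), len(val_entries))
```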

Third, line 67 should be:

zip_image = ziphelper.imread(str("/".join([data_root, im_file])))
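
One detail worth checking on this line: str.join accepts a single iterable, so the path parts have to be passed together as one list or tuple. A small sketch, with made-up values for data_root and im_file:

```python
data_root = "/data/conceptual_captions"  # hypothetical value
im_file = "train_image/00000000.jpg"     # hypothetical value

# str.join takes exactly one iterable argument.
joined = "/".join([data_root, im_file])
print(joined)

# Passing the parts as separate arguments raises TypeError.
error_message = None
try:
    "/".join(data_root, im_file)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```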

Fourth, the function signature on line 142 needs an additional parameter:

def generate_tsv(gpu_id, prototxt, weights, image_ids, data_root, outfolder):

Correspondingly, line 170 has the same problem:

json.dump(get_detections_from_im(net, im_file, image_id, ziphelper, data_root), f)
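
Taken together, the last two changes thread data_root from the caller down to the per-image function. Below is a minimal sketch with stub bodies. The real functions load a Caffe network and run detection, which is replaced here by placeholders. The stub return value and the assumption that image_ids holds (im_file, image_id) pairs are for illustration only:

```python
def get_detections_from_im(net, im_file, image_id, ziphelper, data_root):
    """Stub: the real function runs the detector; this only resolves the path."""
    full_path = "/".join([data_root, im_file])
    return {"image_id": image_id, "path": full_path}


def generate_tsv(gpu_id, prototxt, weights, image_ids, data_root, outfolder):
    """Stub: forwards data_root to every per-image call."""
    net = None        # placeholder for the loaded network
    ziphelper = None  # placeholder for the zip reader
    return [
        get_detections_from_im(net, im_file, image_id, ziphelper, data_root)
        for im_file, image_id in image_ids
    ]


results = generate_tsv(
    gpu_id=0,
    prototxt="test.prototxt",
    weights="model.caffemodel",
    image_ids=[("train_image/0.jpg", 0)],  # assumed (im_file, image_id) pairs
    data_root="/data/cc",
    outfolder="/data/cc/train_frcnn",
)
print(results)
```

If data_root were missing from either signature, the inner call would fail with a NameError or TypeError, which matches the errors described above.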

Thank you very much for sharing the code!

jackroos commented 4 years ago

Thanks for your feedback! I'm sorry I didn't check this part carefully. I can make an update, or would you like to create a PR? Thanks again for your great work!

weiyx16 commented 4 years ago

You are welcome! Sure. But to be honest, most of the mistakes are in another repo, so I will just create a PR there. Is that OK? Thank you for your reply!

jackroos commented 4 years ago

Sure. Thanks!

weiyx16 commented 4 years ago

I have created a PR in that repo. You can merge it to make the script easier to use.

jackroos commented 4 years ago

@weiyx16 I just found another mistake in the repo, concerning the maximum number of boxes. You can refer to this issue for details.

weiyx16 commented 4 years ago

> @weiyx16 I just found another mistake in the repo, concerning the maximum number of boxes. You can refer to this issue for details.

In my first reproduction attempt I used 36 boxes, so I can report another interesting ablation. For RefCOCO+ (detected regions, val), the source paper reports:

precomputed, 100 boxes, half ep: 70.7;
finetune, 100 boxes, half ep: 71.1;
finetune, 100 boxes, full ep: 71.6;

In my setting:

precomputed, 36 boxes, full ep: 71.56.

It seems that for this task, once you train for the full number of epochs, the gap between different box counts, or between precomputed and fine-tuned features, is very small.
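
To put numbers on that, the differences can be computed directly from the accuracies quoted above. This is a quick sketch, and the dictionary keys are just labels for this example:

```python
# Accuracies on RefCOCO+ detected regions (val), as quoted in this thread.
paper = {
    "precomputed, 100 boxes, half ep": 70.7,
    "finetune, 100 boxes, half ep": 71.1,
    "finetune, 100 boxes, full ep": 71.6,
}
mine = 71.56  # precomputed, 36 boxes, full ep

# Gap of each paper setting relative to the 36-box run.
gaps = {name: round(acc - mine, 2) for name, acc in paper.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} relative to precomputed, 36 boxes, full ep")
```

The best paper setting (finetune, 100 boxes, full ep) is only 0.04 points above the 36-box precomputed run.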

jackroos / VL-BERT

Some little errors in preparation of conceptual-captions dataset #2