jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License

Some little errors in preparation of conceptual-captions dataset #2

Closed weiyx16 closed 4 years ago

weiyx16 commented 4 years ago

When I tried to extract image features from conceptual-captions following the instructions, I ran into some errors during inference with this file. It took me some time to debug them, so I'd like to share them with you:

Thank you very much for sharing the code!

jackroos commented 4 years ago

Thanks for your feedback! I'm really sorry that I didn't check this part carefully. I can make an update, or would you like to create a PR? Thanks again for your great work!

weiyx16 commented 4 years ago

You are welcome! Sure. But to be honest, most of the mistakes occur in another repo, so I will just create a PR in that repo. Is that OK? Thank you for your reply!

jackroos commented 4 years ago

Sure. Thanks!

weiyx16 commented 4 years ago

I have already created a PR in that repo, and you can merge it for better usage.

jackroos commented 4 years ago

@weiyx16 I just found another mistake in the repo, concerning the maximum number of boxes. You can refer to this issue for details.

weiyx16 commented 4 years ago

Since I used 36 bounding boxes the first time I reproduced the results, I can report another interesting ablation. For RefCOCO+ Detected Regions val, in the original paper:

In my setting:

It seems that in this task, if you add epochs to the original setting, the gap between different numbers of bounding boxes, or between precomputed and non-precomputed features, is really small.