youngfly11 / LCMCG-PyTorch

AAAI2020-The official implementation of "Learning Cross-modal Context Graph for Visual Grounding"

Some questions about feature extraction #14

Open VV0808 opened 2 years ago

VV0808 commented 2 years ago

Thanks for the implementation of the paper.

I still have some questions about reproducing your results.

I used https://github.com/MILVLG/bottom-up-attention.pytorch to extract features: I downloaded the R101 weight file (bua-caffe-frcn-r101_with_attributes.pth) and saved the feature map from 'res4' (nms=0.3, score_thresh=0.1, min_bbox_num=10, max_bbox_num=100).

All the feature maps together take up around 100 GB. However, you mentioned in another issue that your feature maps are around 300 GB.
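To make the size comparison concrete, here is a rough back-of-the-envelope sketch of how the on-disk size scales with the extraction settings. The function name and every number passed to it are hypothetical placeholders, not values taken from this repository or from the extractor; it also assumes uncompressed, dense per-box features and ignores any extra arrays (boxes, scores, attributes) stored alongside them.

```python
def estimate_feature_storage_gb(num_images, boxes_per_image, feature_dim,
                                bytes_per_value=4):
    """Approximate size in GB (10**9 bytes) of dense per-box features.

    Assumes one feature vector of length `feature_dim` per box, stored
    uncompressed with `bytes_per_value` bytes per element (4 for float32).
    """
    total_bytes = num_images * boxes_per_image * feature_dim * bytes_per_value
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical placeholders: substitute the real image count, the average
    # number of boxes actually kept per image, and the saved feature dimension.
    num_images = 1000
    feature_dim = 2048

    # The box count per image can vary between min_bbox_num and max_bbox_num,
    # so the two bounds bracket the expected size.
    lower = estimate_feature_storage_gb(num_images, 10, feature_dim)
    upper = estimate_feature_storage_gb(num_images, 100, feature_dim)
    print(f"expected size between {lower:.4f} GB and {upper:.4f} GB")
```

Because the size is linear in every factor, a 3x gap between two extractions could come from a different average number of boxes per image, a different feature dimension or shape, a different dtype, or compression when saving. Comparing those settings one by one may narrow down the difference.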

I suspect I made a mistake in the feature extraction, which would explain why I can't reproduce the results in the paper.

Could you check the method I described above? Thank you!