luo3300612 / image-captioning-DLCT

Official PyTorch implementation of the paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
BSD 3-Clause "New" or "Revised" License

What is the relation between X-101 and X-152 extractors in the code? #9

Closed (hcl14 closed this issue 2 years ago)

hcl14 commented 3 years ago

Your script extract_region_feature.py has the X-101 weights hard-coded, but the feature file is named "region_before_X152.hdf5". There is also no information about which checkpoint to use for extracting grid features. In the paper you mention both X-101 and X-152 as extractors.

Which extractor checkpoint should I use for grid-feats-vqa: X-101 or X-152? Can I use X-152 for region features as well?

luo3300612 commented 3 years ago

Both X-101 and X-152 can be used to extract region/grid features. Sorry for the confusing HDF5 filename. It should be "region_before_X101.hdf5".
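
For anyone who already has the file under the old name, a minimal sketch of renaming it is below. The two filenames come from this thread; the helper `fix_region_feature_name` and the idea of passing a feature directory are hypothetical and not part of the repository. Any script or config that still points at the old name would need to be updated as well.

```python
from pathlib import Path

# Filenames quoted in this thread; where the file lives is an assumption.
OLD_NAME = "region_before_X152.hdf5"
NEW_NAME = "region_before_X101.hdf5"


def fix_region_feature_name(feature_dir):
    """Rename the mislabeled region-feature file, if present.

    Hypothetical helper, not part of the DLCT repository.
    Returns the path of the correctly named file, or None when
    neither file exists in feature_dir.
    """
    feature_dir = Path(feature_dir)
    old_path = feature_dir / OLD_NAME
    new_path = feature_dir / NEW_NAME
    if new_path.exists():
        # Already correctly named; leave any old copy untouched.
        return new_path
    if old_path.exists():
        old_path.rename(new_path)
        return new_path
    return None
```

Calling `fix_region_feature_name("path/to/features")` a second time is harmless, since it returns the existing correctly named file without touching anything.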