Concept Question about MHLCCA

luo3300612 / image-captioning-DLCT

Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).

BSD 3-Clause "New" or "Revised" License

194 stars 31 forks source link

Hi,

great to see a new image captioning model!

I got a question about MHLCCA. In your paper, it suggests setting both the key and value of MHLCCA as MHCRA(H_region, H_region, H_region ... ). However, in your code https://github.com/luo3300612/image-captioning-DLCT/blob/main/models/DLCT/encoders.py line 155, the key and value are formed by torch.cat([out_region, out_grid], dim=1). Seems it is a bit different from the paper. Would you slightly explain the idea of this, or am I miss understanding the concept?

Thank you!

luo3300612 / image-captioning-DLCT

Concept Question about MHLCCA #8