luo3300612 / image-captioning-DLCT

Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).

Concept Question about MHLCCA #8

Closed. TCBpenta8 closed this issue 2 years ago.

TCBpenta8 commented 3 years ago

Hi,

great to see a new image captioning model!

I have a question about MHLCCA. In your paper, it suggests setting both the key and value of MHLCCA to MHCRA(H_region, H_region, H_region, ...). However, in your code (https://github.com/luo3300612/image-captioning-DLCT/blob/main/models/DLCT/encoders.py, line 155), the key and value are formed by torch.cat([out_region, out_grid], dim=1). This seems a bit different from the paper. Could you briefly explain the idea behind this, or am I misunderstanding the concept?
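To make the comparison concrete, here is a rough sketch of the two key/value choices (my own illustration with hypothetical tensor names and shapes, using a plain nn.MultiheadAttention and ignoring the locality-constraint mask; only the torch.cat call matches the repo's code):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration.
batch, n_region, n_grid, d_model = 2, 50, 49, 512
out_region = torch.randn(batch, n_region, d_model)  # output of MHCRA on region features
out_grid = torch.randn(batch, n_grid, d_model)      # output of MHCRA on grid features

mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

# Variant as I read Eqs. (21)-(22) in the paper: keys/values come only from the
# region branch, so grid queries attend to region nodes without self-connections.
paper_out, _ = mha(query=out_grid, key=out_region, value=out_region)

# Variant in the released code (encoders.py, line 155): keys/values are the
# concatenation of both branches, so each grid query can also attend to itself.
kv = torch.cat([out_region, out_grid], dim=1)
code_out, _ = mha(query=out_grid, key=kv, value=kv)

print(paper_out.shape, code_out.shape)  # both (batch, n_grid, d_model)
```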

Thank you!

tuyunbin commented 3 years ago

I noticed this difference as well. I think the concatenation gives each node a self-connected edge, as shown in Fig. 3, whereas Eqs. (21) and (22) do not compute the self-connection for each node. Besides, I think that in Eq. (20), CRA should be LCCA.