TCBpenta8 closed this issue 2 years ago
I noticed this difference too. I think the concatenation operation gives each node a self-connected edge, as shown in Fig. 3. But that differs from Eq. (21) and (22), which do not compute the self-connection for each node. Besides, I think that in Eq. (20), CRA should be LCCA.
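To make the self-connection point concrete, here is a minimal pure-Python sketch, not the repository's code. It assumes plain scaled dot-product attention with no masks or relative-position terms, and the toy feature values and the helper `attention_weights` are hypothetical. It compares keys taken from the other modality only against keys formed by concatenating both modalities along the sequence axis, which is what `torch.cat([out_region, out_grid], dim=1)` does for batched tensors.

```python
import math


def attention_weights(queries, keys):
    """Scaled dot-product attention weights, one softmax row per query."""
    d = len(queries[0])
    rows = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


# Hypothetical toy features: 2 region nodes and 3 grid nodes, dimension 2.
out_region = [[1.0, 0.0], [0.0, 1.0]]
out_grid = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.5]]

# Variant A: region queries attend to grid keys only, so no region node
# has an edge to itself.
w_cross_only = attention_weights(out_region, out_grid)

# Variant B: keys are the concatenation of both modalities along the
# sequence axis, so region node i also sees its own feature at key index i.
w_concat = attention_weights(out_region, out_region + out_grid)

print(len(w_cross_only[0]))  # 3 keys per query
print(len(w_concat[0]))      # 5 keys per query
print(w_concat[0][0] > 0)    # True: node 0 puts weight on itself
```

In variant B every region query places nonzero weight on its own feature, which is the self-connected edge described above; variant A has no such term.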
Hi,
great to see a new image captioning model!
I have a question about MHLCCA. Your paper suggests setting both the key and value of MHLCCA to MHCRA(H_region, H_region, H_region ... ). However, in your code at https://github.com/luo3300612/image-captioning-DLCT/blob/main/models/DLCT/encoders.py (line 155), the key and value are formed by torch.cat([out_region, out_grid], dim=1). This seems a bit different from the paper. Could you briefly explain the idea behind this, or am I misunderstanding the concept?
Thank you!