pals-ttic / adapting-CLIP

Something wrong in Eq. (7) in the manuscript #3

Closed by rshaojimmy 2 years ago

rshaojimmy commented 2 years ago

Hi,

Thanks for your great work!

May I ask whether there is an error in the first term of Eq. (7) in the manuscript? It contains WV twice and no key projection (WK).

Thanks.

raymondyeh07 commented 2 years ago

Hi @rshaojimmy,

Thanks for the catch. There is a typo in Eq. (7). The first WV should be WK, since attention weights are computed from a query and a key.
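To illustrate the roles of the projections, here is a minimal NumPy sketch of single-head scaled dot-product attention. It is not the paper's code: the function name `attention_weights`, the dimension `d`, the random weights, and the `1/sqrt(d)` scaling are all assumptions made for illustration. The point is only that the weights come from WQ and WK, while WV is applied to the tokens being averaged.

```python
import numpy as np

def attention_weights(query_tok, key_toks, W_Q, W_K):
    """Softmax attention weights of one query token over a set of key tokens."""
    q = W_Q @ query_tok                    # projected query, shape (d,)
    k = key_toks @ W_K.T                   # projected keys, shape (n, d)
    logits = k @ q / np.sqrt(q.shape[0])   # scaled dot products, shape (n,)
    logits = logits - logits.max()         # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Hypothetical toy data: 3 patch tokens and one region token of size d.
rng = np.random.default_rng(0)
d = 4
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))
r = rng.standard_normal(d)
F = rng.standard_normal((3, d))

weights = attention_weights(r, F, W_Q, W_K)  # uses W_Q and W_K only
output = weights @ (F @ W_V.T)               # W_V enters only in the weighted sum
```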

We will update the arxiv version.

Best, Raymond

rshaojimmy commented 2 years ago

Got it. Thanks!

rshaojimmy commented 2 years ago

May I further ask what the second term in Eq. (7) is for? The first term is the self-attention conducted within the region r. Why is a second term needed on top of normal self-attention?

Thanks.

raymondyeh07 commented 2 years ago

Recall that we defined \mathcal{R} as a set of patch indices, so it does not contain the region token r(l). In normal self-attention, each token also attends to itself, and the first term, which sums only over \mathcal{R}, does not cover that. Hence, we needed the second term.

Side Note: We could have defined the set \mathcal{R} to also include the region token, in which case only the first term would remain. However, this would require a single notation for both the patch (f) and the region token (r), which we thought might confuse the reader.
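The equivalence described in the side note can be checked numerically. The following is a sketch under stated assumptions, not a reproduction of Eq. (7), whose exact form is not quoted in this thread: it assumes standard scaled dot-product attention, a softmax normalized jointly over the patches in \mathcal{R} and the region token itself, and hypothetical names (`patches`, `region`, `two_term`, `one_term`) with random weights.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_patches = 4, 5
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))
patches = rng.standard_normal((n_patches, d))  # f_i for i in R (patches only)
region = rng.standard_normal(d)                # region token r, not in R

q = W_Q @ region                                    # query from the region token
patch_logits = (patches @ W_K.T) @ q / np.sqrt(d)   # region attends to patches
self_logit = (W_K @ region) @ q / np.sqrt(d)        # region attends to itself

# Assumption: one softmax over all |R| + 1 logits.
weights = softmax(np.append(patch_logits, self_logit))

# Two-term form: sum over patches in R, plus the self-attention term.
first_term = weights[:-1] @ (patches @ W_V.T)
second_term = weights[-1] * (W_V @ region)
two_term = first_term + second_term

# One-term form: treat the region token as one more element of the set.
tokens = np.vstack([patches, region])
one_term = softmax((tokens @ W_K.T) @ q / np.sqrt(d)) @ (tokens @ W_V.T)

assert np.allclose(two_term, one_term)
```

Under these assumptions the two forms give the same output, so the choice between them is purely notational, as the side note says.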

rshaojimmy commented 2 years ago

Thanks! But it seems that the manuscript does not explicitly state that \mathcal{R} excludes the region token r(l).

raymondyeh07 commented 2 years ago

In the paper, "\mathcal{R} denotes a set of patch indices covered by the region". As a region token does not correspond to a patch, it is not included in \mathcal{R}. We can make this more explicit. Thanks for pointing this out.

rshaojimmy commented 2 years ago

I see. Thanks so much.