Closed Xingrun-Xing closed 4 years ago
Hi, patchwise attention means that the attention weights are learned from a patch representation (e.g., the features inside a local footprint, Eq. 5). It differs from pairwise attention, where the attention weights are learned from paired features (e.g., F_i and F_j).
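To make the contrast concrete, here is a minimal NumPy sketch of the two weight computations. This is not the paper's implementation: the relation functions here (a dot product for the pairwise case, a fixed random projection for the patchwise case) are hypothetical stand-ins for the learned mappings in the paper, and the footprint is just a 3x3 window of K = 9 features.

```python
import numpy as np

rng = np.random.default_rng(0)

C = 4  # channel dimension
K = 9  # footprint size (e.g., a 3x3 window of neighbours)

F_i = rng.standard_normal(C)         # centre feature
patch = rng.standard_normal((K, C))  # features F_j inside the footprint R(i)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Pairwise: each weight depends only on the pair (F_i, F_j).
# Hypothetical relation: dot product (the paper uses a learned mapping).
pair_logits = patch @ F_i        # shape (K,), one logit per pair
pair_w = softmax(pair_logits)

# Patchwise: all K weights are predicted jointly from the whole patch.
# Hypothetical mapping: a fixed random projection of the flattened patch.
W_proj = rng.standard_normal((K * C, K))
patch_logits = patch.reshape(-1) @ W_proj  # shape (K,)
patch_w = softmax(patch_logits)

# Both variants aggregate the same footprint; only the way the
# weights are produced differs.
out_pair = pair_w @ patch    # shape (C,)
out_patch = patch_w @ patch  # shape (C,)
```

The key point the sketch shows: in the pairwise case each weight is a function of one (F_i, F_j) pair in isolation, while in the patchwise case the whole set of footprint features jointly determines all K weights.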
Hi, @hszhao, I am trying to understand the specific criterion for choosing the paired features. In your paper, you say it is based on a set of indices that specifies which feature vectors are aggregated to construct the new feature. But how is this set of indices, the footprint, generated? I didn't find any illustration in the paper. My first guess is that it is based on pixel location; a simple version would be the 8-neighbours. Are there any differences between training and test?
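For what it's worth, the 8-neighbour guess above is easy to write down. This is only a sketch of one plausible spatial footprint (a k x k window around each pixel, clipped at the border), not the paper's actual index-generation scheme:

```python
H, W = 4, 4  # toy feature-map size (hypothetical)

def footprint(i, j, k=3):
    """Indices of the k x k window centred at (i, j), clipped at the
    border. For k=3 this is the centre pixel plus its 8-neighbours."""
    r = k // 2
    return [(y, x)
            for y in range(max(0, i - r), min(H, i + r + 1))
            for x in range(max(0, j - r), min(W, j + r + 1))]

interior = footprint(1, 1)  # full 3x3 window: 9 index pairs
corner = footprint(0, 0)    # clipped at the border: 4 index pairs
```

Under this reading the footprint is purely location-based and deterministic, so it would be identical at training and test time; whether that matches the paper is exactly the question.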
I think patchwise attention means having a window and doing self-attention within it. Please correct me if I am wrong. Maybe the author can elaborate more.