Understanding the distance of the feature map

I am confused by the following sentence (last paragraph of Section 3.2 in the paper):

But the values of cosine distance in ‘output’ are quite small, indicating that global context features modeled by the non-local block are almost the same for different query positions.

In my opinion, a smaller distance only shows that the feature vectors at arbitrary positions are closer to one another than the input feature vectors are. Why does it show that the global context features are the same for different locations?
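To make the question concrete, here is a minimal sketch of how I read the measurement. I am assuming it is the average pairwise cosine distance, (1 - cos) / 2, taken over the feature vectors at all pairs of positions. The arrays below are synthetic stand-ins and the function name is my own, none of it comes from the repository.

```python
import numpy as np

def avg_cosine_distance(features):
    """Mean of (1 - cos(v_i, v_j)) / 2 over all position pairs (i, j).

    features: array of shape (num_positions, channels), one vector per position.
    The (1 - cos) / 2 form is an assumption about the paper's definition.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # avoid division by zero
    cos = unit @ unit.T                            # pairwise cosine similarity
    return float(np.mean((1.0 - cos) / 2.0))

rng = np.random.default_rng(0)
num_positions, channels = 64, 16

# Synthetic stand-in for 'input': unrelated vectors at each position.
varied = rng.standard_normal((num_positions, channels))

# Synthetic stand-in for 'output': one shared vector plus a small
# position-specific perturbation.
shared = rng.standard_normal(channels)
near_identical = shared + 0.01 * rng.standard_normal((num_positions, channels))

d_varied = avg_cosine_distance(varied)
d_shared = avg_cosine_distance(near_identical)
print(f"varied: {d_varied:.4f}, near-identical: {d_shared:.6f}")
```

If 'output' means the vector the non-local block produces for each query position, then each row here would be the context computed for one query, and a distance near zero would mean those per-query vectors point in almost the same direction. That is the reading I would like confirmed.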

xvjiarui / GCNet