mlvlab / SPoTr

Official pytorch implementation of "Self-positioning Point-based Transformer for Point Cloud Understanding" (CVPR 2023).

Question about equ(9) #22

Closed Rurouni-z closed 8 months ago

Rurouni-z commented 10 months ago
$$\mathbb{A}_{q, k, c}=\frac{\exp \left(\mathcal{M}^{\prime}\left(\left[\mathcal{R}^{\prime}\left(\mathbf{f}_{q}, \mathbf{f}_{k}\right) ; \phi_{qk}\right] / \tau\right)_{c}\right)}{\sum_{k^{\prime} \in \Omega_{key}} \exp \left(\mathcal{M}^{\prime}\left(\left[\mathcal{R}^{\prime}\left(\mathbf{f}_{q}, \mathbf{f}_{k^{\prime}}\right) ; \phi_{qk^{\prime}}\right] / \tau\right)_{c}\right)}$$

Where can I find the following term in your code?

$$\sum_{k^{\prime} \in \Omega_{key}} \exp \left(\mathcal{M}^{\prime}\left(\left[\mathcal{R}^{\prime}\left(\mathbf{f}_{q}, \mathbf{f}_{k^{\prime}}\right) ; \phi_{qk^{\prime}}\right] / \tau\right)_{c}\right)$$

Also, why does your code apply the softmax to the last dimension of `attn`? Doesn't that make it channel attention? I am a newbie, hope you can help me, thank you very much! :)

Rurouni-z commented 10 months ago

I get it: softmax(x)_i = exp(x_i) / sum_j exp(x_j), so the sum is already inside the softmax. That's why I can't see an explicit sum() in the code.
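A quick check of that identity (illustrative snippet, not from the repo):

```python
import torch
import torch.nn.functional as F

x = torch.randn(5)
manual = torch.exp(x) / torch.exp(x).sum()           # explicit exp / sum form
print(torch.allclose(F.softmax(x, dim=0), manual))   # True: the sum lives inside softmax
```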

But I still don't understand why you take the softmax over the key_points_num dimension rather than the channel (dim) dimension.

Here is an explanation from GPT:

When you perform a softmax over the last dimension of a three-dimensional tensor (here, the key_points_num dimension), the softmax is computed along the key axis independently for every entry of the [dim, query_points_num] slice.
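Concretely (shapes are assumptions for illustration, not taken from the repository):

```python
import torch
import torch.nn.functional as F

# Assumed layout: [dim, query_points_num, key_points_num]
attn = torch.randn(8, 4, 16)
weights = F.softmax(attn, dim=-1)   # normalize over key_points_num
print(weights.sum(dim=-1))          # all ones, shape [8, 4]: one distribution
                                    # over the keys per (channel, query) pair
```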

PJin0 commented 10 months ago

Since we'd like to softly select the channels of each point, we perform the softmax over the key_points_num dimension: for every query point and every channel, the weights over the key points sum to 1, which corresponds to the denominator over k' ∈ Ω_key in Eq. (9).
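In other words, every channel gets its own distribution over the key points. A minimal sketch of how Eq. (9) and the per-channel aggregation could look (tensor shapes, the `mlp` stand-in for M', and all names here are assumptions for illustration, not the repository implementation):

```python
import torch
import torch.nn.functional as F

def channel_wise_attention(rel, pos_enc, mlp, tau=1.0):
    """Sketch of Eq. (9): per-channel attention over key points.

    rel:     [Q, K, C] relation R'(f_q, f_k) between query and key features
    pos_enc: [Q, K, C] positional term phi_{qk}
    mlp:     small MLP M' mapping 2C -> C channels
    """
    logits = mlp(torch.cat([rel, pos_enc], dim=-1) / tau)  # [Q, K, C]
    # Softmax over the key axis: for each query point and channel c, the
    # weights over the K key points sum to 1 (the denominator of Eq. (9)).
    return F.softmax(logits, dim=1)                        # [Q, K, C]

if __name__ == "__main__":
    Q, K, C = 4, 16, 8
    mlp = torch.nn.Linear(2 * C, C)                  # stand-in for M'
    rel, pos_enc = torch.randn(Q, K, C), torch.randn(Q, K, C)
    v = torch.randn(K, C)                            # key values
    attn = channel_wise_attention(rel, pos_enc, mlp)
    print(attn.sum(dim=1))                           # ones: per-channel weights over keys
    out = torch.einsum("qkc,kc->qc", attn, v)        # per-channel aggregation over keys
    print(out.shape)                                 # [Q, C]
```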