Rurouni-z closed this issue 8 months ago
I get it now: softmax(x)_i = exp(x_i) / sum_j(exp(x_j)), so the sum is part of the softmax itself; that's why I can't see an explicit sum() call.
I still don't understand why you apply the softmax over the key_points_nums dimension rather than over dim.
Here is what GPT says:
When you perform a softmax on the last dimension of a three-dimensional array (the key_points_nums dimension), the softmax is applied independently to each vector along that dimension, i.e. once for every position in the [dim, query_point_nums] slice.
Since we'd like to softly select the channels of each point, we apply the softmax over the key_points_num dimension.
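A minimal NumPy sketch of what this means, assuming an attention tensor of shape [dim, query_point_nums, key_points_nums] (the names and shape here are assumptions taken from the discussion above, not the repo's actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# hypothetical attention logits: [dim, query_point_nums, key_points_nums]
dim, query_point_nums, key_points_nums = 4, 3, 5
rng = np.random.default_rng(0)
attn = rng.standard_normal((dim, query_point_nums, key_points_nums))

# softmax over the LAST axis (key_points_nums): for every (channel, query)
# pair, the weights over the key points sum to 1, i.e. each channel of each
# query softly selects which key points contribute
w = softmax(attn, axis=-1)
print(np.allclose(w.sum(axis=-1), 1.0))  # weights normalize over key points
```

If the softmax were taken over axis 0 (dim) instead, the weights would normalize across channels for each (query, key) pair, which is a different operation from the per-point selection described above.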
Where can I find the part below in your code?
And why does applying the softmax to the last dim of attn make it channel attention? I am a newbie; I hope you can help me. Thank you very much! :)