Open LAB123-tech opened 8 months ago
Hi, nice work on applying the Swin Transformer to point clouds. However, I don't really understand the content of the Memory-efficient self-attention section.
$f_{i,h}^{*}=\frac{\sum_{j=1}^{N}\exp(e_{ij,h})\,f_{j}W_{V,h}}{\sum_{j=1}^{N}\exp(e_{ij,h})} \quad (3)$
How should I understand the idea of postponing the SoftMax normalization so that the attention weights $\{\alpha_{ij,h}\}$ never have to be constructed and stored explicitly?
Computing the denominator and numerator of Eq. (3) simultaneously is also something I find hard to fully understand.
Could you please give me some tips on how to grasp the idea of memory-efficient self-attention?
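From what I can tell, the trick resembles the streaming log-sum-exp formulation used in memory-efficient attention work: the keys are processed in chunks, and for each query row only a running numerator, denominator, and row-wise max are kept, so the full $N \times N$ weight matrix is never stored. Here is a NumPy sketch of my understanding (the chunk size and the running-max rescaling are my own assumptions, not necessarily what this repo does):

```python
import numpy as np

def memory_efficient_attention(q, k, v, chunk=64):
    """Accumulate the numerator and denominator of Eq. (3) chunk by chunk,
    so the N x N attention weights are never materialized.
    A running row-wise max with rescaling keeps exp() numerically stable."""
    n, d = q.shape
    num = np.zeros((n, v.shape[1]))  # running numerator: sum_j exp(e_ij) * v_j
    den = np.zeros(n)                # running denominator: sum_j exp(e_ij)
    m = np.full(n, -np.inf)          # running row-wise max of the logits
    for s in range(0, n, chunk):
        e = q @ k[s:s + chunk].T / np.sqrt(d)   # logits for this key chunk
        m_new = np.maximum(m, e.max(axis=1))
        scale = np.exp(m - m_new)               # rescale old accumulators
        p = np.exp(e - m_new[:, None])          # unnormalized weights (chunk only)
        num = num * scale[:, None] + p @ v[s:s + chunk]
        den = den * scale + p.sum(axis=1)
        m = m_new
    # SoftMax normalization is postponed: one division at the very end
    return num / den[:, None]

def naive_attention(q, k, v):
    """Reference: builds the full attention matrix explicitly."""
    e = q @ k.T / np.sqrt(q.shape[1])
    a = np.exp(e - e.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return a @ v
```

Both functions should give the same output; the streaming version just never holds more than one chunk of weights at a time, which is why the normalization can be deferred.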