Closed: xyh001007 closed this issue 1 year ago
We gather the spatially scattered keys/values so that the attention computation can be done efficiently with dense matrix multiplication.
See the code linked below:
https://github.com/rayleizhu/BiFormer/blob/b0ccf7a65f02b406b776e8cf6b56501620349da2/ops/bra_nchw.py#L18
https://github.com/rayleizhu/BiFormer/blob/b0ccf7a65f02b406b776e8cf6b56501620349da2/ops/torch/rrsda.py#L96
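
To make the idea concrete, here is a minimal NumPy sketch of "gather, then dense matmul". It is not the repository's code. The function names, the shapes (`n` regions, `w` tokens per region, `c` channels), and the mean-pooled routing used to build `idx` are all assumptions made for illustration. It compares the gathered version against a masked attention over all tokens, which should give the same result while doing more work.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) evaluates to 0 for masked entries.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gathered_attention(q, k, v, idx):
    """q, k, v: (n, w, c). idx: (n, topk) region ids routed to each query region."""
    n, w, c = q.shape
    topk = idx.shape[1]
    # Gather: collect the keys/values of the routed regions into contiguous tensors.
    k_g = k[idx].reshape(n, topk * w, c)   # (n, topk*w, c)
    v_g = v[idx].reshape(n, topk * w, c)   # (n, topk*w, c)
    # Dense batched matmul over the gathered tokens only.
    attn = softmax(q @ k_g.transpose(0, 2, 1) / np.sqrt(c))  # (n, w, topk*w)
    return attn @ v_g                                         # (n, w, c)

def masked_attention(q, k, v, idx):
    """Reference: attend to every token, masking out the non-routed regions."""
    n, w, c = q.shape
    k_all = k.reshape(n * w, c)
    v_all = v.reshape(n * w, c)
    allowed = np.zeros((n, n), dtype=bool)
    allowed[np.arange(n)[:, None], idx] = True        # region-level routing mask
    mask = np.repeat(allowed, w, axis=1)[:, None, :]  # (n, 1, n*w), token-level
    logits = q @ k_all.T / np.sqrt(c)                 # (n, w, n*w)
    logits = np.where(mask, logits, -np.inf)
    return softmax(logits) @ v_all                    # (n, w, c)

# Toy example (all sizes are arbitrary, hypothetical values).
rng = np.random.default_rng(0)
n, w, c, topk = 4, 3, 8, 2
q = rng.standard_normal((n, w, c))
k = rng.standard_normal((n, w, c))
v = rng.standard_normal((n, w, c))

# Assumed routing: region-level affinity from mean-pooled queries/keys, keep top-k.
affinity = q.mean(axis=1) @ k.mean(axis=1).T          # (n, n)
idx = np.argsort(-affinity, axis=1)[:, :topk]         # (n, topk), unique per row

out_gathered = gathered_attention(q, k, v, idx)
out_masked = masked_attention(q, k, v, idx)
print(np.allclose(out_gathered, out_masked))
```

In this sketch the gathered version multiplies each query region against `topk * w` keys instead of all `n * w`, so the saving grows as `topk` becomes small relative to `n`.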
Hello! I would like to know what the "gather" step is for, and why bra_nchw.py does not have a "gather" step. I hope you can answer this for me. Thank you!