rayleizhu / BiFormer

[CVPR 2023] Official code release of our paper "BiFormer: Vision Transformer with Bi-Level Routing Attention"
https://arxiv.org/abs/2303.08810
MIT License

About the "gather" #1

Closed: xyh001007 closed this issue 1 year ago

xyh001007 commented 1 year ago

Hello! What is the "gather" step for, and why doesn't bra_nchw.py have a "gather" step? I hope you can answer this for me. Thank you!

rayleizhu commented 1 year ago
  1. We gather the spatially scattered keys/values into one tensor so that the attention can be computed efficiently with dense matrix multiplication.

  2. See the code linked below:

https://github.com/rayleizhu/BiFormer/blob/b0ccf7a65f02b406b776e8cf6b56501620349da2/ops/bra_nchw.py#L18
https://github.com/rayleizhu/BiFormer/blob/b0ccf7a65f02b406b776e8cf6b56501620349da2/ops/torch/rrsda.py#L96
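
To make point 1 concrete, here is a minimal sketch of what a gather step can look like. It is not the repository's code: the function names `gather_kv` and `routed_attention`, the `(n_regions, tokens_per_region, dim)` layout, and the absence of batch and head dimensions are all assumptions made for illustration. The idea is that each query region attends only to its top-k routed regions, so the keys/values of those regions are first copied into one tensor per query region, after which attention is an ordinary dense matmul.

```python
import torch


def gather_kv(kv: torch.Tensor, topk_idx: torch.Tensor) -> torch.Tensor:
    """Collect the key (or value) tokens of the routed regions.

    kv:       (n_regions, tokens_per_region, dim)
    topk_idx: (n_regions, topk) long tensor; row i lists the regions
              that query region i attends to (hypothetical layout).
    returns:  (n_regions, topk * tokens_per_region, dim)
    """
    n, w, c = kv.shape
    topk = topk_idx.shape[1]
    # Integer-array indexing copies the selected regions:
    # kv[topk_idx] has shape (n, topk, w, c).
    gathered = kv[topk_idx]
    # Flatten the routed regions into one token axis per query region.
    return gathered.reshape(n, topk * w, c)


def routed_attention(q, k, v, topk_idx):
    """Attention of each query region over its gathered keys/values."""
    k_g = gather_kv(k, topk_idx)  # (n, topk*w, c)
    v_g = gather_kv(v, topk_idx)  # (n, topk*w, c)
    scale = q.shape[-1] ** -0.5
    # Dense batched matmul: (n, w, c) @ (n, c, topk*w) -> (n, w, topk*w)
    attn = torch.softmax((q * scale) @ k_g.transpose(-2, -1), dim=-1)
    return attn @ v_g  # (n, w, c)


if __name__ == "__main__":
    n, w, c = 4, 3, 8  # 4 regions, 3 tokens each, 8 channels
    q, k, v = (torch.randn(n, w, c) for _ in range(3))
    idx = torch.tensor([[0, 1], [1, 3], [2, 0], [3, 2]])  # top-2 routing
    print(routed_attention(q, k, v, idx).shape)
```

Without the gather, the same result would need a mask over all regions or a per-region loop; after the gather, every query region sees a fixed-size block of keys/values, which is what makes the batched matmul possible.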