rayleizhu / BiFormer

[CVPR 2023] Official code release of our paper "BiFormer: Vision Transformer with Bi-Level Routing Attention"
https://arxiv.org/abs/2303.08810
MIT License

Difference in routing attention #19

Closed jyq2066 closed 1 year ago

jyq2066 commented 1 year ago

Hi, BiFormer inspires me a lot. I was looking for the routing attention implementation, but the ops directory contains two of them. Which one is the main routing attention proposed in your paper? Thanks!

rayleizhu commented 1 year ago

Thanks for your interest. Short answer: the two implementations are equivalent; you can check the comments here:

https://github.com/rayleizhu/BiFormer/blob/1697bbbeafb8680524898f1dcaac10defd0604be/ops/bra_nchw.py#L23.
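
For readers landing here, the core idea both files implement is the same: bi-level routing attention first scores coarse region-to-region affinity, keeps each query region's top-k routed regions, and then runs ordinary token-level attention over the key/value tokens gathered from those regions. Below is a minimal, self-contained sketch of that idea, not a copy of either file in `ops/`; the mean-pooled region descriptors, single-head layout, and function name are simplifying assumptions for illustration.

```python
import torch

def bi_level_routing_attention(q, k, v, topk=4):
    """Sketch of bi-level routing attention (hypothetical helper).

    q, k, v: (batch, num_regions, tokens_per_region, dim)
    """
    B, R, T, C = q.shape

    # Level 1: region-level routing. Pool tokens into one descriptor per
    # region and score region-to-region affinity.
    q_region = q.mean(dim=2)                          # (B, R, C)
    k_region = k.mean(dim=2)                          # (B, R, C)
    affinity = q_region @ k_region.transpose(-1, -2)  # (B, R, R)
    _, idx = affinity.topk(topk, dim=-1)              # (B, R, topk)

    # Gather key/value tokens from each query region's top-k regions.
    idx_exp = idx[..., None, None].expand(-1, -1, -1, T, C)   # (B, R, topk, T, C)
    k_gather = torch.gather(
        k[:, None].expand(-1, R, -1, -1, -1), 2, idx_exp
    ).flatten(2, 3)                                   # (B, R, topk*T, C)
    v_gather = torch.gather(
        v[:, None].expand(-1, R, -1, -1, -1), 2, idx_exp
    ).flatten(2, 3)                                   # (B, R, topk*T, C)

    # Level 2: fine-grained token attention over the routed token set only.
    attn = (q @ k_gather.transpose(-1, -2)) * C ** -0.5   # (B, R, T, topk*T)
    attn = attn.softmax(dim=-1)
    return attn @ v_gather                            # (B, R, T, C)
```

For example, with `q = k = v = torch.randn(2, 16, 49, 64)` the call returns a `(2, 16, 49, 64)` tensor, and each of the 16 regions attends only to tokens from its 4 routed regions rather than all 16.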

You can also find more information in the README:

[screenshot of the relevant README section]