Closed jyq2066 closed 1 year ago
Hi Author, BiFormer has inspired me a lot. I am looking for the routing attention implementation, but the ops directory contains two. Which one is the main routing attention you proposed in your paper? Thanks!
Thanks for your interest. Short answer: the two implementations are equivalent. You can check the comments here:
https://github.com/rayleizhu/BiFormer/blob/1697bbbeafb8680524898f1dcaac10defd0604be/ops/bra_nchw.py#L23.
You can also find more information in the README: