rayleizhu / BiFormer

[CVPR 2023] Official code release of our paper "BiFormer: Vision Transformer with Bi-Level Routing Attention"
https://arxiv.org/abs/2303.08810
MIT License

A question about local context enhancement (LCE). #30

Closed stella-von closed 11 months ago

stella-von commented 11 months ago

May I ask how much improvement the LCE used in the paper brings?

rayleizhu commented 11 months ago

The short answer is, I do not know. I did not run this ablation experiment after the architecture was finalized.

LCE was added to our architecture during our early exploration period. If I remember correctly, at that time we used DWConv in stages 1 & 2 and BRA in stages 3 & 4, and the improvement brought by LCE was 0.2 or so. I guess it plays the role of position encoding.
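For readers unfamiliar with the term, here is a minimal sketch of what an LCE-style term might look like. It assumes LCE is a depth-wise convolution applied to the value tensor and added to the attention output, which is how I read the paper. The function names (`depthwise_conv2d`, `add_lce`), the nested-list tensor layout, and the 3x3 kernel are illustrative assumptions, not code from this repository. It is written in plain Python so it runs without any framework.

```python
def depthwise_conv2d(x, kernels):
    """Depth-wise 2D convolution with zero padding and stride 1.

    x:       nested lists shaped [C][H][W]
    kernels: nested lists shaped [C][k][k], one k x k filter per channel (k odd)
    Each channel is filtered only by its own kernel (no cross-channel mixing).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        # Positions outside the feature map count as zero.
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += kernels[c][di][dj] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def add_lce(attn_out, v, kernels):
    """Hypothetical helper: return attn_out + depthwise_conv(v), element-wise."""
    lce = depthwise_conv2d(v, kernels)
    return [
        [[a + l for a, l in zip(row_a, row_l)] for row_a, row_l in zip(ch_a, ch_l)]
        for ch_a, ch_l in zip(attn_out, lce)
    ]


if __name__ == "__main__":
    # One channel, 3x3 map of ones, 3x3 kernel of ones, zero attention output.
    v = [[[1.0] * 3 for _ in range(3)]]
    kernels = [[[1.0] * 3 for _ in range(3)]]
    attn_out = [[[0.0] * 3 for _ in range(3)]]
    for row in add_lce(attn_out, v, kernels)[0]:
        print(row)
```

With a constant input, the zero padding makes border outputs differ from interior ones, so the result depends on where a token sits in the feature map. That may be one reason a term like this could behave somewhat like a position encoding, though this toy example is not evidence for the guess above.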

stella-von commented 11 months ago

Thanks.