microsoft / Focal-Transformer

[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"
MIT License

FocalTransformerV2 not using expand_size #23

Open Edouard99 opened 1 year ago

Edouard99 commented 1 year ago

The paper (and FocalT v1) uses a window size of 7 with an expand size of 3, covering a 13x13 zone. In your v2 implementation, `expand_size` is declared but never used. I believe this is because you replaced it with the top-K closest positions, which with `topK = 128` in your config cover a zone of roughly sqrt(128) x sqrt(128). Am I right? And are you then projecting the top-K coordinates with a Linear layer? Have you tested the model with this configuration?
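
To make the comparison concrete, here is a minimal sketch of the zone-size arithmetic above (the variable names are illustrative, not from the repo's code): a 7x7 window expanded by 3 on each side covers 7 + 2 * 3 = 13 per dimension, while 128 top-K positions cover an area equivalent to roughly sqrt(128) per dimension.

```python
import math

# Illustrative values from the discussion above (not the repo's actual code).
window_size = 7
expand_size = 3
topK = 128

# v1: window expanded by expand_size on each side -> 13x13 zone.
v1_zone = window_size + 2 * expand_size

# v2: topK positions correspond to an area of about sqrt(topK) x sqrt(topK).
v2_zone = math.sqrt(topK)

print(f"v1 zone: {v1_zone}x{v1_zone}")          # 13x13
print(f"v2 zone: ~{v2_zone:.1f}x{v2_zone:.1f}")  # ~11.3x11.3
```

So the top-K variant attends over a slightly smaller equivalent area than the v1 expand-size scheme, if this reading of the config is correct.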