microsoft / Focal-Transformer

[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"
MIT License

Some problems with reproducing the focal block #10

Open kimjisoo12 opened 2 years ago

kimjisoo12 commented 2 years ago

Hello, first of all, thank you for providing the Focal Transformer module.

I have some questions:

  1. The images you process are 224×224 with window size = 7. If I input 512×512 images instead, is it reasonable to change the window size to 8 so that the feature maps remain evenly divisible? (See the sketch after this list.)

  2. Since you didn't provide the segmentation code, it seems to me that you set the focal level to 2 in your demo. Should I try focal levels of 1, 2, and 3 respectively to find the best fit?

  3. As for num_heads: I see that after patch embedding the number of channels becomes 96, so in the focal attention I set num_heads = 2 so that the channels divide evenly. Is that the right choice?

  4. Suppose the image I input is 32×32. Since patch_size = 4, the token sequence length is 64 (8×8 patches). If I increase the input size, e.g. to 64×64 or 128×128, should I also increase patch_size so that the final sequence length stays 64?
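For questions 1, 3, and 4, the constraints boil down to divisibility checks between the per-stage feature-map size, the window size, and the head count. Below is a minimal sketch of such a check; it is not code from this repository, `check_focal_config` is a hypothetical helper, and the head counts and the "halve resolution, double channels per stage" assumption are only illustrative of a Swin/Focal-style hierarchy.

```python
def check_focal_config(img_size, patch_size, embed_dim, heads_per_stage, window_size):
    """Check that each stage's feature map is divisible by the window size
    and that each stage's channel count is divisible by its head count.

    Assumes patch embedding downsamples by `patch_size`, then every later
    stage halves the resolution and doubles the channels (Swin/Focal-style).
    """
    resolution = img_size // patch_size      # tokens per side after patch embedding
    dim = embed_dim
    for stage, num_heads in enumerate(heads_per_stage):
        ok_window = resolution % window_size == 0
        ok_heads = dim % num_heads == 0
        print(f"stage {stage}: feature map {resolution}x{resolution}, dim {dim}, "
              f"window {window_size} divides: {ok_window}, "
              f"{num_heads} heads divide: {ok_heads}")
        resolution //= 2                      # patch merging halves the resolution
        dim *= 2                              # and doubles the channel count


if __name__ == "__main__":
    # 224x224 input with patch_size=4 gives 56x56 tokens; window 7 divides 56.
    check_focal_config(img_size=224, patch_size=4, embed_dim=96,
                       heads_per_stage=[3, 6, 12, 24], window_size=7)
    # 512x512 input gives 128x128 tokens; 7 does not divide 128, but 8 does.
    check_focal_config(img_size=512, patch_size=4, embed_dim=96,
                       heads_per_stage=[3, 6, 12, 24], window_size=8)
```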

If you can help me, I will be very grateful.