microsoft / Focal-Transformer

[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"
MIT License

Some problems with reproducing the focal block #10

Open kimjisoo12 opened 2 years ago

kimjisoo12 commented 2 years ago

Hello, first of all, thank you for providing the Focal Transformer module.

I have some questions:

  1. The images you process are 224×224 with window size = 7. If I input 512×512 images instead, is it reasonable to change the window size to 8 so that the feature maps remain evenly divisible? (See the sketch after this list.)

  2. Since you didn't provide the segmentation code, it seems to me that you set the focal level to 2 in your demo. Should I try focal levels of 1, 2, and 3 respectively to find the best fit?

  3. As for num_heads: I see that after patch embedding the number of channels becomes 96, so in the focal attention I set num_heads = 2 so that the channels divide evenly. Is that the right choice?

  4. Suppose the image I input is 32×32. Since patch_size = 4, the token sequence length is 64 (8×8 patches). If I increase the input size, e.g. to 64×64 or 128×128, should I also increase patch_size so that the final sequence length stays 64?
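For questions 1, 3, and 4, the constraints boil down to divisibility checks between the per-stage feature-map size, the window size, and the head count. Below is a minimal sketch of such a check; it is not code from this repository, `check_focal_config` is a hypothetical helper, and the head counts and the "halve resolution, double channels per stage" assumption are only illustrative of a Swin/Focal-style hierarchy.

```python
def check_focal_config(img_size, patch_size, embed_dim, heads_per_stage, window_size):
    """Check that each stage's feature map is divisible by the window size
    and that each stage's channel count is divisible by its head count.

    Assumes patch embedding downsamples by `patch_size`, then every later
    stage halves the resolution and doubles the channels (Swin/Focal-style).
    """
    resolution = img_size // patch_size      # tokens per side after patch embedding
    dim = embed_dim
    for stage, num_heads in enumerate(heads_per_stage):
        ok_window = resolution % window_size == 0
        ok_heads = dim % num_heads == 0
        print(f"stage {stage}: feature map {resolution}x{resolution}, dim {dim}, "
              f"window {window_size} divides: {ok_window}, "
              f"{num_heads} heads divide: {ok_heads}")
        resolution //= 2                      # patch merging halves the resolution
        dim *= 2                              # and doubles the channel count


if __name__ == "__main__":
    # 224x224 input with patch_size=4 gives 56x56 tokens; window 7 divides 56.
    check_focal_config(img_size=224, patch_size=4, embed_dim=96,
                       heads_per_stage=[3, 6, 12, 24], window_size=7)
    # 512x512 input gives 128x128 tokens; 7 does not divide 128, but 8 does.
    check_focal_config(img_size=512, patch_size=4, embed_dim=96,
                       heads_per_stage=[3, 6, 12, 24], window_size=8)
```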

If you can help me, I will be very grateful.