keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

How to try other mask patch size like 16*16? #11

Closed rangek closed 1 year ago

rangek commented 1 year ago

Hi, thanks for your nice work! It seems the code only supports a mask patch size of 32x32. If I want to try other sizes like 16x16 or 8x8, how should I change the code?

keyu-tian commented 1 year ago

Thanks! Before answering your question, I'd like to clarify one important thing: in SparK, the mask patch size equals the downsample ratio of the CNN.
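As a quick sanity check on this relation, here is an illustrative calculation (the concrete sizes are assumptions for the example, not code from the repo):

```python
# Illustrative arithmetic (assumed sizes, not code from the repo):
# with a total downsample ratio of 32, each cell of the smallest
# feature map corresponds to one 32x32 input patch, so the mask
# patch size is tied to that ratio.
input_size = 224
downsample_ratio = 32                        # e.g., a typical ResNet/ConvNeXt
fmap_size = input_size // downsample_ratio   # 7: smallest feature map is 7x7
patch_size = input_size // fmap_size         # 32: one cell covers a 32x32 patch
print(fmap_size, patch_size)
```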

Here is the reason. When we mask, we:

  1. first generate the mask for the smallest-resolution feature map, i.e., generate the _cur_active or active_b1ff in lines 86-87, which is shaped [B, 1, fmap_size, fmap_size] and is used to mask the smallest feature map;
  2. then progressively upsample it (i.e., expand its 2nd and 3rd dimensions by calling repeat_interleave(..., 2) and repeat_interleave(..., 3) in line 16) to mask the feature maps with larger resolutions (x in line 21).
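The two steps above can be sketched roughly like this (a minimal illustration; the function names, mask ratio, and sizes here are assumptions for the sketch, not the repo's actual code):

```python
import torch

def make_mask(B, fmap_size, mask_ratio=0.6):
    # Step 1: random mask for the smallest feature map,
    # shaped [B, 1, fmap_size, fmap_size]; True = kept (active), False = masked.
    n = fmap_size * fmap_size
    keep = round(n * (1 - mask_ratio))
    idx = torch.rand(B, n).argsort(dim=1)[:, :keep]   # indices of kept cells
    active = torch.zeros(B, n, dtype=torch.bool)
    active.scatter_(1, idx, True)
    return active.view(B, 1, fmap_size, fmap_size)

def upsample_mask(active, to_size):
    # Step 2: upsample by repeating along the two spatial dimensions,
    # so feature maps with larger resolutions get masked consistently.
    r = to_size // active.shape[-1]
    return active.repeat_interleave(r, dim=2).repeat_interleave(r, dim=3)

mask7 = make_mask(B=2, fmap_size=7)      # mask for the 7x7 feature map
mask14 = upsample_mask(mask7, 14)        # same mask, upsampled for a 14x14 map
# a feature map x of shape [B, C, 14, 14] would then be masked via x * mask14
```

Because the mask is only ever generated at the smallest feature map and then upsampled by integer factors, the effective mask patch size in input space is fixed to the network's total downsample ratio.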

So if you want a patch size of 16 or 8, you should actually use a CNN model with a downsample ratio of 16 or 8. Note that the CNN should implement its forward function with an argument named hierarchy. You can look at https://github.com/keyu-tian/SparK/blob/main/pretrain/models/convnext.py#L78 to see what hierarchy means and how to handle it.

After that, I think you can simply run main.sh with --hierarchy=3 and see if it works. Also, following SimMIM, I would suggest a larger mask ratio like 0.75, so you may also pass --mask=0.75.

If you have any further questions, you can comment here again.

rangek commented 1 year ago

Thanks a lot! That will help me greatly.

keyu-tian commented 1 year ago

@range997996 Hi, we've refactored the code, and we believe customizing CNN models is now very convenient. You can read the tutorial at https://github.com/keyu-tian/SparK/tree/main/pretrain#tutorial-for-customizing-your-own-cnn-model. Hope you enjoy it!

PS: in this refactor, we removed the --hierarchy argument, which could be hard to understand.

junwuzhang19 commented 1 year ago

(image attached) Hi, thanks for your work. I am confused about how the S4 mask pattern transforms to the S2 pattern in this figure, if masking is done at 32x32 with iterative upsampling. Is it a mistake (or maybe I don't fully understand)?

keyu-tian commented 1 year ago

@junwuzhang19 Yes, you are right. I slightly sacrificed correctness in this diagram for better visual effect. If a patch size of 32 is used, each white hole on S2 should strictly be 4x4, not 1x1 as shown.