keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

does this code work with CNN with groupnorm? #56

Closed. yxchng closed this 10 months ago

yxchng commented 11 months ago

If not, how can I make it work for a CNN that uses GroupNorm?

keyu-tian commented 11 months ago

@yxchng You can follow what we do for BatchNorm: https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py#L26. In principle, only the unmasked areas should contribute to the mean and std computed by the normalization layer. You could write a SparseGroupNorm modeled on our SparseBatchNorm2d.
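Below is a minimal sketch of such a SparseGroupNorm. It is not code from the SparK repo. For self-containment, it assumes the mask is passed in explicitly as a `(B, 1, H, W)` tensor with 1 for visible (unmasked) positions and 0 for masked ones. The class name and the `forward(x, mask)` signature are hypothetical. Unlike BatchNorm, GroupNorm computes statistics per sample and per group, so the masked mean and variance are taken separately for each (sample, group) pair: masked sums divided by the number of visible elements.

```python
import torch
import torch.nn as nn


class SparseGroupNorm(nn.GroupNorm):
    """Hypothetical GroupNorm variant whose statistics ignore masked positions.

    Assumption: `mask` has shape (B, 1, H, W), 1 = visible, 0 = masked.
    Mean/var are computed per sample and per group over visible positions only,
    and masked positions are set to zero in the output.
    """

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        G = self.num_groups
        m = mask.to(x.dtype)
        mg = m.reshape(B, 1, 1, H, W)              # broadcast over groups and channels
        xg = x.reshape(B, G, C // G, H, W)         # split channels into groups
        dims = (2, 3, 4)
        # Visible elements per (sample, group): visible pixels * channels per group
        count = (mg.sum(dim=dims, keepdim=True) * (C // G)).clamp(min=1.0)
        mean = (xg * mg).sum(dim=dims, keepdim=True) / count
        # Biased variance over visible positions, matching nn.GroupNorm's convention
        var = (((xg - mean) ** 2) * mg).sum(dim=dims, keepdim=True) / count
        out = ((xg - mean) / torch.sqrt(var + self.eps)).reshape(B, C, H, W)
        if self.affine:
            out = out * self.weight.view(1, C, 1, 1) + self.bias.view(1, C, 1, 1)
        return out * m                             # keep masked positions at zero
```

With an all-ones mask this reduces to a standard `nn.GroupNorm`. Because masked positions are excluded from the statistics and zeroed in the output, whatever values sit at masked positions have no effect on the visible outputs.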