yxchng closed this 10 months ago
@yxchng Basically, you can refer to what we do for BatchNorm: https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py#L26. In theory, only the unmasked (visible) positions should be taken into account when computing the mean and standard deviation of a normalization layer. You could write a `SparseGroupNorm` along the same lines as our `SparseBatchNorm2d`.
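For illustration, here is a minimal sketch of what such a `SparseGroupNorm` might look like. It is not code from the SparK repo. Unlike BatchNorm, whose statistics are pooled across the batch, GroupNorm normalizes each sample's channel groups separately, so the masked mean and variance are computed per (sample, group) over visible positions only. Assumptions: the mask is passed explicitly as a bool tensor of shape `(B, 1, H, W)` with `True` = visible (adapt this to however your encoder tracks the active mask), and masked positions are zeroed in the output so the feature map stays sparse.

```python
import torch
import torch.nn as nn


class SparseGroupNorm(nn.GroupNorm):
    """Hypothetical GroupNorm that computes statistics over unmasked positions only.

    Assumed mask format: bool tensor of shape (B, 1, H, W), True = visible.
    """

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        G = self.num_groups
        m = mask.to(x.dtype)                            # (B, 1, H, W), 1.0 = visible
        xg = x.reshape(B, G, C // G, H, W)              # split channels into groups
        mg = m.reshape(B, 1, 1, H, W)                   # broadcast over groups/channels
        # Number of visible elements in each (sample, group); clamp avoids 0-division
        count = (mg.sum(dim=(2, 3, 4), keepdim=True) * (C // G)).clamp(min=1.0)
        # Mean and biased variance over visible elements only
        mean = (xg * mg).sum(dim=(2, 3, 4), keepdim=True) / count
        var = (((xg - mean) * mg) ** 2).sum(dim=(2, 3, 4), keepdim=True) / count
        out = ((xg - mean) / torch.sqrt(var + self.eps)).reshape(B, C, H, W)
        if self.affine:
            out = out * self.weight.view(1, C, 1, 1) + self.bias.view(1, C, 1, 1)
        return out * m                                  # zero out masked positions
```

With an all-visible mask this reduces to ordinary GroupNorm; with a partial mask, the values at masked positions no longer affect the statistics used for the visible ones.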
If not, how can I make it work for a CNN with GroupNorm?