keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

Where is the implementation of sparse convolution please? #1

Closed · CoinCheung closed this issue 1 year ago

keyu-tian commented 1 year ago

In this repo we simulate submanifold sparse convolution (and pooling, normalization, etc.) in encoder.py, and we recommend using it for generality. We also support this sparse convolution library as a second choice, but it is much slower for grouped/depth-wise convolutions.

More tips about sparse conv can be found in this readme.
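
Roughly, the simulation idea can be sketched like this (a simplified illustration, not the repo's exact code; `MaskedConv2d` and the `active` argument are made-up names):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """An ordinary dense conv whose output is re-masked, so inactive
    (masked) positions are forced back to zero after every layer."""

    def forward(self, x: torch.Tensor, active: torch.Tensor) -> torch.Tensor:
        # active: (B, 1, H, W) 0/1 float mask at the input resolution, 1 = visible
        out = super().forward(x)                                    # plain dense convolution
        m = F.interpolate(active, size=out.shape[-2:], mode="nearest")
        return out * m                                              # zero out inactive positions

# usage: conv = MaskedConv2d(3, 64, kernel_size=3, stride=2, padding=1)
#        y = conv(img, mask)   # the mask broadcasts over channels
```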

CoinCheung commented 1 year ago

Hi,

I read that code file. You have not uploaded your implementation yet, have you?

keyu-tian commented 1 year ago

Lines 38 to 55 may look somewhat unfinished, but they are complete implementations. The functions sp_conv_forward and sp_bn_forward are defined in lines 19-35 and are used to override the member functions in lines 38-55.

Sparse conv and pooling share sp_conv_forward, where the binary mask is multiplied with the result of the superclass's forward function (lines 20-21). For sparse batch norm, the statistics are calculated only at non-empty positions, so line 29 selects those features and line 30 normalizes them with a 1D BN.
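
For the batch-norm part, here is a rough sketch of the idea (my own simplified version, not the repo's exact code; the `active` mask argument is a hypothetical interface):

```python
import torch
import torch.nn as nn

def sparse_bn_forward(bn1d: nn.BatchNorm1d, x: torch.Tensor, active: torch.Tensor) -> torch.Tensor:
    # Gather the feature vectors at visible positions, normalize them with a 1D BN
    # so the statistics ignore masked positions, then scatter them back.
    B, C, H, W = x.shape
    flat = x.permute(0, 2, 3, 1).reshape(-1, C)   # (B*H*W, C)
    keep = active.reshape(-1).bool()              # which spatial positions are visible
    out = torch.zeros_like(flat)
    out[keep] = bn1d(flat[keep])                  # statistics come from visible features only
    return out.reshape(B, H, W, C).permute(0, 3, 1, 2)
```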

We will add more comments to this code.

CoinCheung commented 1 year ago

I seem to understand it after reading the code more carefully. So you first apply a normal convolution and then mask out the output features at positions that should not be active according to the definition of sparse conv, am I correct about this?

By the way, some blogs claim that there are two kinds of sparse conv: regular sparse conv and submanifold sparse conv. The regular sparse conv corresponds to the "zero-outing" ablation experiments in the paper, and the submanifold sparse conv is what the paper actually uses. Am I correct about this?

CoinCheung commented 1 year ago

Also, this implementation only supports the sparse pattern of "32x32 patches" (which is enough for the experiments in this paper); if we had an input tensor with an irregular sparse pattern, this implementation would not give the same output as a true submanifold sparse conv, would it?

keyu-tian commented 1 year ago

> I seem to understand it after reading the code more carefully. So you first apply a normal convolution and then mask out the output features at positions that should not be active according to the definition of sparse conv, am I correct about this?
>
> By the way, some blogs claim that there are two kinds of sparse conv: regular sparse conv and submanifold sparse conv. The regular sparse conv corresponds to the "zero-outing" ablation experiments in the paper, and the submanifold sparse conv is what the paper actually uses. Am I correct about this?

Yes to both. Our implementation is the submanifold one, which strictly masks out the convolved features at inactive positions.
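
To make the distinction concrete, a toy comparison (my own sketch, not code from the repo):

```python
import torch
import torch.nn.functional as F

# A 3x3 all-ones kernel on a single-channel map whose right half is inactive (masked out).
x = torch.randn(1, 1, 8, 8)
mask = torch.ones(1, 1, 8, 8)
mask[..., 4:] = 0                               # columns 4..7 are inactive
w = torch.ones(1, 1, 3, 3)

# Regular-style / "zero-outing": only the input is zeroed, so the dense conv
# leaks non-zero values into inactive positions next to the mask border
# (the active set dilates by the kernel radius).
regular = F.conv2d(x * mask, w, padding=1)

# Submanifold-style (what this repo simulates): the output is re-masked too,
# so inactive positions stay exactly zero and the active set never grows.
submanifold = regular * mask

print((regular * (1 - mask)).abs().max())       # > 0: leakage into the masked area
print((submanifold * (1 - mask)).abs().max())   # exactly 0
```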

keyu-tian commented 1 year ago

> Also, this implementation only supports the sparse pattern of "32x32 patches" (which is enough for the experiments in this paper); if we had an input tensor with an irregular sparse pattern, this implementation would not give the same output as a true submanifold sparse conv, would it?

That is true; it is only applicable to the specific sparse pattern used here.
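
For context, a small sketch of the kind of block-regular mask this relies on (my own illustration, hypothetical names): since the mask is defined per 32x32 patch, it downsamples exactly to every feature stride, so each downsampled position is still purely visible or purely masked:

```python
import torch
import torch.nn.functional as F

B, H_img, W_img, patch = 2, 224, 224, 32
# 1 = visible patch; the mask lives at the 32x32-patch level
patch_mask = (torch.rand(B, 1, H_img // patch, W_img // patch) > 0.6).float()

def mask_at_stride(stride: int) -> torch.Tensor:
    # nearest interpolation keeps the block structure exact at every stride
    return F.interpolate(patch_mask, size=(H_img // stride, W_img // stride), mode="nearest")

for s in (4, 8, 16, 32):          # typical hierarchical feature strides
    print(s, mask_at_stride(s).shape)
```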

CoinCheung commented 1 year ago

Thanks for explaining all this!

I think this work is really meaningful. I would treat it as a milestone, and I can already picture the scene where CNNs fight back!

keyu-tian commented 1 year ago

Thanks! We believe generative pre-training can unleash new potential in CNNs. We shall see them in a new light!