[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
Hello! In the function sp_conv_forward, the input x first goes through a normal convolution forward, so why does sp_conv_forward improve training efficiency compared to the normal conv_forward?
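For context, the pattern the question describes can be sketched as follows: run an ordinary dense convolution, then zero out the output at masked positions so that masked regions stay empty at every layer (i.e., the dense conv "simulates" a sparse conv on the visible patches). This is a minimal illustrative sketch, not the repo's actual code; `masked_conv_forward` and `active_mask` are hypothetical names chosen here for clarity.

```python
import torch
import torch.nn as nn

def masked_conv_forward(conv: nn.Conv2d, x: torch.Tensor,
                        active_mask: torch.Tensor) -> torch.Tensor:
    """Dense conv forward followed by re-masking the output.

    active_mask: (N, 1, H, W) float tensor, 1 = visible, 0 = masked.
    Multiplying after the conv keeps masked positions at zero,
    so the network behaves like a sparse conv over visible patches.
    """
    out = conv(x)             # normal dense convolution forward
    return out * active_mask  # broadcast over channels, zero masked spots

# Tiny usage example
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
x = torch.randn(2, 3, 8, 8)
mask = (torch.rand(2, 1, 8, 8) > 0.5).float()
y = masked_conv_forward(conv, x, mask)
```

Note that under this reading the mask multiplication does not skip any FLOPs; the dense conv still runs over the full feature map, so any benefit would come from the masking semantics (keeping masked regions empty for the pretraining objective) rather than from reduced computation.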