keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

The time of training ConvNext-B #23

Closed kalelpark closed 1 year ago

kalelpark commented 1 year ago

Thanks for your work! I was wondering about the GPU time needed to pre-train ConvNeXt-B for 1600 epochs.

keyu-tian commented 1 year ago

Thanks. It takes around 4000 GPU hours (5 days on 32 Tesla A100s with a batch size of 4096).
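For reference, the quoted ~4000 GPU hours is consistent with the stated hardware and wall-clock time. A quick sanity check (simple arithmetic, not from the repo):

```python
# Sanity check: 32 GPUs running for 5 days of wall-clock time.
num_gpus = 32
days = 5
gpu_hours = num_gpus * days * 24  # GPU-hours = GPUs * days * 24 h/day
print(gpu_hours)  # 3840 GPU hours, i.e. roughly the ~4000 quoted
```

So 32 A100s for 5 days gives 3840 GPU hours, matching the rounded figure in the answer.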