keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

Increasing batch size #82

Open knightron0 opened 6 months ago

knightron0 commented 6 months ago

I'm trying to run pretraining with ResNet-50 on my own data and am running into out-of-memory issues.

Initially I was using two V100s (32 GB each), and the maximum batch size I could reach was 256. However, I can't go higher even with larger-memory GPUs: I tried an A100 in both the 40 GB and 80 GB variants, and the maximum batch size that avoided out-of-memory errors was still 256.

I'm a bit confused and was wondering if there's a knowledge gap in my understanding; let me know if I'm missing anything!
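
For reference, a minimal sketch of how one might probe the largest batch size a single GPU can hold (a plain ResNet-50 forward/backward on random tensors, which only approximates the memory footprint of SparK's full encoder/decoder pretraining step; requires a CUDA GPU):

```python
import torch
import torchvision

def max_batch_size(model, img_size=224, start=32, limit=1024):
    """Double the batch size until a CUDA OOM occurs; return the last size that fit."""
    model = model.cuda().train()
    best, bs = 0, start
    while bs <= limit:
        try:
            x = torch.randn(bs, 3, img_size, img_size, device='cuda')
            model(x).sum().backward()  # one forward + backward, roughly one training step
            best, bs = bs, bs * 2
        except RuntimeError as e:
            if 'out of memory' in str(e):
                break  # this batch size no longer fits
            raise
        finally:
            model.zero_grad(set_to_none=True)
            torch.cuda.empty_cache()
    return best

print(max_batch_size(torchvision.models.resnet50()))
```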

keyu-tian commented 4 months ago

Hi @knightron0, if a batch size of 256 maxes out a 32 GB V100, then a 40 GB A100 should behave similarly. FYI: we used 32 x 80 GB A100s for ResNet-50 pretraining with a per-GPU batch size of 128 (i.e., a global batch size of 4096), and that worked fine.
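
If the goal is a larger effective batch size rather than a larger per-GPU batch, gradient accumulation is a common workaround. Below is a minimal generic PyTorch sketch (toy model and synthetic data, not SparK's actual trainer): gradients from several small batches are accumulated before one optimizer step, so the effective batch is per-GPU batch x accumulation steps x number of GPUs.

```python
import torch
import torch.nn as nn

# Generic gradient-accumulation loop (toy model/data; not SparK's trainer).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
per_gpu_batch, accum_steps = 8, 4  # effective batch = per_gpu_batch * accum_steps * num_gpus

optimizer.zero_grad(set_to_none=True)
for step in range(16):
    images = torch.randn(per_gpu_batch, 3, 32, 32)  # stand-in for a data-loader batch
    loss = model(images).pow(2).mean()              # stand-in for the pretraining loss
    (loss / accum_steps).backward()                 # scale so gradients average over the large batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```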