keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License
1.45k stars 84 forks

Single GPU #22

Closed hasayake007 closed 1 year ago

hasayake007 commented 1 year ago

Could it be trained on a single GPU?

keyu-tian commented 1 year ago

You can refer to https://github.com/keyu-tian/SparK/tree/main/pretrain#debug-on-1-gpu-without-distributeddataparallel.

But if you pretrain from scratch, it may be difficult to match our published results with only 1 GPU (those models were pretrained for at least 1000 GPU hours).

So we recommend loading our pretrained model weights and then either continuing pretraining on your dataset for a while, or just finetuning them. See https://github.com/keyu-tian/SparK/tree/main/pretrain#tutorial-for-pretraining-your-own-dataset or #20 for how to pretrain on your own dataset.
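For the "load pretrained weights, then continue on your own data" route, a common snag is that a checkpoint's keys don't exactly match your downstream model (extra heads, renamed layers). A minimal sketch of a tolerant loader, assuming a generic PyTorch model and a hypothetical checkpoint file `spark_pretrained.pth` (the helper name `load_matching_weights` is mine, not part of the SparK repo):

```python
import torch
import torch.nn as nn

def load_matching_weights(model: nn.Module, state_dict: dict) -> int:
    """Copy into `model` only the checkpoint tensors whose names AND
    shapes match, ignoring the rest. Returns how many tensors loaded.
    """
    own = model.state_dict()
    matched = {k: v for k, v in state_dict.items()
               if k in own and own[k].shape == v.shape}
    own.update(matched)          # keep model's own values for unmatched keys
    model.load_state_dict(own)   # strict load now succeeds by construction
    return len(matched)

# Hypothetical usage with a downloaded SparK checkpoint:
#   ckpt = torch.load("spark_pretrained.pth", map_location="cpu")
#   n = load_matching_weights(my_backbone, ckpt)
#   print(f"loaded {n} tensors")
```

This is a sketch, not the repo's own loading code; the SparK tutorials linked above describe the supported way to resume pretraining or finetune.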