Closed by hasayake007 1 year ago
You can refer to https://github.com/keyu-tian/SparK/tree/main/pretrain#debug-on-1-gpu-without-distributeddataparallel.
However, if you pretrain from scratch using only 1 GPU, it may be difficult to match our published results (those models were pretrained for at least 1000 GPU hours).
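To put the 1000 GPU hours in perspective, here is a rough back-of-the-envelope sketch. It assumes the same GPU type and perfectly linear scaling across GPUs, which real runs rarely achieve; the helper `wall_clock_days` is a hypothetical name used only for illustration and is not part of SparK.

```python
# Rough wall-clock estimate for a fixed GPU-hour budget.
# Assumptions (not from the SparK repo): identical GPUs and ideal linear scaling.

GPU_HOURS = 1000  # lower bound mentioned in the reply above


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Days of continuous training needed to spend `gpu_hours` on `num_gpus` GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return gpu_hours / num_gpus / 24


for n in (1, 8, 32):
    print(f"{n:>2} GPU(s): about {wall_clock_days(GPU_HOURS, n):.1f} days")
```

On a single GPU the same budget works out to roughly 41.7 days of uninterrupted training, which is why starting from scratch on 1 GPU is rarely practical.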
We therefore recommend loading our pretrained model weights and then either continuing to pretrain them on your dataset for a while or simply finetuning them. See https://github.com/keyu-tian/SparK/tree/main/pretrain#tutorial-for-pretraining-your-own-dataset or #20 for how to pretrain on your own dataset.
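As a generic illustration of "load pretrained weights, then keep training", here is a minimal PyTorch sketch. It is not the SparK loading code: `build_encoder` and `load_pretrained` are hypothetical stand-ins, the checkpoint is simulated in memory, and the learning rate is only an assumed placeholder. Please follow the linked tutorial for the actual entry points and arguments.

```python
import io

import torch
import torch.nn as nn


def build_encoder() -> nn.Module:
    # Hypothetical stand-in for the real backbone; not the SparK model.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 8, kernel_size=3, padding=1),
    )


def load_pretrained(model: nn.Module, checkpoint):
    """Load a state dict non-strictly and report which keys did not match."""
    state = torch.load(checkpoint, map_location="cpu")
    result = model.load_state_dict(state, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)


# Simulate a downloaded checkpoint with an in-memory buffer (assumption for the demo).
buffer = io.BytesIO()
torch.save(build_encoder().state_dict(), buffer)
buffer.seek(0)

model = build_encoder()
missing, unexpected = load_pretrained(model, buffer)
print("missing keys:", missing)
print("unexpected keys:", unexpected)

# Continue training from the loaded weights; the learning rate is an assumed value.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```

Checking the missing and unexpected keys before training is a cheap way to confirm the checkpoint actually matched the model, since a non-strict load skips unmatched keys silently.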
Could it be trained on a single GPU?