Open dengandong opened 2 years ago
We use 16 GPUs (V100 32G) with a batch size of 128 per GPU, so the total batch size is 2048, which is 4 times larger than your setting. With our setting, it takes about 35 hours to train for 100 epochs. Since we pre-train the backbone, the FPN, and the RCNN head, which is a much larger model than MoCo (which trains only the backbone), the slower speed compared to MoCo is expected.
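For reference, a rough back-of-envelope check of the per-iteration time implied by those numbers, assuming ImageNet-1k-scale pretraining data (~1.28M images); the dataset size is my assumption and is not stated above:

```python
# Rough sanity check of the reported training speed (a sketch, not the repo's code).
# Assumption: pretraining on ~1.28M images (ImageNet-1k scale); adjust if the dataset differs.

NUM_IMAGES = 1_281_167   # assumed dataset size
EPOCHS = 100

def seconds_per_iter(total_batch_size: int, total_hours: float) -> float:
    """Average seconds per iteration implied by total batch size and wall-clock time."""
    iters_per_epoch = NUM_IMAGES / total_batch_size
    total_iters = iters_per_epoch * EPOCHS
    return total_hours * 3600 / total_iters

# Setting reported above: 16 GPUs x 128 per GPU = 2048 total batch, ~35 hours for 100 epochs.
print(f"{seconds_per_iter(2048, 35):.2f} s/iter")  # ≈ 2.0 s/iter under this assumption
```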
hi, hologerry~
I'm currently running your code on 4 V100 32G GPUs. Each iteration takes about 1.3 s (batch size = 128 per GPU), so the total training time for 100 epochs comes out to about 7 days (rough projection sketched below).
Does 1.3 s per iteration sound normal to you? I ran MoCo on the same machines, and it took about 0.5 s per iteration.
I'd appreciate it if you could help me with this~ Thanks!
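A minimal sketch of the projection I used; the iterations-per-epoch value depends on dataset size and total batch size, so it is left as a parameter rather than assumed:

```python
# Minimal sketch for projecting wall-clock training time from a measured iteration speed.
# 'iters_per_epoch' depends on your dataset size and total batch size; plug in your own value.

def estimated_days(sec_per_iter: float, iters_per_epoch: int, epochs: int = 100) -> float:
    """Projected training time in days, ignoring data-loading stalls and checkpoint overhead."""
    total_seconds = sec_per_iter * iters_per_epoch * epochs
    return total_seconds / 86400

# Example: measured 1.3 s/iter as above; fill in iters_per_epoch for your dataset.
# print(estimated_days(1.3, iters_per_epoch=...))
```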