training costs - Githubissues

River-Zhang commented 4 months ago

Thanks for your great work! It's amazing! It can be applied to many real-world scenarios. I wonder if this is a paper submitted to CVPR 2024? The format of the paper looks like it is.

Besides, could you please tell me how many GPUs it costs to train this model? Thanks very much!

xucao-42 commented 4 months ago

Great work and same concern here. I did not see descriptions of training resources in the paper. Could the authors provide details?

yocabon commented 4 months ago

Hi,

We trained dust3r on A100 GPUs (80 GB of VRAM each). For training at 224x224 resolution we used 4 GPUs, but I'd recommend using 8 (it increases the effective batch size to 128; we also tried that and there's barely any difference). The 512 linear and 512 dpt models were both trained on 8 A100 GPUs.

About timings:

224: ~0.59s per batch, 8*100_000 pairs per epoch, that's 6_250 steps if running on 8 GPUs, or ~1.02 hours per epoch. About 102.4 hours total (test passes and checkpoint saving will increase that a bit, but they are short compared to training).
512 linear: ~0.63s per step (accum_iter=2, so 2 steps per effective batch), 8*10_000 pairs per epoch, 2_500 steps per epoch, 26.25 minutes per epoch, 87.5 hours total.
512 dpt: ~0.52s per step (accum_iter=4, so 4 steps per effective batch), 8*10_000 pairs per epoch, 5_000 steps per epoch, ~43 minutes per epoch, 65 hours total. Note that the first epoch is much slower when using dpt.
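
For anyone who wants to re-derive these figures or adapt them to another setup, the arithmetic can be sketched as below. This is only a rough sketch, not taken from the dust3r code: the names `RUNS`, `NUM_GPUS` and `summarize` are made up for illustration, `accum_iter=1` for the 224 run is an assumption (it is not stated above, though it matches the effective batch size of 128), and the epoch counts are inferred from the quoted totals rather than read from any training config.

```python
# Figures quoted in the comment above. accum_iter=1 for the 224 run is an
# assumption; everything else is copied from the thread.
RUNS = {
    "224": {"pairs": 8 * 100_000, "steps": 6_250, "sec_per_step": 0.59,
            "accum_iter": 1, "total_hours": 102.4},
    "512 linear": {"pairs": 8 * 10_000, "steps": 2_500, "sec_per_step": 0.63,
                   "accum_iter": 2, "total_hours": 87.5},
    "512 dpt": {"pairs": 8 * 10_000, "steps": 5_000, "sec_per_step": 0.52,
                "accum_iter": 4, "total_hours": 65.0},
}

NUM_GPUS = 8  # assumed for all three runs, as in the timings above


def summarize(run, num_gpus=NUM_GPUS):
    """Derive batch sizes, epoch time and GPU-hours from the quoted numbers."""
    pairs_per_step = run["pairs"] / run["steps"]          # across all GPUs
    epoch_seconds = run["steps"] * run["sec_per_step"]
    return {
        "pairs_per_gpu_per_step": pairs_per_step / num_gpus,
        "effective_batch": pairs_per_step * run["accum_iter"],
        "epoch_minutes": epoch_seconds / 60,
        # Inferred, not stated in the thread: total time / time per epoch.
        "implied_epochs": run["total_hours"] * 3600 / epoch_seconds,
        "gpu_hours": run["total_hours"] * num_gpus,
    }


total_gpu_hours = 0.0
for name, run in RUNS.items():
    s = summarize(run)
    total_gpu_hours += s["gpu_hours"]
    print(f"{name}: {s['epoch_minutes']:.2f} min/epoch, "
          f"effective batch {s['effective_batch']:.0f}, "
          f"~{s['implied_epochs']:.0f} epochs, "
          f"{s['gpu_hours']:.1f} GPU-hours")
print(f"total: {total_gpu_hours:.1f} GPU-hours")
```

Under these assumptions the quoted totals correspond to roughly 100, 200 and 90 epochs for the three stages, and the three stages together come to about 2,040 A100 GPU-hours, not counting test passes and checkpointing.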

River-Zhang commented 4 months ago

Thanks very much!

naver / dust3r

training costs #2