Closed gaopengcuhk closed 4 years ago
Can you share the approximate time for pretraining?

Hi @gaopengcuhk! For a fixed visual backbone, pretraining time depends heavily on the size of the transformer. Training for 500K iterations across 8 × 2080 Ti GPUs takes roughly 35-40 hours with a ResNet-50 visual backbone and an (L = 1, H = 1024 or 2048) textual head.
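As a quick sanity check on those numbers, the figures in the reply imply the following per-iteration throughput (this is just back-of-the-envelope arithmetic from the 500K iterations and 35-40 hour range quoted above, not anything measured from the codebase):

```python
# Back-of-the-envelope throughput implied by the reply above.
# Numbers (500K iterations, 35-40 hours on 8 GPUs) come from the thread;
# nothing here is taken from the actual training code.
ITERATIONS = 500_000
HOURS_FAST, HOURS_SLOW = 35, 40

its_per_sec_fast = ITERATIONS / (HOURS_FAST * 3600)  # ~4.0 it/s at the fast end
its_per_sec_slow = ITERATIONS / (HOURS_SLOW * 3600)  # ~3.5 it/s at the slow end

print(f"~{its_per_sec_slow:.1f}-{its_per_sec_fast:.1f} iterations/sec overall")
```

So the quoted wall-clock range corresponds to roughly 3.5-4 training iterations per second across the 8 GPUs.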