kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
http://kdexd.xyz/virtex
MIT License

Approximate Pretraining time? #2

Closed gaopengcuhk closed 4 years ago

gaopengcuhk commented 4 years ago

Can you share the approximate time for pretraining?

kdexd commented 4 years ago

Hi @gaopengcuhk! For a fixed visual backbone, pretraining time depends a lot on the size of the transformer. Training across 8 2080 Ti GPUs for 500K iterations takes roughly 35-40 hours with a ResNet-50 visual backbone and a (L = 1, H = 1024 or 2048) textual head.
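
For reference, here is a minimal sketch of the model size being described: a ResNet-50 visual backbone paired with a single-layer transformer textual head of width H = 1024. This is not the repo's actual code (VirTex's textual head and training loop differ); the vocabulary size, projection layer, and all variable names below are illustrative assumptions, using plain PyTorch and torchvision.

```python
import torch
import torch.nn as nn
import torchvision

VOCAB_SIZE = 10000   # hypothetical vocabulary size, for illustration only
HIDDEN = 1024        # H in the comment above
NUM_LAYERS = 1       # L in the comment above

# Visual backbone: ResNet-50 with its classification head removed.
visual = torchvision.models.resnet50(weights=None)
visual.fc = nn.Identity()                  # outputs (batch, 2048) features
project = nn.Linear(2048, HIDDEN)          # project image features to width H

# Textual head: token embedding + a single transformer decoder layer.
embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
decoder_layer = nn.TransformerDecoderLayer(d_model=HIDDEN, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=NUM_LAYERS)
output_head = nn.Linear(HIDDEN, VOCAB_SIZE)

# One forward pass with dummy data, just to show the shapes involved.
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, VOCAB_SIZE, (2, 20))

memory = project(visual(images)).unsqueeze(1)   # (batch, 1, H) image "memory"
tokens = embed(captions)                        # (batch, 20, H)
logits = output_head(decoder(tokens, memory))   # (batch, 20, vocab)
print(logits.shape)                             # torch.Size([2, 20, 10000])
```

With L = 1 the transformer adds relatively few parameters on top of the ~25M-parameter ResNet-50, which is why the backbone and iteration count dominate the wall-clock numbers quoted above.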