EvilicLufas closed this issue 4 months ago.
We trained the largest model on 4 nodes with 8 A100 (40 GB) GPUs each. It takes about 24 hours to train it for about 200k steps.
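To turn the figures above into a rough estimate for a different run, here is a minimal sketch. It uses only the numbers from the reply (4 × 8 GPUs, ~24 h, ~200k steps). It assumes the same model, batch size and hardware, so that wall-clock time scales linearly with step count. The helper names (`gpu_hours`, `steps_per_second`, `estimate_hours`) are hypothetical and are not part of the MDT codebase.

```python
# Figures taken from the reply: 4 nodes x 8 A100 (40 GB) GPUs, ~24 h for ~200k steps.
NODES = 4
GPUS_PER_NODE = 8
REPORTED_HOURS = 24.0
REPORTED_STEPS = 200_000


def gpu_hours(nodes: int, gpus_per_node: int, hours: float) -> float:
    """Total GPU-hours consumed by a run."""
    return nodes * gpus_per_node * hours


def steps_per_second(steps: int, hours: float) -> float:
    """Average optimizer steps per second over the whole run."""
    return steps / (hours * 3600.0)


def estimate_hours(target_steps: int,
                   ref_steps: int = REPORTED_STEPS,
                   ref_hours: float = REPORTED_HOURS) -> float:
    """Wall-clock hours for target_steps.

    ASSUMPTION: same model, batch size and hardware as the reference run,
    so time grows linearly with the number of steps.
    """
    return ref_hours * target_steps / ref_steps


if __name__ == "__main__":
    print(f"GPU-hours: {gpu_hours(NODES, GPUS_PER_NODE, REPORTED_HOURS):.0f}")
    print(f"Throughput: {steps_per_second(REPORTED_STEPS, REPORTED_HOURS):.2f} steps/s")
    print(f"400k steps: ~{estimate_hours(400_000):.0f} h")
```

Time per step does not depend on dataset size. A 50M-image dataset only lengthens training if you run more steps, for example to cover more epochs. The sketch also ignores the difference between 40 GB and 80 GB cards and any data-loading bottleneck. Treat its output as a ballpark figure only.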
Thanks for your great support!
We've been training the S2 model and now want to switch to the XL model to train on a dataset of 50 million images. Before we do, could you tell us roughly how long it took you to train MDTv2-XL on ImageNet using 4 nodes with 8 GPUs (80 GB) per node? Even an approximate time would help. Thank you!