sail-sg / MDT

Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
Apache License 2.0

Training time of MDTv2-XL on ImageNet using 4x8 GPUs #41

Closed: EvilicLufas closed 4 months ago

EvilicLufas commented 4 months ago

We've been training the S2 model and now want to switch to the XL model to train on a dataset of 50 million images. Before we do that, we would appreciate knowing how long it took you to train MDTv2-XL on ImageNet using 4 nodes with 8 GPUs (80G) per node. Just an approximate time would be helpful. Thank you!

gasvn commented 4 months ago

We used 4x8 A100 40G GPUs to train the largest model. It takes about 24 hours to train for about 200k steps.
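For planning a longer run, the figure above (about 200k steps in about 24 hours) can be turned into a rough wall-clock estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes the step rate stays constant and that hardware, batch size, and data loading match the reference run. The function names, the `speed_factor` knob, and the 1M-step target are illustrative assumptions, not values from the MDT repository or this thread.

```python
# Rough wall-clock estimate from the figure quoted in this thread:
# about 200k steps in about 24 hours on 4x8 A100 40G.
REFERENCE_STEPS = 200_000
REFERENCE_HOURS = 24.0


def steps_per_second(steps: int = REFERENCE_STEPS,
                     hours: float = REFERENCE_HOURS) -> float:
    """Average training steps per second implied by the reference run."""
    return steps / (hours * 3600.0)


def estimate_hours(target_steps: int, speed_factor: float = 1.0) -> float:
    """Hours needed for target_steps at the reference rate times speed_factor.

    speed_factor is a hypothetical knob: 1.0 means the same throughput as
    the reference run; any other value is your own measured or guessed ratio.
    """
    if target_steps < 0 or speed_factor <= 0:
        raise ValueError("target_steps must be >= 0 and speed_factor > 0")
    return target_steps / (steps_per_second() * speed_factor * 3600.0)


if __name__ == "__main__":
    print(f"{steps_per_second():.2f} steps/s")
    # 1_000_000 is an illustrative step count, not a number from this thread.
    print(f"{estimate_hours(1_000_000):.0f} hours for 1M steps")
```

At the quoted rate this works out to roughly 2.3 steps per second, so a hypothetical 1M-step run would take about 120 hours. Measure the step rate on your own cluster before relying on the estimate, since 80G cards, a different batch size, or a larger dataset pipeline can all change throughput.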

EvilicLufas commented 4 months ago

Thanks for your great support!