sail-sg / MDT

Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
Apache License 2.0

Suggested settings on training b/2, s/2 ? #39

Closed aaab8b closed 4 months ago

aaab8b commented 4 months ago

First of all, thank you for this wonderful work! I followed the original XL/2 settings you provided to train MDTv2 B/2 and S/2, but the FID computed against the ImageNet 50k reference is much higher than the paper reports (for S/2 I got an FID of 58 with CFG after 960k training steps). Have any settings changed?
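For reference, the FID number discussed here is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The sketch below computes only that final statistic from precomputed means and covariances; it is a minimal numpy illustration, not the repo's actual evaluation pipeline (which extracts Inception-v3 features over 50k samples first), and `frechet_distance` / `sqrtm_psd` are hypothetical helper names.

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID formula: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*sqrt(C1 @ C2)).

    mu*: feature means, cov*: feature covariances (symmetric PSD).
    """
    def sqrtm_psd(m):
        # Square root of a symmetric PSD matrix via eigendecomposition.
        vals, vecs = np.linalg.eigh(m)
        vals = np.clip(vals, 0.0, None)  # guard tiny negative eigenvalues
        return (vecs * np.sqrt(vals)) @ vecs.T

    # Tr(sqrt(C1 C2)) equals Tr(sqrt(s1 C2 s1)) with s1 = sqrt(C1),
    # and the latter argument is symmetric PSD, so eigh applies.
    s1 = sqrtm_psd(cov1)
    covmean = sqrtm_psd(s1 @ cov2 @ s1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))
```

With identical statistics the distance is 0, and shifting the mean by a unit vector with equal covariances gives exactly 1, which is a quick sanity check before trusting a full 50k-sample run.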

gasvn commented 4 months ago

What is your exact training setting? XL/2 is trained on 4×8 GPUs, while B/2 and S/2 use 8 GPUs. We also list the training settings for B/2 and S/2 in our readme.

aaab8b commented 4 months ago

> What is the exact training setting? xl/2 uses 4x8gpu to train, while b/2 and s/2 uses 8gpu. We also list the training setting for b/2 and s/2 in our readme.

[image attached] It looks like the readme differs from your paper's setting for the number of decoder blocks (N2).

gasvn commented 4 months ago

Thanks for the reminder! The results in the paper use the correct setting. I will update the code.

aaab8b commented 4 months ago

> Thanks for the reminder! Results from paper is the right setting. I will update the code

I'd still like to know the specific decoder_layer number for B/2, or could you provide a public pretrained B/2 model? Thank you so much.