sail-sg / MDT

Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
Apache License 2.0

loss m_mse of MDT-S-2 is much larger than mse and the visualization of MDT-S-2 with mask_ratio 0.3 does not work #8

Open ZGCTroy opened 1 year ago

ZGCTroy commented 1 year ago
[image]

From the mse and m_mse losses, it seems that the mask branch does not work in MDT-S-2. We also visualized the generated images and found that images generated with mask_ratio=None look normal, while images generated with mask_ratio=0.3 are pure noise.

gasvn commented 1 year ago

This is expected: we want to keep the standard diffusion process, so the side-interpolater is deliberately kept very small. If you want to rely on the side-interpolater at inference, you need to make it larger.
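For readers following the thread, here is a minimal sketch of what a setting like mask_ratio=0.3 does to the token sequence, assuming MAE-style random masking. The helper name `split_tokens` and the token count of 256 are illustrative assumptions, not taken from the repository. The masked positions are the ones the side-interpolater would have to fill in, which is why a very small side-interpolater can give noisy results on that path.

```python
import random


def split_tokens(num_tokens, mask_ratio, seed=0):
    """Split token indices into (kept, masked) lists for one sample.

    Hypothetical helper for illustration only; the real model's masking
    code may differ in detail.
    """
    if mask_ratio is None:
        # No masking: every token goes through the standard diffusion path.
        return list(range(num_tokens)), []
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)
    # Keep a (1 - mask_ratio) fraction of the tokens, rounded down.
    num_keep = int(num_tokens * (1 - mask_ratio))
    kept = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return kept, masked


# Assumed example: 256 tokens per image, 30% of them masked.
kept, masked = split_tokens(256, 0.3)
print(len(kept), len(masked))
```

With mask_ratio=None the masked list is empty, so the side-interpolater has nothing to predict and its capacity does not matter for that path.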

Lecxxx commented 10 months ago

@gasvn Hello! Thank you for your excellent work! How should I correctly make the side-interpolater larger in MDT-S-2? Do you mean redesigning the side-interpolater with more or wider network layers? As mentioned in your paper,

“When the side-interpolater is kept during inference, MDT naturally enables the image inpainting ability.”

How should this design be carried out concretely?