tyshiwo1 / DiM-DiffusionMamba

The official implementation of DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

CUDA out of memory when training #6

Status: Closed (by yyNoBug, 1 month ago)

yyNoBug commented 1 month ago

Dear authors, thanks for your great work! I tried training a DiM model with your provided config `configs/imagenet256_H_DiM.py`, but it triggers a CUDA out-of-memory error. I am running your original code on 8 A100 GPUs. Do you have any idea what the issue might be?

tyshiwo1 commented 1 month ago

Have you installed DeepSpeed and enabled ZeRO-2? `pip install deepspeed`
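
In case it helps, here is a minimal sketch of how ZeRO-2 is typically enabled through DeepSpeed's standard Python API. This is not the repo's actual training script; the model, learning rate, and precision settings are placeholders.

```python
import deepspeed
import torch

# Placeholder network; stands in for the actual DiM model.
model = torch.nn.Linear(1024, 1024)

# Minimal DeepSpeed config with ZeRO stage 2: optimizer states and
# gradients are partitioned across GPUs, which frees the memory that
# was causing the OOM at batch size 768.
ds_config = {
    "train_micro_batch_size_per_gpu": 96,  # 768 global / 8 GPUs
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
}

# Run this under the `deepspeed` launcher so distributed init succeeds.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```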

yyNoBug commented 1 month ago

Thanks! A batch size of 768 works fine after I installed DeepSpeed.

By the way, the training now takes 3 seconds per step. Is that normal?

tyshiwo1 commented 1 month ago

Yes, 3 s/iter is normal. It took us more than 20 days to train for 625K iterations.
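(For reference, the numbers are self-consistent: $625{,}000 \times 3\,\mathrm{s} \approx 1.88 \times 10^{6}\,\mathrm{s} \approx 21.7$ days.)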

As for speed on $256 \times 256$ images, Mamba is slower than a Transformer because it performs double the number of scans. You can refer to Figure 3 in our paper for the detailed speed comparison.
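
To unpack "double the number of scans", here is a toy illustration; `torch.cumsum` stands in for the real hardware-aware selective-scan kernel, so this is a cost-model sketch rather than the DiM implementation.

```python
import torch

def ssm_scan(x: torch.Tensor) -> torch.Tensor:
    # Toy stand-in for one selective-scan pass over the token sequence;
    # the real Mamba kernel is a fused CUDA scan, but both traverse the
    # sequence once, so the cost intuition carries over.
    return torch.cumsum(x, dim=1)

def bidirectional_block(x: torch.Tensor) -> torch.Tensor:
    # Image tokens have no natural causal order, so the block scans the
    # sequence forward *and* backward: twice the scan work per block
    # compared with a single causal pass.
    fwd = ssm_scan(x)
    bwd = ssm_scan(x.flip([1])).flip([1])
    return fwd + bwd

tokens = torch.randn(2, 1024, 64)  # (batch, 32x32 patch tokens, channels)
out = bidirectional_block(tokens)  # same shape, double the scan cost
```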