Closed: ysig closed this issue 3 years ago
This sounds correct for 16GB of GPU memory and this implementation. If you want to fit larger batches, you could try the --microbatch or --use-checkpoint arguments, or try fusing certain ops (e.g. swish+groupnorm) to save memory.
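For anyone unfamiliar with the flag, a microbatch option of this kind usually means gradient accumulation: the logical batch is split into smaller chunks so only one chunk's activations sit on the GPU at a time, at the cost of extra forward/backward passes. Below is a minimal PyTorch sketch of that idea; it is only an illustration of the technique, not the training loop from this repository, and `model`, `loss_fn`, and `optimizer` are placeholders.

```python
import torch

def train_step(model, loss_fn, optimizer, batch, microbatch=4):
    """One optimizer step over `batch`, processed in chunks of `microbatch`."""
    optimizer.zero_grad()
    n = batch.shape[0]
    for i in range(0, n, microbatch):
        chunk = batch[i:i + microbatch]
        loss = loss_fn(model(chunk))
        # Scale each chunk's loss so the accumulated gradient matches
        # what a single full-batch backward pass would produce.
        (loss * chunk.shape[0] / n).backward()
    optimizer.step()
```

The peak activation memory then scales with the microbatch size rather than the full batch size, which is why it helps on 16GB cards.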
@ysig How did your training turn out? Does bs=4 significantly degrade performance or cause the model to fail to converge? I have a similar hardware setup, so I'm just curious whether DDPM is still trainable in this case.
Hi, I tried using your code on a custom dataset of 256x256 images. The machine I am using has 4 V100s with 16GB each. I noticed that I was only able to train a model with a batch size of 4 (which basically means one image per GPU). For any batch size larger than 4 (8, 16, 32, ...), I get a CUDA out-of-memory error:
Am I doing something wrong, or is this the expected behavior of your implementation?
I use the following config (LSUN):
Thank you,