Closed: ysig closed this issue 3 years ago
This sounds correct for 16GB of GPU memory and this implementation. If you want to fit larger batches, you could try the --microbatch or --use-checkpoint arguments, or try fusing certain ops (e.g. swish+groupnorm) to save memory.
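For anyone unfamiliar with the flag, a microbatch option of this kind usually means gradient accumulation: the logical batch is split into smaller chunks so only one chunk's activations sit on the GPU at a time, at the cost of extra forward/backward passes. Below is a minimal PyTorch sketch of that idea; it is only an illustration of the technique, not the training loop from this repository, and `model`, `loss_fn`, and `optimizer` are placeholders.

```python
import torch

def train_step(model, loss_fn, optimizer, batch, microbatch=4):
    """One optimizer step over `batch`, processed in chunks of `microbatch`."""
    optimizer.zero_grad()
    n = batch.shape[0]
    for i in range(0, n, microbatch):
        chunk = batch[i:i + microbatch]
        loss = loss_fn(model(chunk))
        # Scale each chunk's loss so the accumulated gradient matches
        # what a single full-batch backward pass would produce.
        (loss * chunk.shape[0] / n).backward()
    optimizer.step()
```

The peak activation memory then scales with the microbatch size rather than the full batch size, which is why it helps on 16GB cards.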
@ysig How did your training turn out? Does bs=4 significantly degrade performance or cause the model to fail to converge? I have a similar hardware setup, so I'm just curious whether DDPM is still trainable in this case.
Hi, I tried using your code on a custom dataset of 256x256 images. The machine I am using has 4 V100s with 16GB each. I noticed that I was only able to train a model with a batch size of 4 (which basically means one image per GPU). For any batch size larger than 4 (8, 16, 32, ...), I get a CUDA out-of-memory error:
Am I doing something wrong, or is this the expected behavior of your implementation?
I use the following config (LSUN):
Thank you,