tbepler / topaz

Pipeline for particle picking in cryo-electron microscopy images using convolutional neural networks trained from positive and unlabeled examples. Also featuring micrograph and tomogram denoising with DNNs.
GNU General Public License v3.0

GPU memory usage for denoising #50

Closed Jason-vR closed 4 years ago

Jason-vR commented 4 years ago

Hi guys,

I'm having trouble optimizing the memory usage on our P100 GPUs during denoising training.

On K3 (binned not super-resolution images), at 5760 x 4092 32 bit real, the file size is 90 MB. When using a crop size of 800 and a batch size of 10, I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 782.00 MiB (GPU 0; 15.90 GiB total capacity; 14.61 GiB already allocated; 579.88 MiB free; 47.48 MiB cached)

I would calculate around 730 MB required for 10 images with patches of 800 which agrees with the request in the error message but clearly the 16 GB GPU memory is maxed out. To get training to run, I have to use a crop size 7 and patch of 800 but this still uses close to 15.5 GB of memory.

What is using the rest of the GPU memory if the request is only ~800 MiB? The --preload option is not available in Topaz 0.2.2, so it can't be the cause.

Any help would be appreciated.

Regards, Jason eBIC for Industry Diamond Light Source

P.S. Loving Topaz! Immensely powerful and fast for difficult projects.

tbepler commented 4 years ago

Thanks for using Topaz. I'm glad it's been useful for you!

You are correct that if we were only storing the 10 800x800 image crops on the GPU, it wouldn't require much GPU RAM (25 MB or so). However, all of the intermediate layers of the denoising model also need to be stored in GPU RAM, along with the model parameters and the gradients during training. This adds a significant multiplier to the GPU RAM usage. On my system, the default training settings of minibatch size 4 and crop 800 use 12,833 MiB (~13 GB) of GPU RAM. These defaults were chosen so that training fits on a 16 GB GPU, so changing to a minibatch size of 4 should fix the problem. If you want to use the larger batch size, you'll need to reduce the crop size enough to compensate. A rough back-of-the-envelope estimate: going from 4 to 10 increases the batch size by 2.5x, so you would need to reduce the crop size by the square root of 2.5 (~1.6), to about 500, for a minibatch to fit in GPU RAM.
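The back-of-the-envelope rule above can be sketched in plain Python (the batch-vs-crop scaling relationship is the point here, not exact memory figures, since activation memory also depends on the model architecture):

```python
import math

def adjusted_crop(crop, old_batch, new_batch):
    """Shrink the training crop so that new_batch crops cover roughly
    the same total number of pixels per minibatch as old_batch crops.

    Pixels per minibatch scale as batch * crop**2, so holding that
    product constant means dividing crop by sqrt(new_batch / old_batch).
    """
    return int(crop / math.sqrt(new_batch / old_batch))

# Going from the default batch of 4 to a batch of 10 at crop 800:
# 10/4 = 2.5x more crops, so the crop shrinks by sqrt(2.5) ~ 1.58.
print(adjusted_crop(800, 4, 10))  # -> 505, i.e. roughly the 500 suggested above
```

This is only a proxy: it keeps the raw pixel count per minibatch constant, which tracks activation memory well enough for a rough estimate.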

Hope that helps!

P.S. A few other comments:

1) The training crop size (-c/--crop) sets how large the images are that the model sees during training. I recommend keeping this as large as possible, because it allows the model to learn to use a large amount of context for better denoising. A crop size of 7 is definitely too small. I would discourage going below ~200 pixels.

2) The patch size (--patch-size) and padding arguments (--padding) are not used during training, only when applying a trained model. Patch size and crop do essentially the same thing except crop is used during training and patch size is used during prediction. This could definitely be better explained in the interface.

3) The --preload option only loads the training micrographs into RAM, not GPU RAM. This makes training faster if your whole dataset fits into RAM.

Jason-vR commented 4 years ago

Hi Tristan,

Thanks for the detailed explanation; I understand now. As you can tell, I am still living in a linear image-processing world and only now getting to grips with neural nets.

Sorry, I've also been tripping over the new vocabulary:

I said:

a crop size 7 and patch of 800

but meant batch 7 and crop 800.

I was principally worried that, because I fell short of the default batch size of 10 (in 0.2.2), I was compromising the training. Using --batch-size 4 solves the problem for me.

Thanks again, Jason

tbepler commented 4 years ago

Yes, a batch size of 4 works perfectly fine. I forgot I had the batch size set to 10 in the older versions; I changed it to 4 for exactly this GPU RAM reason.

I'm going to close this issue since it sounds like the question is resolved. Feel free to reopen/start a new issue if you run into any other issues!