Closed: snehashis-roy closed this issue 4 years ago.
At a minimum, the amount of memory needed would be:
101 x 2160 x 2560 x (4 bytes for float32) x 3 (PSF, image, and intermediate results) = ~6.7 GB
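A quick sanity check of that arithmetic in Python (just a sketch; the 3x buffer count is the assumption above, not something TF reports):

```python
# Minimum-memory estimate: image, PSF (padded out to the image size),
# and intermediate results, all held as float32.
depth, height, width = 101, 2160, 2560
bytes_per_voxel = 4   # float32
n_buffers = 3         # PSF, image, and intermediate results

min_bytes = depth * height * width * bytes_per_voxel * n_buffers
print(min_bytes / 1e9)  # -> ~6.7 GB
```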
The worst part of that in your case is certainly that the PSF is padded out to the size of the image. There are deconvolution implementations out there that work on different dimensions for each, but there was no clear way to do that with the tf.fft functions. Although, I've never seen a disparity between the two be even close to as large before -- how'd you end up with a 3x9x9 PSF? That seems curiously small (is this a 2x widefield image or something?).
Anyways, in my own experiments I would always estimate about 2x on top of that minimum estimate (I don't know the TF internals enough to say why) so I'm not at all surprised that it doesn't fit in 8G.
Thank you. The PSF is small because it is for a light-sheet image with 5x5x5 um resolution. Also, is the first dimension the number of slices? io.imread reads TIFF images into a numpy array as DxWxH. Is that the correct orientation (DxWxH) in which to feed the numpy array to fd_data?
You sure you're not getting HxW from imread (instead of WxH)? DxHxW is what users typically do, but the RL algorithm technically doesn't care.
Sorry, you are right, it is DxHxW. I want to make sure D is the first axis.
Ok cool, DxHxW is good then.
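For reference, a minimal sketch of checking the axis order before handing the arrays to flowdec (file names are hypothetical; the Acquisition wrapper is used as in flowdec's README):

```python
from skimage import io
from flowdec import data as fd_data

img = io.imread('stack.tif')  # hypothetical multi-page TIFF
psf = io.imread('psf.tif')
print(img.shape, psf.shape)   # expect (D, H, W), e.g. (101, 2160, 2560) and (3, 9, 9)

# Wrap both volumes for deconvolution; the axis order just has to
# match between image and PSF.
acq = fd_data.Acquisition(data=img, kernel=psf)
```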
After lots of experiments, I realized that the total memory required is 8 times the float32 size of one input + some overhead, where the input (and PSF) is always padded to the next power of 2 along each dimension.
- For a 1468x1663x17 image (padded to 2048x2048x32), the required memory was 2048x2048x32x4x8/(1024**3) = 4 GB + some overhead. The overhead was 2453 MB, so the total memory (as shown in nvidia-smi) was 4096 + 2453 = 6549 MB.
- For a 1819x2063x17 image (padded to 2048x4096x32), the required memory was 2048x4096x32x4x8/(1024**3) = 8 GB + some overhead. The overhead was 2453 MB, so the total memory was 8192 + 2453 = 10645 MB.
- For an 844x953x17 image (padded to 1024x1024x32), the required memory was 1024x1024x32x4x8/(1024**3) = 1 GB + some overhead. The overhead was 917 MB, so the total memory was 1024 + 917 = 1941 MB.
I did not have a good way to compute the overhead, but across various image sizes the required-memory computation was consistent (a sketch of this rule follows the specs below). Hopefully this will help optimize chunking large images to utilize the full GPU memory.
Specs: tf-gpu 1.14.0, python 3.6.8
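Here is that empirical rule as a helper function (the 8x factor and the overhead values are the measurements above, not a documented formula; the function name is made up):

```python
def next_pow2(n):
    """Smallest power of 2 >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def estimated_gpu_mem_mb(shape, overhead_mb=2453, bytes_per_voxel=4, factor=8):
    """Empirical estimate: 8x the float32 size of the power-of-2-padded input,
    plus a measured per-run overhead."""
    voxels = 1
    for s in shape:
        voxels *= next_pow2(s)
    return voxels * bytes_per_voxel * factor / 1024**2 + overhead_mb

print(estimated_gpu_mem_mb((1468, 1663, 17)))  # -> 6549.0, matching nvidia-smi above
```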
Hello. I am trying to deconvolve a 101x2160x2560 (DxWxH) image with a 3x9x9 (DxWxH) PSF. I have tf-gpu 1.14.0 installed and an 11 GB card.
I get a segmentation fault. However, with a smaller (101x1000x1000) image, the same PSF works, using approximately 8815 MB of GPU memory.
I have tried both allow_growth = True and allow_growth = False. In both cases, I got a segfault.
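For reference, allow_growth in TF 1.x is set along these lines:

```python
import tensorflow as tf  # tf-gpu 1.14

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)
```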
This is the segfault message with the 101x2160x2560 image. Can you please tell me how to estimate GPU memory from the image size and PSF size, so that I can chunk efficiently?