Closed: snehashis-roy closed this issue 4 years ago.
At a minimum, the amount of memory needed would be:
101 x 2160 x 2560 x (4 bytes for float32) x 3 (PSF, image, and intermediate results) = ~6.7 GB
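A quick sanity check of that arithmetic in Python (just a sketch; the 3x buffer count is the assumption above, not something TF reports):

```python
# Minimum-memory estimate: image, PSF (padded out to the image size),
# and intermediate results, all held as float32.
depth, height, width = 101, 2160, 2560
bytes_per_voxel = 4   # float32
n_buffers = 3         # PSF, image, and intermediate results

min_bytes = depth * height * width * bytes_per_voxel * n_buffers
print(min_bytes / 1e9)  # -> ~6.7 GB
```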
The worst part of that in your case is certainly that the PSF is padded out to the size of the image. There are deconvolution implementations out there that work on different dimensions for each, but there was no clear way to do that with the tf.fft functions. Although, I've never seen a disparity between the two be even close to as large before -- how'd you end up with a 3x9x9 PSF? That seems curiously small (is this a 2x widefield image or something?).
Anyways, in my own experiments I would always estimate about 2x on top of that minimum estimate (I don't know the TF internals enough to say why) so I'm not at all surprised that it doesn't fit in 8G.
Thank you. The PSF is small because it is for a light-sheet image with 5x5x5 um resolution. Also, is the first dimension the number of slices? io.imread reads TIFF images into a numpy array as DxWxH. Is that the correct orientation (DxWxH) in which to feed the numpy array to fd_data?
You sure you're not getting HxW from imread (instead of WxH)? DxHxW is what users typically do, but the RL algorithm technically doesn't care.
Sorry, you are right, it is DxHxW. I want to make sure D is the first axis.
Ok cool, DxHxW is good then.
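For reference, a minimal sketch of checking the axis order before handing the arrays to flowdec (file names are hypothetical; the Acquisition wrapper is used as in flowdec's README):

```python
from skimage import io
from flowdec import data as fd_data

img = io.imread('stack.tif')  # hypothetical multi-page TIFF
psf = io.imread('psf.tif')
print(img.shape, psf.shape)   # expect (D, H, W), e.g. (101, 2160, 2560) and (3, 9, 9)

# Wrap both volumes for deconvolution; the axis order just has to
# match between image and PSF.
acq = fd_data.Acquisition(data=img, kernel=psf)
```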
After lots of experiments, I realized that the total memory required is 8 times the float32 size of one input + some overhead, where the input (and PSF) is always padded to the next power of 2 along each dimension.
- For a 1468x1663x17 image (padded to 2048x2048x32), the required memory was 2048x2048x32x4x8/(1024**3) = 4 GB + some overhead. The overhead was 2453 MB, so the total memory (as shown in nvidia-smi) was 4096 + 2453 = 6549 MB.
- For a 1819x2063x17 image (padded to 2048x4096x32), the required memory was 2048x4096x32x4x8/(1024**3) = 8 GB + some overhead. The overhead was 2453 MB, so the total memory was 8192 + 2453 = 10645 MB.
- For an 844x953x17 image (padded to 1024x1024x32), the required memory was 1024x1024x32x4x8/(1024**3) = 1 GB + some overhead. The overhead was 917 MB, so the total memory was 1024 + 917 = 1941 MB.
I did not have a good way to compute the overhead, but across various image sizes the required-memory computation was consistent (a sketch of this rule follows the specs below). Hopefully this will help optimize chunking large images to utilize the full GPU memory.
Specs: tf-gpu 1.14.0, python 3.6.8
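Here is that empirical rule as a helper function (the 8x factor and the overhead values are the measurements above, not a documented formula; the function name is made up):

```python
def next_pow2(n):
    """Smallest power of 2 >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def estimated_gpu_mem_mb(shape, overhead_mb=2453, bytes_per_voxel=4, factor=8):
    """Empirical estimate: 8x the float32 size of the power-of-2-padded input,
    plus a measured per-run overhead."""
    voxels = 1
    for s in shape:
        voxels *= next_pow2(s)
    return voxels * bytes_per_voxel * factor / 1024**2 + overhead_mb

print(estimated_gpu_mem_mb((1468, 1663, 17)))  # -> 6549.0, matching nvidia-smi above
```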
Hello. I am trying to deconvolve a 101x2160x2560 (DxWxH) image with a 3x9x9 (DxWxH) PSF. I have tf-gpu 1.14.0 installed and an 11 GB card.
I get a segmentation fault. However, with a smaller (101x1000x1000) image, the same PSF works, using approximately 8815 MB of GPU memory.
I have tried both allow_growth = True and allow_growth = False. In both cases, I got a segfault.
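For reference, allow_growth in TF 1.x is set along these lines:

```python
import tensorflow as tf  # tf-gpu 1.14

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)
```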
This is the segfault message with the 101x2160x2560 image. Can you please tell me how to estimate GPU memory from the image size and PSF size, so that I can chunk efficiently?