Closed liuxingbaoyu closed 4 years ago
You can try enabling GPU memory growth by putting the following snippet at the top of your code. If you are using TF 2.x:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
For TF 1.X
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
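The TF 2.x snippet above indexes gpus[0], which raises an IndexError on a machine with no visible GPU. A minimal sketch of a more defensive variant follows; the helper name enable_memory_growth is hypothetical, and the sketch assumes a TF 2.x release where tf.config.list_physical_devices is available.

```python
import tensorflow as tf


def enable_memory_growth():
    """Turn on memory growth for every visible GPU; return how many were configured."""
    gpus = tf.config.list_physical_devices('GPU')
    configured = 0
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as e:
            # Memory growth must be set before the GPUs have been initialized.
            print(e)
    return configured


print("GPUs configured:", enable_memory_growth())
```

Memory growth only changes how TensorFlow reserves GPU memory. It does not help when a single tensor is larger than the device, which turns out to be the case later in this thread.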
Thanks, I tried
tf.config.experimental.set_memory_growth(tf.config.experimental.list_physical_devices('GPU')[0], True)
but it still gives an error, and the same error is reported in the versions other than the GPU one.
When I use the CPU version, it does not crash, but it fills up all my memory (more than 30 GB) and gets stuck.
Sorry, it's because
x_train = tf.image.resize(x_train, input_shape[:2])
x_train = tf.image.grayscale_to_rgb(x_train)
takes up too much memory. Thank you very much.
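To see why those two calls are so expensive, here is a rough back-of-the-envelope calculation. The dataset shape is an assumption, not something stated in the thread: 60,000 grayscale images resized to 224x224 as float32 happens to reproduce exactly the 12042240000 bytes reported in the log, so it is used here for illustration.

```python
# Assumed shape (not stated in the thread): 60,000 images resized to 224x224, float32.
# These values are chosen because they reproduce the allocation size from the log.
num_samples, height, width = 60_000, 224, 224
bytes_per_float32 = 4

# After tf.image.resize: one channel per pixel.
resized_bytes = num_samples * height * width * 1 * bytes_per_float32
# After tf.image.grayscale_to_rgb: three channels per pixel.
rgb_bytes = num_samples * height * width * 3 * bytes_per_float32

print(resized_bytes)                    # 12042240000, the size in the log
print(round(resized_bytes / 2**30, 1))  # 11.2 (GiB)
print(round(rgb_bytes / 2**30, 1))      # 33.6 (GiB)
```

Under that assumption the resized array alone exceeds the roughly 9.5 GB TensorFlow could use on the GPU, and the RGB copy is consistent with the "more than 30 GB" of host memory seen with the CPU build.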
I'm having the same error. How did you solve it?
Reduce the number of samples in memory.
Umm, can you show me some code?
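One way to reduce the number of samples in memory is to keep the small original array and resize only one batch at a time inside a tf.data pipeline. The following is a minimal sketch, not the original poster's code; the array shapes, the batch size of 32, and the function name preprocess are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

# Small stand-in data; the shapes and the target size are assumptions for illustration.
x_train = np.random.rand(100, 28, 28, 1).astype(np.float32)
y_train = np.random.randint(0, 10, size=(100,))
input_shape = (224, 224, 3)


def preprocess(images, labels):
    # Resize and convert only the current batch, not the whole array.
    images = tf.image.resize(images, input_shape[:2])
    images = tf.image.grayscale_to_rgb(images)
    return images, labels


dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .batch(32)
    .map(preprocess)
    .prefetch(1)
)

for images, labels in dataset.take(1):
    print(images.shape)  # (32, 224, 224, 3)
```

The un-resized array still has to fit in memory, but the enlarged RGB version only ever exists for the batch being processed. The dataset can then be passed to model.fit in place of the full x_train tensor.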
No, enabling GPU memory growth (the set_memory_growth / allow_growth snippet suggested above) doesn't resolve it.
I'm using TensorFlow 2.4.x. My notebook has an NVIDIA GeForce 920M (2 GB of memory). I tried set_memory_growth, but it didn't work. Limiting memory to 1 GB didn't work either. Limiting it to 1.5 GB did work.
import tensorflow as tf

def limitgpu(maxmem):
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        # Restrict TensorFlow to allocate at most maxmem MB on each GPU
        try:
            for gpu in gpus:
                tf.config.experimental.set_virtual_device_configuration(gpu,
                    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=maxmem)])
        except RuntimeError as e:
            # Virtual devices must be set before GPUs have been initialized
            print(e)

# 1.5GB
limitgpu(1024+512)
What worked for me (with TF 2.4) was changing the data loading of the tf.data.Dataset. Specifically, I switched from using from_tensor_slices to using from_generator. I am tackling semantic segmentation with volumes of shape 64x64x64. Here's some pseudo code:
import tensorflow as tf

input_volumes_list = [...]  # list containing the input volumes that have shape 64x64x64
input_masks_list = [...]  # list containing the corresponding segmentation masks also of shape 64x64x64

# define generator function
def generator_images_and_masks():
    for idx in range(len(input_volumes_list)):
        # extract one image and the corresponding mask
        img = input_volumes_list[idx]
        mask = input_masks_list[idx]
        # convert to TF tensors
        img_tensor = tf.convert_to_tensor(img, dtype=tf.float32)
        mask_tensor = tf.convert_to_tensor(mask, dtype=tf.float32)
        yield img_tensor, mask_tensor

# create dataset using generator function and specifying shapes and dtypes
dataset = tf.data.Dataset.from_generator(
    generator_images_and_masks,
    output_signature=(tf.TensorSpec(shape=(64, 64, 64), dtype=tf.float32),
                      tf.TensorSpec(shape=(64, 64, 64), dtype=tf.float32)))
Is "input_volumes_list" a list of paths to your volumes, or are the volumes already loaded?
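The thread does not say which of the two it was. If the volumes live on disk, a path-based variant of the same generator idea could look like the sketch below, which loads one volume/mask pair per step. The .npy file format, the temporary files, and the names volume_paths, mask_paths, and generator_from_paths are all assumptions made so the example runs; output_signature needs TF 2.4 or newer.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical setup: write a few random volumes to .npy files so the sketch runs.
tmp_dir = tempfile.mkdtemp()
volume_paths, mask_paths = [], []
for i in range(3):
    vol_path = os.path.join(tmp_dir, f"volume_{i}.npy")
    mask_path = os.path.join(tmp_dir, f"mask_{i}.npy")
    np.save(vol_path, np.random.rand(64, 64, 64).astype(np.float32))
    np.save(mask_path, np.zeros((64, 64, 64), dtype=np.float32))
    volume_paths.append(vol_path)
    mask_paths.append(mask_path)


def generator_from_paths():
    # Load one volume/mask pair at a time instead of holding the whole set in memory.
    for vol_path, mask_path in zip(volume_paths, mask_paths):
        yield np.load(vol_path), np.load(mask_path)


dataset = tf.data.Dataset.from_generator(
    generator_from_paths,
    output_signature=(
        tf.TensorSpec(shape=(64, 64, 64), dtype=tf.float32),
        tf.TensorSpec(shape=(64, 64, 64), dtype=tf.float32),
    ),
).batch(2)

for volumes, masks in dataset:
    print(volumes.shape, masks.shape)
```

With a list of already-loaded arrays, from_generator still avoids the single large constant that from_tensor_slices creates, but the arrays themselves stay in host memory. Loading from paths is what keeps host memory proportional to the batch size.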
It takes up more than 30 GB of memory. This happens with tensorflow, tensorflow-gpu, and tf-nightly.
Code:
2019-12-19 22:41:47.467474: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-12-19 22:41:52.813348: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-19 22:41:52.851093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.755
pciBusID: 0000:41:00.0
2019-12-19 22:41:52.851257: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-19 22:41:52.851712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-19 22:41:53.319561: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-19 22:41:53.323650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.755
pciBusID: 0000:41:00.0
2019-12-19 22:41:53.323795: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-19 22:41:53.324400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-19 22:41:53.989669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-19 22:41:53.989780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-12-19 22:41:53.989838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-12-19 22:41:53.990709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9530 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
2019-12-19 22:41:54.080141: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 12042240000 exceeds 10% of system memory.
2019-12-19 22:42:15.504057: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 11.21GiB (rounded to 12042240000). Current allocation summary follows.
2019-12-19 22:42:15.504241: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 48B client-requested in use in bin.
2019-12-19 22:42:15.504381: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.504525: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.3KiB allocated for chunks. 1.3KiB in use in bin. 1.0KiB client-requested in use in bin.
2019-12-19 22:42:15.504675: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.504821: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.504964: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.505108: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.505252: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.505421: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.505631: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.505849: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.506074: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): Total Chunks: 1, Chunks in use: 0. 1022.0KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.506468: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.506957: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.507273: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): Total Chunks: 1, Chunks in use: 1. 5.74MiB allocated for chunks. 5.74MiB in use in bin. 5.74MiB client-requested in use in bin.
2019-12-19 22:42:15.507582: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): Total Chunks: 3, Chunks in use: 0. 26.26MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.507965: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.508284: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.508723: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.520336: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.520521: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-12-19 22:42:15.520749: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 11.21GiB was 256.00MiB, Chunk State:
2019-12-19 22:42:15.521083: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 1048576
2019-12-19 22:42:15.521203: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 000000020FC00000 next 1 of size 1280
2019-12-19 22:42:15.521394: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 000000020FC00500 next 4 of size 256
2019-12-19 22:42:15.521591: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 000000020FC00600 next 7 of size 256
2019-12-19 22:42:15.521787: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 000000020FC00700 next 8 of size 256
2019-12-19 22:42:15.521961: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 000000020FC00800 next 18446744073709551615 of size 1046528
2019-12-19 22:42:15.522163: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 8388608
2019-12-19 22:42:15.522353: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 000000020FE00000 next 18446744073709551615 of size 8388608
2019-12-19 22:42:15.522590: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 8388608
2019-12-19 22:42:15.522763: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0000000210600000 next 18446744073709551615 of size 8388608
2019-12-19 22:42:15.523051: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 16777216
2019-12-19 22:42:15.523236: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000210E00000 next 6 of size 6021120
2019-12-19 22:42:15.523468: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 00000002113BE000 next 18446744073709551615 of size 10756096
2019-12-19 22:42:15.523719: I tensorflow/core/common_runtime/bfc_allocator.cc:914] Summary of in-use Chunks by size:
2019-12-19 22:42:15.523923: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 3 Chunks of size 256 totalling 768B
2019-12-19 22:42:15.524051: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1280 totalling 1.3KiB
2019-12-19 22:42:15.524201: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 6021120 totalling 5.74MiB
2019-12-19 22:42:15.524454: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 5.74MiB
2019-12-19 22:42:15.524644: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 34603008 memory_limit_: 9993660007 available bytes: 9959056999 curr_region_allocation_bytes_: 33554432
2019-12-19 22:42:15.524974: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: Limit: 9993660007 InUse: 6023168 MaxInUse: 22799872 NumAllocs: 20 MaxAllocSize: 8388608
2019-12-19 22:42:15.525331: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *__**__
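The key numbers in this log are the size of the failed request and the allocator limit. As a small illustration, the sketch below pulls both out of two excerpts copied from the log and compares them; the regular expressions are an assumption about this particular log layout, not a general TensorFlow log parser.

```python
import re

# Two excerpts copied from the log above.
log = (
    "Allocator (GPU_0_bfc) ran out of memory trying to allocate 11.21GiB "
    "(rounded to 12042240000). Current allocation summary follows.\n"
    "Stats: Limit: 9993660007 InUse: 6023168 MaxInUse: 22799872 "
    "NumAllocs: 20 MaxAllocSize: 8388608\n"
)

requested = int(re.search(r"rounded to (\d+)", log).group(1))
limit = int(re.search(r"\bLimit:\s+(\d+)", log).group(1))
in_use = int(re.search(r"\bInUse:\s+(\d+)", log).group(1))

print(requested > limit)   # True: the single request is larger than the whole GPU limit
print(requested - limit)   # 2048579993 bytes over the limit
print(in_use)              # 6023168 bytes in use before the request
```

Since only about 6 MB was in use when the request failed, the problem is one oversized tensor rather than fragmentation or a leak, which is why memory growth and similar allocator settings do not help here.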