numahha opened this issue 3 years ago
@numahha Thanks for reporting this. I am looking into this.
@numahha A possible reason is that TensorFlow prefetches the data for the next iteration. This happens here in the code (https://github.com/isl-org/Open3D-ML/blob/master/ml3d/tf/dataloaders/tf_dataloader.py#L158). Could you try again after disabling it?
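For context, prefetching in a tf.data pipeline typically looks like the sketch below. This is only an illustration with placeholder names (gen_func and the generated data are made up), not the actual code in tf_dataloader.py; disabling prefetching simply means leaving out the prefetch() call.

    import tensorflow as tf

    # Minimal sketch of a tf.data pipeline with prefetching; names and data are
    # placeholders, not the Open3D-ML loader code.
    def gen_func():
        for _ in range(10):
            yield tf.random.normal([4, 3])

    loader = tf.data.Dataset.from_generator(
        gen_func,
        output_signature=tf.TensorSpec(shape=(4, 3), dtype=tf.float32))
    loader = loader.batch(2)
    # prefetch() prepares the next batch(es) in the background while the current
    # one is being consumed, which keeps extra batches resident in memory.
    loader = loader.prefetch(tf.data.experimental.AUTOTUNE)

    for batch in loader:
        pass  # iterate once to exercise the pipeline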
Thank you for the suggestion. I tried disabling it but got another error. This happened regardless of the batch size.
How to disable it (my change in ml3d/tf/dataloaders/tf_dataloader.py):

    if (self.model is None or 'batcher' not in self.model_cfg.keys()
            # or self.model_cfg.batcher == 'DefaultBatcher'   # commented out to disable it
       ):
        loader = loader.batch(batch_size)
Error message when disabling it:
(o3dmltf) ub18@ub18-desktop:~/Open3D-ML$ python test_tf.py
2021-12-03 10:20:26.671009: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-12-03 10:20:27.430333: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-12-03 10:20:27.967629: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-12-03 10:20:27.967769: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:27.968141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2021-12-03 10:20:27.968186: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-12-03 10:20:27.970059: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-12-03 10:20:27.970112: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-12-03 10:20:27.970766: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-12-03 10:20:27.970960: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-12-03 10:20:27.971430: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-12-03 10:20:27.972098: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-12-03 10:20:27.972220: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-12-03 10:20:27.972290: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:27.972624: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:27.972929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-12-03 10:20:27.973160: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-03 10:20:27.973464: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:27.973793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2021-12-03 10:20:27.973840: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:27.974138: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:27.974429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-12-03 10:20:28.369102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-12-03 10:20:28.369144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-12-03 10:20:28.369166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-12-03 10:20:28.369281: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:28.369645: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:28.369947: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:20:28.370241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6222 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO - 2021-12-03 10:20:28,530 - semantic_segmentation - <open3d._ml3d.tf.models.randlanet.RandLANet object at 0x7fde282a6d90>
INFO - 2021-12-03 10:20:28,530 - semantic_segmentation - Logging in file : ./logs/RandLANet_SemanticKITTI_tf/log_train_2021-12-03_10:20:28.txt
INFO - 2021-12-03 10:20:28,559 - semantickitti - Found 19130 pointclouds for training
INFO - 2021-12-03 10:20:31,531 - semantickitti - Found 4071 pointclouds for validation
INFO - 2021-12-03 10:20:32,204 - semantic_segmentation - Writing summary in train_log/00015_RandLANet_SemanticKITTI_tf.
INFO - 2021-12-03 10:20:32,205 - semantic_segmentation - Initializing from scratch.
INFO - 2021-12-03 10:20:32,205 - semantic_segmentation - === EPOCH 0/100 ===
training: 0%| | 0/19130 [00:00<?, ?it/s]2021-12-03 10:20:32.223916: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-12-03 10:20:32.242234: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3600000000 Hz
2021-12-03 10:20:32.368198: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-12-03 10:20:32.752362: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
training: 0%| | 0/19130 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/ub18/Open3D-ML/test_tf.py", line 23, in <module>
pipeline.run_train()
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/pipelines/semantic_segmentation.py", line 250, in run_train
results = model(inputs, training=True)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1030, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/models/randlanet.py", line 259, in call
f_encoder_i = self.forward_dilated_res_block(
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/models/randlanet.py", line 223, in forward_dilated_res_block
f_pc = m_conv2d(feature, training=self.training)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1030, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/utils/helper_tf.py", line 50, in call
x = self.conv(x)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1013, in __call__
input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/keras/engine/input_spec.py", line 230, in assert_input_compatibility
raise ValueError('Input ' + str(input_index) + ' of layer ' +
ValueError: Input 0 of layer conv2d_1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (45056, 8, 1)
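For what it's worth, the shape in the traceback matches an unbatched tensor: a Keras Conv2D layer needs a 4-D input, so a 3-D tensor of shape (45056, 8, 1) triggers exactly this error once loader.batch() is skipped. A minimal sketch reproducing it, independent of Open3D-ML:

    import tensorflow as tf

    conv = tf.keras.layers.Conv2D(filters=8, kernel_size=(1, 1))

    x = tf.random.normal([45056, 8, 1])    # 3-D, no batch axis (shape from the traceback)
    # conv(x)                              # raises: expected min_ndim=4, found ndim=3

    x_batched = tf.expand_dims(x, axis=0)  # 4-D: (1, 45056, 8, 1)
    y = conv(x_batched)                    # works once a batch axis is present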
Error message when not disabling it:
(o3dmltf) ub18@ub18-desktop:~/Open3D-ML$ python test_tf.py
2021-12-03 10:23:23.552434: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-12-03 10:23:24.305214: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-12-03 10:23:24.842576: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-12-03 10:23:24.842695: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:24.843063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2021-12-03 10:23:24.843114: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-12-03 10:23:24.844961: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-12-03 10:23:24.845016: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-12-03 10:23:24.845716: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-12-03 10:23:24.845909: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-12-03 10:23:24.846345: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-12-03 10:23:24.846964: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-12-03 10:23:24.847090: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-12-03 10:23:24.847163: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:24.847495: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:24.847794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-12-03 10:23:24.848037: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-03 10:23:24.848393: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:24.848715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2021-12-03 10:23:24.848762: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:24.849049: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:24.849317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-12-03 10:23:25.238891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-12-03 10:23:25.238923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-12-03 10:23:25.238943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-12-03 10:23:25.239057: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:25.239391: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:25.239691: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-03 10:23:25.239985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6228 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO - 2021-12-03 10:23:25,397 - semantic_segmentation - <open3d._ml3d.tf.models.randlanet.RandLANet object at 0x7f6061d11d90>
INFO - 2021-12-03 10:23:25,397 - semantic_segmentation - Logging in file : ./logs/RandLANet_SemanticKITTI_tf/log_train_2021-12-03_10:23:25.txt
INFO - 2021-12-03 10:23:25,426 - semantickitti - Found 19130 pointclouds for training
INFO - 2021-12-03 10:23:28,380 - semantickitti - Found 4071 pointclouds for validation
INFO - 2021-12-03 10:23:29,046 - semantic_segmentation - Writing summary in train_log/00016_RandLANet_SemanticKITTI_tf.
INFO - 2021-12-03 10:23:29,048 - semantic_segmentation - Initializing from scratch.
INFO - 2021-12-03 10:23:29,048 - semantic_segmentation - === EPOCH 0/100 ===
training: 0%| | 0/4783 [00:00<?, ?it/s]2021-12-03 10:23:29.066488: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-12-03 10:23:29.085860: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3600000000 Hz
2021-12-03 10:23:29.460843: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-12-03 10:23:29.849283: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-12-03 10:23:29.948460: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-12-03 10:23:30.323331: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8201
2021-12-03 10:23:40.506214: W tensorflow/core/common_runtime/bfc_allocator.cc:456] Allocator (GPU_0_bfc) ran out of memory trying to allocate 88.00MiB (rounded to 92274688)requested by op BiasAdd
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
2021-12-03 10:23:40.506317: I tensorflow/core/common_runtime/bfc_allocator.cc:991] BFCAllocator dump for GPU_0_bfc
...
2021-12-03 10:23:40.523610: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats:
Limit: 6531317760
InUse: 6508513280
MaxInUse: 6508513280
NumAllocs: 487
MaxAllocSize: 184549376
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2021-12-03 10:23:40.523648: W tensorflow/core/common_runtime/bfc_allocator.cc:467] ****************************************************************************************************
2021-12-03 10:23:40.523770: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at bias_op.cc:331 : Resource exhausted: OOM when allocating tensor with shape[11264,16,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
training: 0%| | 0/4783 [00:11<?, ?it/s]
Traceback (most recent call last):
File "/home/ub18/Open3D-ML/test_tf.py", line 23, in <module>
pipeline.run_train()
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/pipelines/semantic_segmentation.py", line 250, in run_train
results = model(inputs, training=True)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1030, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/models/randlanet.py", line 259, in call
f_encoder_i = self.forward_dilated_res_block(
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/models/randlanet.py", line 225, in forward_dilated_res_block
f_pc = self.forward_building_block(xyz, f_pc, neigh_idx, name + 'LFA')
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/models/randlanet.py", line 208, in forward_building_block
f_pc_agg = self.forward_att_pooling(f_concat, name + 'att_pooling_1')
File "/home/ub18/.local/lib/python3.9/site-packages/open3d/_ml3d/tf/models/randlanet.py", line 173, in forward_att_pooling
att_activation = m_dense(f_reshaped)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1030, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/keras/layers/core.py", line 1253, in call
outputs = nn_ops.bias_add(outputs, self.bias)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/ops/nn_ops.py", line 3377, in bias_add
return gen_nn_ops.bias_add(
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 678, in bias_add
_ops.raise_from_not_ok_status(e, name)
File "/home/ub18/anaconda3/envs/o3dmltf/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 6897, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[11264,16,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:BiasAdd]
Varying the batch size, I tried both the PyTorch and TensorFlow versions of RandLANet on SemanticKITTI. With PyTorch I could start training with batch size 5, but batch size 6 failed with a CUDA out-of-memory error. With TensorFlow I could start training with batch size 2, but batch size 3 failed with "ResourceExhaustedError: OOM when allocating ...". So on the same GPU I can only use about half the batch size with TensorFlow. The code that I use is
I'd like to know where the problem lies and how to solve it.
Thanks.
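For reference, the OOM log above itself suggests trying the asynchronous allocator. A minimal sketch of that hint, together with TensorFlow's standard per-GPU memory-growth option, is below; neither is a confirmed fix for the factor-of-two gap, just common knobs to rule out allocator overhead:

    import os
    # Allocator hint printed in the OOM log above; must be set before TensorFlow
    # initializes the GPU (easiest: before importing tensorflow).
    os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

    import tensorflow as tf

    # Let TensorFlow allocate GPU memory on demand instead of reserving most of
    # it up front.
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)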