tensorflow / models

Models and examples built with TensorFlow

TensorFlow object detection CUDA memory error after 1100 steps #8597

Open Lin1007 opened 4 years ago

Lin1007 commented 4 years ago

Prerequisites

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

I'm training a Faster R-CNN with inception_v2 as the pretrained network on Colab and also on AWS, using Nvidia K80 and P100 GPUs (12 GB of GPU memory). Training runs correctly until step 1100, and then it reports a CUDA memory error when evaluating the model.

3. Steps to reproduce

!python /content/models/research/object_detection/model_main.py \
    --pipeline_config_path={pipeline_fname} \
    --model_dir={model_dir} \
    --num_train_steps=15000

4. Expected behavior

The model evaluates and then continues training.

5. Additional context

W0530 21:32:39.397191 139884677568384 deprecation.py:323] From /content/models/research/object_detection/eval_util.py:828: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /content/models/research/object_detection/utils/visualization_utils.py:618: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

W0530 21:32:39.582928 139884677568384 deprecation.py:323] From /content/models/research/object_detection/utils/visualization_utils.py:618: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

INFO:tensorflow:Done calling model_fn.
I0530 21:32:40.102270 139884677568384 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-05-30T21:32:40Z
I0530 21:32:40.117732 139884677568384 evaluation.py:255] Starting evaluation at 2020-05-30T21:32:40Z
INFO:tensorflow:Graph was finalized.
I0530 21:32:40.560408 139884677568384 monitored_session.py:240] Graph was finalized.
2020-05-30 21:32:40.561485: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-30 21:32:40.561967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-05-30 21:32:40.562079: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-30 21:32:40.562109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-30 21:32:40.562133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-30 21:32:40.562156: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-30 21:32:40.562182: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-30 21:32:40.562202: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-30 21:32:40.562224: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-30 21:32:40.562336: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-30 21:32:40.562759: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-30 21:32:40.563113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-05-30 21:32:40.563158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-30 21:32:40.563172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-05-30 21:32:40.563182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-05-30 21:32:40.563305: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-30 21:32:40.563722: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-30 21:32:40.564091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15216 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
INFO:tensorflow:Restoring parameters from $DATA/model.ckpt-1111
I0530 21:32:40.566126 139884677568384 saver.py:1284] Restoring parameters from $DATA/model.ckpt-1111
INFO:tensorflow:Running local_init_op.
I0530 21:32:41.608171 139884677568384 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0530 21:32:41.738769 139884677568384 session_manager.py:502] Done running local_init_op.
2020-05-30 21:33:09.558668: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-05-30 21:33:09.565813: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592
2020-05-30 21:33:09.566039: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 7730940928 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-05-30 21:33:09.566070: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 7730940928
2020-05-30 21:33:09.566119: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 6957846528 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-05-30 21:33:09.566135: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 6957846528
2020-05-30 21:33:09.566187: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 6262061568 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-05-30 21:33:09.566203: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 6262061568
2020-05-30 21:33:09.566239: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 5635855360 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-05-30 21:33:09.566255: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 5635855360
2020-05-30 21:33:09.566292: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 5072269824 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-05-30 21:33:09.566307: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 5072269824
2020-05-30 21:33:09.566344: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 4565042688 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-05-30 21:33:09.566359: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 4565042688

6. System information

shubhamgupta568 commented 4 years ago

Hi Lin, please try reducing the batch size; that will help.
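For reference, with this API the training batch size lives in the pipeline config (under train_config) rather than on the command line. Below is a minimal sketch of lowering it programmatically before launching model_main.py; the config path and the new value are only placeholders for illustration:

    import re

    # Placeholder path to the pipeline config referenced above; adjust to your setup.
    pipeline_fname = "/content/pipeline.config"

    with open(pipeline_fname) as f:
        config_text = f.read()

    # The first batch_size entry is normally the one under train_config.
    config_text = re.sub(r"batch_size:\s*\d+", "batch_size: 1", config_text, count=1)

    with open(pipeline_fname, "w") as f:
        f.write(config_text)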

Lin1007 commented 4 years ago

> Hi Lin, please try reducing the batch size; that will help.

Thanks for replying, but I don't think that's the main problem. I had already reduced the batch size to 1, and with that setting I was able to train and evaluate on my local computer with 6 GB of GPU memory; however, Colab still crashes with a CUDA memory error even though the batch size is 1 and there are 16 GB of GPU memory. I'm wondering whether it is due to an update of the TF Object Detection API or to something related to the virtual GPU, because it also crashed on AWS.
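For what it's worth, model_main.py also has a `--sample_1_of_n_eval_examples` flag that evaluates only every Nth example, which can be one way to check whether the evaluation pass is what exhausts memory; the value of 10 below is only an illustration:

    !python /content/models/research/object_detection/model_main.py \
        --pipeline_config_path={pipeline_fname} \
        --model_dir={model_dir} \
        --num_train_steps=15000 \
        --sample_1_of_n_eval_examples=10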

satyajitghana commented 4 years ago

Not sure if this is related, but I also get the same problem: the RAM fills up completely and the Colab runtime crashes, and the log says CUDA_OUT_OF_MEMORY.

reproducible colab notebook: https://colab.research.google.com/drive/1Q0Aj61riRPOr3EYfvSbA9nZ1v8j7A0QE?usp=sharing

I've tried reducing batch_size from 32 to 16, but the problem still persists.

satyajitghana commented 4 years ago

> Not sure if this is related, but I also get the same problem: the RAM fills up completely and the Colab runtime crashes, and the log says CUDA_OUT_OF_MEMORY.
>
> reproducible colab notebook: https://colab.research.google.com/drive/1Q0Aj61riRPOr3EYfvSbA9nZ1v8j7A0QE?usp=sharing
>
> I've tried reducing batch_size from 32 to 16, but the problem still persists.

Fixed it by:

train_ds.batch(128).map(augment, num_parallel_calls=tf.data.experimental.AUTOTUNE).cache().prefetch(tf.data.experimental.AUTOTUNE)

So the order should be batch -> map -> cache; my bad, I should have read the docs properly.
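As a self-contained sketch of that ordering (the dataset and the augment function below are stand-ins, not the notebook's actual code):

    import tensorflow as tf

    def augment(images, labels):
        # Stand-in for the notebook's real augmentation; just rescales the batch.
        images = tf.cast(images, tf.float32) / 255.0
        return images, labels

    # Dummy data standing in for the real training set.
    train_ds = tf.data.Dataset.from_tensor_slices(
        (tf.zeros([256, 32, 32, 3], dtype=tf.uint8), tf.zeros([256], dtype=tf.int32)))

    # batch -> map -> cache, then prefetch, as described above.
    train_ds = (train_ds
                .batch(128)
                .map(augment, num_parallel_calls=tf.data.experimental.AUTOTUNE)
                .cache()
                .prefetch(tf.data.experimental.AUTOTUNE))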

codingzencc commented 4 years ago

@Lin1007 Were you able to figure it out? I used to use the r1.13 branch and didn't face this issue there, but after switching to master I'm now facing it. Essentially, it tries to run an eval after saving the checkpoint and runs out of memory when it tries to evaluate. I'm facing this problem with the ssd_mobilenet_v1_fpn_coco model.
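For anyone who wants to reproduce only the failing evaluation step, model_main.py can also be run in eval-only mode by passing --checkpoint_dir; a sketch with placeholder paths (not something that was tried in this thread):

    !python /content/models/research/object_detection/model_main.py \
        --pipeline_config_path={pipeline_fname} \
        --model_dir={eval_dir} \
        --checkpoint_dir={model_dir} \
        --run_once=True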