Hi,

I am trying to run object_detection.py from tftrt/examples/object_detection, but I get an out-of-memory error even on a powerful Nvidia RTX 2080 Ti (with 11 GB of memory). I tried 3 different models (see below) and also tried to use --gpu_mem_cap, but I get the same error in each case. The error happens after the conversion, in run_inference().

Here is my run script:
And this is the error (last part of the entire output):
```
2020-09-07 07:55:47.700382: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at unpack_op.cc:114 : Resource exhausted: OOM when allocating tensor with shape[76725,90] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "object_detection.py", line 435, in <module>
    target_duration=args.target_duration)
  File "object_detection.py", line 167, in run_inference
    batch_preds = graph_func(batch_images)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1655, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1673, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1722, in _call_with_flat_signature
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 106, in _call_flat
    cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[76725,90] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/unstack_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
     [[StatefulPartitionedCall/StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/TRTEngineOp_0_38/_194]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  (1) Resource exhausted: OOM when allocating tensor with shape[76725,90] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/unstack_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference_signature_wrapper_426616]

Function call stack:
signature_wrapper -> signature_wrapper
```
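For what it's worth, the allocation that fails is tiny. My own back-of-the-envelope arithmetic, assuming the tensor is float32 (4 bytes per element):

```python
# Size of the failing allocation from the log: shape [76725, 90], dtype float.
# Assuming float32, i.e. 4 bytes per element.
num_elements = 76725 * 90            # 6,905,250 elements
size_bytes = num_elements * 4        # 27,621,000 bytes
size_mib = size_bytes / (1024 ** 2)  # convert to MiB
print(f"{size_mib:.1f} MiB")         # -> 26.3 MiB
```

So it is not one huge tensor; the 11 GB card seems to be running out of memory overall by the time NMS post-processing runs.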
Environment
I am running this in a Docker container based on Nvidia's nvcr.io/nvidia/tensorrt:19.10-py3 image. Here are the specs:
GPU: Nvidia RTX 2080 Ti
Host OS: Ubuntu 18.04.4 LTS
Docker Version: 19.03.8
Nvidia Driver: 440.100
Docker Base Image: nvcr.io/nvidia/tensorrt:19.10-py3
CUDA Version: 10.1
Python Version: 3.6.8
TensorFlow Version: 2.3.0
TensorRT Version: 6.0.1
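For reference, here is what I assume --gpu_mem_cap does under the hood. This is only my own sketch using the TF 2.x tf.config API; I have not verified it against the example script:

```python
import tensorflow as tf

def cap_gpu_memory(mem_cap_mb: int) -> None:
    """Cap per-GPU memory; a sketch of what I assume --gpu_mem_cap does."""
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        if mem_cap_mb <= 0:
            # No explicit cap: let the allocator grow memory on demand.
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # Restrict TensorFlow to a fixed-size virtual device on this GPU.
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [tf.config.experimental.VirtualDeviceConfiguration(
                    memory_limit=mem_cap_mb)])
```

As far as I know this must be called before any GPU is initialized, otherwise TensorFlow raises a RuntimeError, so I called it at the very start of the script.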
Let me know if you have any ideas or suggestions. Thank you.