Hi,

I am trying to run object_detection.py from tftrt/examples/object_detection, but I get an out-of-memory error even on a powerful Nvidia RTX 2080 Ti (with 11 GB of memory). I tried 3 different models (see below) and also tried to use --gpu_mem_cap, but I get the same error in each case. The error happens after the conversion, in run_inference().

Here is my run script:
And this is the error (last part of the entire output):
```
2020-09-07 07:55:47.700382: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at unpack_op.cc:114 : Resource exhausted: OOM when allocating tensor with shape[76725,90] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "object_detection.py", line 435, in <module>
    target_duration=args.target_duration)
  File "object_detection.py", line 167, in run_inference
    batch_preds = graph_func(batch_images)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1655, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1673, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1722, in _call_with_flat_signature
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 106, in _call_flat
    cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[76725,90] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/unstack_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
     [[StatefulPartitionedCall/StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/TRTEngineOp_0_38/_194]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  (1) Resource exhausted: OOM when allocating tensor with shape[76725,90] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/unstack_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference_signature_wrapper_426616]

Function call stack:
signature_wrapper -> signature_wrapper
```
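For what it's worth, the allocation that fails is tiny. My own back-of-the-envelope arithmetic, assuming the tensor is float32 (4 bytes per element):

```python
# Size of the failing allocation from the log: shape [76725, 90], dtype float.
# Assuming float32, i.e. 4 bytes per element.
num_elements = 76725 * 90            # 6,905,250 elements
size_bytes = num_elements * 4        # 27,621,000 bytes
size_mib = size_bytes / (1024 ** 2)  # convert to MiB
print(f"{size_mib:.1f} MiB")         # -> 26.3 MiB
```

So it is not one huge tensor; the 11 GB card seems to be running out of memory overall by the time NMS post-processing runs.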
Environment
I am running this in a Docker container based on Nvidia's nvcr.io/nvidia/tensorrt:19.10-py3 image. Here are the specs:
GPU: Nvidia RTX 2080 Ti
Host OS: Ubuntu 18.04.4 LTS
Docker Version: 19.03.8
Nvidia Driver: 440.100
Docker Base Image: nvcr.io/nvidia/tensorrt:19.10-py3
CUDA Version: 10.1
Python Version: 3.6.8
TensorFlow Version: 2.3.0
TensorRT Version: 6.0.1
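For reference, here is what I assume --gpu_mem_cap does under the hood. This is only my own sketch using the TF 2.x tf.config API; I have not verified it against the example script:

```python
import tensorflow as tf

def cap_gpu_memory(mem_cap_mb: int) -> None:
    """Cap per-GPU memory; a sketch of what I assume --gpu_mem_cap does."""
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        if mem_cap_mb <= 0:
            # No explicit cap: let the allocator grow memory on demand.
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # Restrict TensorFlow to a fixed-size virtual device on this GPU.
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [tf.config.experimental.VirtualDeviceConfiguration(
                    memory_limit=mem_cap_mb)])
```

As far as I know this must be called before any GPU is initialized, otherwise TensorFlow raises a RuntimeError, so I called it at the very start of the script.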
Let me know if you have any ideas or suggestions. Thank you.