tensorflow/tensorrt: TensorFlow/TensorRT integration

Object Detection example with TRT7 and TF2.1 issues #178


mankeyboy commented 4 years ago

I'm creating this issue to help collect the issues in the Object Detection example:

To start, I have followed the steps and set up the dependencies. Now, attempting to run a synthetic test:

python object_detection.py  --input_saved_model_dir models/ssd_inception_v2_coco_2018_01_28/saved_model --output_saved_model_dir trt_engine --data_dir .  --input_size 640 --batch_size 1 --use_synthetic  --use_trt --precision FP16 --mode benchmark --num_iterations 100

Gives this error:

Benchmark arguments:
  annotation_path: None
  batch_size: 1
  calib_data_dir: None
  data_dir: .
  display_every: 100
  gpu_mem_cap: 0
  input_saved_model_dir: models/ssd_inception_v2_coco_2018_01_28/saved_model
  input_size: 640
  max_workspace_size: 1073741824
  minimum_segment_size: 2
  mode: benchmark
  num_calib_inputs: 500
  num_iterations: 100
  num_warmup_iterations: 50
  optimize_offline: False
  output_saved_model_dir: trt_engine
  precision: FP16
  target_duration: None
  use_synthetic: True
  use_trt: True
TensorRT Conversion Params:
  is_dynamic_op: True
  max_batch_size: 1
  max_workspace_size_bytes: 1073741824
  maximum_cached_engines: 1
  minimum_segment_size: 2
  precision_mode: FP16
  rewriter_config_template: None
  use_calibration: False
Conversion times:
  conversion: 49.2s
Traceback (most recent call last):
  File "object_detection.py", line 432, in <module>
    target_duration=args.target_duration)
  File "object_detection.py", line 179, in run_inference
    for i, batch_images in enumerate(dataset):
TypeError: 'NoneType' object is not iterable
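
For reference, the loop at object_detection.py:179 iterates over a dataset object that is apparently None in synthetic mode. Below is a minimal sketch, assuming TF 2.x, of the kind of synthetic tf.data input that loop expects; the helper name and shapes are illustrative only, not part of the example script:

import tensorflow as tf

# Illustrative helper (not part of the example script): a synthetic dataset of
# random uint8 images matching the --input_size 640 --batch_size 1 run above.
def synthetic_dataset(batch_size=1, input_size=640, num_batches=100):
    images = tf.random.uniform(
        [batch_size, input_size, input_size, 3], maxval=256, dtype=tf.int32)
    images = tf.cast(images, tf.uint8)
    return tf.data.Dataset.from_tensors(images).repeat(num_batches)

for i, batch_images in enumerate(synthetic_dataset()):
    pass  # run inference on batch_images here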

On attempting to run a validation test:

python object_detection.py  --input_saved_model_dir models/ssd_inception_v2_coco_2018_01_28/saved_model --output_saved_model_dir trt_engine --data_dir coco/val2017  --annotation_path coco/annotations/instances_val2017.json --input_size 640 --batch_size 1  --use_trt --precision FP16

This error is observed:

Benchmark arguments:
  annotation_path: coco/annotations/instances_val2017.json
  batch_size: 1
  calib_data_dir: None
  data_dir: coco/val2017
  display_every: 100
  gpu_mem_cap: 0
  input_saved_model_dir: models/ssd_inception_v2_coco_2018_01_28/saved_model
  input_size: 640
  max_workspace_size: 1073741824
  minimum_segment_size: 2
  mode: validation
  num_calib_inputs: 500
  num_iterations: 2048
  num_warmup_iterations: 50
  optimize_offline: False
  output_saved_model_dir: trt_engine
  precision: FP16
  target_duration: None
  use_synthetic: False
  use_trt: True
TensorRT Conversion Params:
  is_dynamic_op: True
  max_batch_size: 1
  max_workspace_size_bytes: 1073741824
  maximum_cached_engines: 1
  minimum_segment_size: 2
  precision_mode: FP16
  rewriter_config_template: None
  use_calibration: False
Conversion times:
  conversion: 49.5s
loading annotations into memory...
Done (t=0.80s)
creating index...
index created!
2020-01-24 05:48:35.804643: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Preprocessor/map/while/ResizeImage/TRTEngineOp_293 with input shapes: [[1,640,640,3]]
2020-01-24 05:48:35.804722: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2020-01-24 05:48:35.805518: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-01-24 05:48:37.953129: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:48:37.953524: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_0 with input shapes: [[1,300,300,3]]
2020-01-24 05:49:16.079274: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.081927: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_292 with input shapes: [[1,1083,91], [1,600,91], [1,150,91], [1,54,91], [1,24,91], [1,6,91]]
2020-01-24 05:49:16.085025: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_19 with input shapes: [[6,2]]
2020-01-24 05:49:16.135179: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.135250: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_20 with input shapes: [[6,2]]
2020-01-24 05:49:16.156066: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.156228: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_294 with input shapes: [[1,1083,1,4], [1,600,1,4], [1,150,1,4], [1,54,1,4], [1,24,1,4], [1,6,1,4]]
2020-01-24 05:49:16.169885: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.169962: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_9 with input shapes: [[1083,2]]
2020-01-24 05:49:16.170840: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_8 with input shapes: [[6,2], [6,2]]
2020-01-24 05:49:16.226191: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.231928: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.238409: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.238481: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_10 with input shapes: [[1083,2]]
2020-01-24 05:49:16.263111: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.263168: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_11 with input shapes: [[600,2]]
2020-01-24 05:49:16.263210: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_3 with input shapes: [[1083,2], [1083,2]]
2020-01-24 05:49:16.286966: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.294341: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.294402: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_12 with input shapes: [[600,2]]
2020-01-24 05:49:16.318996: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.319054: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_13 with input shapes: [[150,2]]
2020-01-24 05:49:16.319084: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_4 with input shapes: [[600,2], [600,2]]
2020-01-24 05:49:16.342890: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.349788: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.349848: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_14 with input shapes: [[150,2]]
2020-01-24 05:49:16.374470: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.374529: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_15 with input shapes: [[54,2]]
2020-01-24 05:49:16.374554: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_5 with input shapes: [[150,2], [150,2]]
2020-01-24 05:49:16.398877: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.406253: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.406313: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_16 with input shapes: [[54,2]]
2020-01-24 05:49:16.431354: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.431413: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_17 with input shapes: [[24,2]]
2020-01-24 05:49:16.431439: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_6 with input shapes: [[54,2], [54,2]]
2020-01-24 05:49:16.454656: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.463058: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.463119: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_18 with input shapes: [[24,2]]
2020-01-24 05:49:16.487814: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.487886: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_7 with input shapes: [[24,2], [24,2]]
2020-01-24 05:49:16.502006: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.502190: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_2 with input shapes: [[1917], [1917], [1917], [1917]]
2020-01-24 05:49:16.610433: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.610520: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_1 with input shapes: [[1917], [1917], [1917], [1917]]
2020-01-24 05:49:16.718736: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.718886: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/TRTEngineOp_291 with input shapes: [[1,1917,4]]
2020-01-24 05:49:16.737215: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.741486: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_171 with input shapes: [[1917,1]]
2020-01-24 05:49:16.764692: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.764778: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_174 with input shapes: [[1917,1]]
2020-01-24 05:49:16.779482: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_63/TRTEngineOp_81 with input shapes: [[0,4]]
2020-01-24 05:49:16.789812: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Parameter check failed at: ../builder/builder.cpp::setMaxBatchSize::135, condition: batchSize > 0 && batchSize <= MAX_BATCH_SIZE
2020-01-24 05:49:16.803796: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.803877: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_177 with input shapes: [[1917,1]]
2020-01-24 05:49:16.805700: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_66/TRTEngineOp_84 with input shapes: [[14,4]]
2020-01-24 05:49:16.842692: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.842728: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Parameter check failed at: engine.cpp::enqueue::292, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 0, but engine max batch size was: 1
2020-01-24 05:49:16.842741: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:635] Failed to enqueue batch for TRT engine: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_63/TRTEngineOp_81
2020-01-24 05:49:16.842752: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:506] Failed to execute engine, retrying with native segment for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_63/TRTEngineOp_81
2020-01-24 05:49:16.843134: F tensorflow/core/framework/op_kernel.cc:875] Check failed: mutable_output(index) == nullptr (0x7ff5cc03d7c0 vs. nullptr)
Aborted
aalugore commented 4 years ago

This is the exact same issue I am seeing here: https://github.com/tensorflow/tensorflow/issues/33184#issuecomment-577881589

Looks like this is more widespread than just me. Hopefully this means it will get more attention.

suhyung-code42 commented 4 years ago

I ran into a similar error. My test environment is tf2.0 + trt7.0:

Benchmark arguments:
  annotation_path: None
  batch_size: 1
  calib_data_dir: None
  data_dir: .
  display_every: 100
  gpu_mem_cap: 0
  input_saved_model_dir: /home/suhyung/work/git/tf_trt_models/examples/detection/data/faster_rcnn_resnet50_coco_2018_01_28/saved_model/
  input_size: 640
  max_workspace_size: 1073741824
  minimum_segment_size: 2
  mode: benchmark
  num_calib_inputs: 500
  num_iterations: 100
  num_warmup_iterations: 50
  optimize_offline: False
  output_saved_model_dir: trt_engine
  precision: FP16
  target_duration: None
  use_synthetic: True
  use_trt: True
TensorRT Conversion Params:
  is_dynamic_op: True
  max_batch_size: 1
  max_workspace_size_bytes: 1073741824
  maximum_cached_engines: 1
  minimum_segment_size: 2
  precision_mode: FP16
  rewriter_config_template: None
  use_calibration: False
Conversion times:
  conversion: 49.2s
Traceback (most recent call last):
  File "object_detection.py", line 432, in <module>
    target_duration=args.target_duration)
  File "object_detection.py", line 160, in run_inference
    input_size=input_size)
TypeError: cannot unpack non-iterable NoneType object

mankeyboy commented 4 years ago

I've been able to clear a few of the above errors, and I can now get it working for even batch sizes using the models from the r1.14+ branch of the code. However, the output I'm getting doesn't have the correct accuracy, and the logs indicate that this is because the saved_model.pb in the model, e.g.:

'ssd_inception_v2_coco':
    Model(
        'ssd_inception_v2_coco',
        'http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz',
        'ssd_inception_v2_coco_2018_01_28',
    )

doesn't have any variables saved in its variables folder, so I'm basically running an untrained graph. This function controls how the saved model is loaded onto the graph. The pretrained model has a checkpoint file and a frozen_inference_graph, but TensorRT in TF 2.x only accepts a SavedModel, so the only way forward is to load the checkpoint file or the frozen_inference_graph and convert it into a SavedModel.
First, I tried this modification to the function to get a SavedModel from the checkpoint:

import tensorflow as tf

# saved_model_dir is the directory argument passed to get_func_from_saved_model.
with tf.compat.v1.Session() as sess:
    # Restore the checkpoint into the default graph.
    new_saver = tf.compat.v1.train.import_meta_graph(saved_model_dir + '/model.ckpt.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint(saved_model_dir + '/'))
    # Freeze the restored variables into constants for the detection outputs.
    frozen_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess,
        tf.compat.v1.get_default_graph().as_graph_def(),
        output_node_names=['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections'])
    # image_tensor and the detection_* tensors are looked up from the graph,
    # e.g. tf.compat.v1.get_default_graph().get_tensor_by_name('image_tensor:0').
    tf.compat.v1.saved_model.simple_save(
        sess, saved_model_dir + '/test',
        inputs={'image_tensor': image_tensor},
        outputs={'detection_boxes': detection_boxes, 'detection_classes': detection_classes,
                 'detection_scores': detection_scores, 'num_detections': num_detections})

The code fails on this call because of errors in the placeholders and input names, and Stack Overflow answers say I would need access to the original function that created this checkpoint in order to convert it.

Hence, the next approach: converting from frozen_inference_graph.pb:

import os

import tensorflow as tf
from tensorflow.python.saved_model import signature_constants, tag_constants

INPUT_NAME = 'image_tensor'
BOXES_NAME = 'detection_boxes'
CLASSES_NAME = 'detection_classes'
SCORES_NAME = 'detection_scores'
NUM_DETECTIONS_NAME = 'num_detections'
FROZEN_GRAPH_NAME = 'frozen_inference_graph.pb'

def get_func_from_saved_model(saved_model_dir):

  builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(saved_model_dir+'/test')
  frozen_graph_path = os.path.join(saved_model_dir, FROZEN_GRAPH_NAME)
  print(frozen_graph_path)
  # Load the frozen GraphDef (the weights are already baked in as constants).
  frozen_graph_def = tf.compat.v1.GraphDef()
  with open(frozen_graph_path, 'rb') as f:
    frozen_graph_def.ParseFromString(f.read())

  sigs = {}
  with tf.compat.v1.Session(graph=tf.compat.v1.Graph()) as sess:
    # name="" is important to ensure we don't get spurious prefixing
    tf.compat.v1.import_graph_def(frozen_graph_def, name="")
    tf_graph = tf.compat.v1.get_default_graph()
    tf_input = tf_graph.get_tensor_by_name(INPUT_NAME+':0')
    tf_boxes = tf_graph.get_tensor_by_name(BOXES_NAME + ':0')
    tf_classes = tf_graph.get_tensor_by_name(CLASSES_NAME + ':0')
    tf_scores = tf_graph.get_tensor_by_name(SCORES_NAME + ':0')
    tf_num_detections = tf_graph.get_tensor_by_name(NUM_DETECTIONS_NAME + ':0')

    sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = \
        tf.compat.v1.saved_model.signature_def_utils.predict_signature_def(
            {INPUT_NAME: tf_input}, {BOXES_NAME: tf_boxes, CLASSES_NAME: tf_classes, SCORES_NAME: tf_scores, NUM_DETECTIONS_NAME: tf_num_detections})

    builder.add_meta_graph_and_variables(sess,
                                         [tag_constants.SERVING],
                                         signature_def_map=sigs)
  builder.save()
  saved_model_loaded = tf.saved_model.load(
      saved_model_dir+'/test', tags=[tag_constants.SERVING])
  graph_func = saved_model_loaded.signatures[
      signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
  return graph_func 

This works, but it doesn't create any variables folder either, so the saved_model is completely untrained and doesn't serve the purpose. I'm getting throughput numbers, but the mAP values show that this is an untrained graph run.
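
A quick sanity check for this, with the path matching the saved_model_dir + '/test' location used in the snippet above (adjust as needed): list the variables directory of the SavedModel written by builder.save() and see whether any weight files were produced.

import os

# Path matches saved_model_dir + '/test' from the snippet above; adjust as needed.
var_dir = 'models/ssd_inception_v2_coco_2018_01_28/test/variables'
if os.path.isdir(var_dir):
    print(sorted(os.listdir(var_dir)))  # an empty list means no weights were exported
else:
    print('no variables directory at', var_dir)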

My run call is: python object_detection.py --input_saved_model_dir models/ssd_inception_v2_coco_2018_01_28 --output_saved_model_dir trt_engine --data_dir coco/val2017 --annotation_path coco/annotations/instances_val2017.json --input_size 640 --batch_size 8 --num_warmup_iterations 10 --minimum_segment_size 3 --num_iterations 50 --use_trt --precision FP16

@pooyadavoodi @vinhngx @aaroey Any tips? I'm looking for a way to get the trained model loaded properly like it was for r1.14+

aaroey commented 4 years ago

@tfeher could you help to take a look at this? Also @bixia1

mankeyboy commented 4 years ago

I updated https://github.com/tensorflow/tensorflow/issues/36724 with new comments for the bug I have raised there.

vdevaram commented 4 years ago

I am having a different issue with the command:

python object_detection.py --input_saved_model_dir $HOME/trt/obj_models/ssd_mobilenet_v2_coco_2018_03_29/saved_model/ --output_saved_model_dir $HOME/trt/obj_out_dir --optimize_offline --data_dir $HOME/trt/coco_data/val2017 --annotation_path $HOME/trt/coco_data/annotations/instances_val2017.json --batch_size 1 --use_trt --mode benchmark --precision FP32 --input_size 640

Environment: CUDA 10.2, cuDNN 7.6.5, TensorRT 7, TF: master (after 2.1.0)

Traceback (most recent call last):
  File "object_detection.py", line 410, in <module>
    optimize_offline=args.optimize_offline)
  File "object_detection.py", line 121, in get_graph_func
    converter.build(input_fn=partial(input_fn, data_dir, 1))
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1116, in build
    self._converted_func(map(ops.convert_to_tensor, inp))
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1600, in __call__
    return self._call_impl(args, kwargs)
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1640, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1741, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Incorrect batch dimension, for Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_84/TRTEngineOp_86: [[0,4]]
        [[node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_84/TRTEngineOp_86 (defined at object_detection.py:118) ]]
        [[Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/range/_68]]
  (1) Invalid argument: Incorrect batch dimension, for Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_84/TRTEngineOp_86: [[0,4]]
        [[node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_84/TRTEngineOp_86 (defined at object_detection.py:118) ]]
0 successful operations. 0 derived errors ignored. [Op:__inference_pruned_66916]

vdevaram commented 4 years ago

@aaroey: is this a known issue? I am facing the segmentation issue even in the latest TensorFlow container from NGC (20.01-tf1-py3).

mankeyboy commented 4 years ago

@vdevaram You can solve your issue by passing --minimum_segment_size 3 when you run. I have already opened a bug about this against tensorflow. The default segment size TensorRT uses for optimizations is 3, while this code uses 2, which, even though suboptimal according to the recommendations, shouldn't fail. Discussion about this is ongoing on the other issue :)
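
For reference, the same workaround expressed through the TF-TRT Python API rather than the script flag; this is a sketch only, assuming the TF 2.1-style conversion params shown in the logs above:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Mirror the --minimum_segment_size 3 workaround in the conversion params.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16,
    max_workspace_size_bytes=1 << 30,
    minimum_segment_size=3)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='models/ssd_inception_v2_coco_2018_01_28/saved_model',
    conversion_params=params)
converter.convert()
converter.save('trt_engine')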

tfeher commented 4 years ago
vdevaram commented 4 years ago

Now I moved to nvcr.io/nvidia/tensorflow:20.02-tf2-py3 and tried the TF object detection models. Here is the result for Faster R-CNN. Although it is working, I am seeing a lot of latency variation, with the temperature rising up to 85°C. Is there any other problem?

cmd: python object_detection.py --input_saved_model_dir /local/obj_models/faster_rcnn_resnet50_coco_2018_01_28/saved_model/ --output_saved_model_dir /local/obj_out_dir --optimize_offline --data_dir /local/coco_data/val2017 --annotation_path /local/coco_data/annotations/instances_val2017.json --batch_size 1 --use_trt --mode benchmark --precision FP32 --input_size 600 --minimum_segment_size 3

benchmark result:

step 101/2048, iter_time(ms)=86
step 201/2048, iter_time(ms)=93
step 301/2048, iter_time(ms)=90
step 401/2048, iter_time(ms)=89
step 501/2048, iter_time(ms)=90
step 601/2048, iter_time(ms)=85
step 701/2048, iter_time(ms)=91
step 801/2048, iter_time(ms)=87
step 901/2048, iter_time(ms)=91
step 1001/2048, iter_time(ms)=92
step 1101/2048, iter_time(ms)=87
step 1201/2048, iter_time(ms)=89
step 1301/2048, iter_time(ms)=100
step 1401/2048, iter_time(ms)=106
step 1501/2048, iter_time(ms)=125
step 1601/2048, iter_time(ms)=204
step 1701/2048, iter_time(ms)=111
step 1801/2048, iter_time(ms)=118
step 1901/2048, iter_time(ms)=108
step 2001/2048, iter_time(ms)=255
Results:
  images/sec: 9
  99th_percentile(ms): 378.12
  total_time(s): 225.9
  latency_mean(ms): 115.90
  latency_median(ms): 92.96
  latency_min(ms): 77.35
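
One way to tell whether those latency spikes line up with thermal throttling is to log GPU temperature and SM clocks while the benchmark runs; a sketch, assuming nvidia-smi is on PATH:

import subprocess

# Sample temperature, SM clock and utilization once per second alongside the benchmark.
subprocess.run([
    'nvidia-smi',
    '--query-gpu=timestamp,temperature.gpu,clocks.sm,utilization.gpu',
    '--format=csv', '-l', '1'])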