triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

tritonclientutils.utils.InferenceServerException: [StatusCode.UNIMPLEMENTED] #1745

Closed: adfayed closed this issue 4 years ago

adfayed commented 4 years ago

Description: Hello, I am trying to run object detection inference on the GPU (an RTX 2070 Super) with my own custom-trained TensorFlow network, based on the pre-trained Faster R-CNN Inception ResNet v2 COCO network provided in the Model Zoo (here is the direct download link).

Triton Information
What version of Triton are you using? nvcr.io/nvidia/tritonserver:20.03.1-py3 and nvcr.io/nvidia/tritonserver:20.03.1-py3-clientsdk

Nvidia driver version nvidia-driver-440 | 440.100-0ubuntu0.18.04.1

Are you using the Triton container or did you build it yourself? I am using the containers, versions 20.03.1-py3 and 20.03.1-py3-clientsdk.

To Reproduce
To walk you step by step through what I did (to replicate this):

  1. Downloaded the pre-trained network from above. You can skip the next two steps (Steps 2 & 3) to save time, since the pre-trained network does the same thing without custom re-training.
  2. Used this config file "pipeline_tf1.15.config" to re-train ("transfer-learning" for about 4000 steps on GPU) by running cmd:
    $ python ${OBJ_DET_DIR}/model_main.py --pipeline_config_path="${CKPT_DIR}/pipeline_tf1.15.config" \
    --model_dir="${TRAIN_DIR}" --num_train_steps=${i} --num_eval_steps=400

Here is the config file "pipeline_tf1.15.config":

# Faster R-CNN with Inception Resnet v2, Atrous version;
# Configured for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 3
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 960
        resize_method: BICUBIC
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8 # Must be 8 or 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 2 #8
        width_stride: 2 #8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 9 #18
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 34 #17
    maxpool_kernel_size: 8 #4
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: true #false
        dropout_keep_probability: 0.8
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 3
        max_total_detections: 3 #6
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_batch_size: 64
  }
}
train_config {
  batch_size: 1
  data_augmentation_options {
    random_rotation90 {
    }
    random_rgb_to_gray {
      probability: 0.05
    }
    random_pixel_value_scale {
      minval: 0.9
      maxval: 1.1
    }
    random_patch_gaussian {
      random_coef: 0.80
      min_patch_size: 1
      max_patch_size: 50
    }
    random_black_patches {
      max_black_patches: 10
      probability: 0.40
    }
    random_adjust_brightness {
    }
    random_adjust_contrast {
    }
    random_adjust_hue {
      max_delta: 0.06
    }
    random_adjust_saturation {
      min_delta: 0.8
      max_delta: 1.25
    }
    random_jitter_boxes {
      ratio: 0.08
    }
  }
  sync_replicas: true
  optimizer {
    adam_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0003
          decay_steps: 2000
          decay_factor: 0.85
        }
      }
    }
    use_moving_average: true 
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/tensorflow/learn_pet/ckpt/model.ckpt"
  num_steps: 200000
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  batch_queue_capacity: 1
  num_batch_queue_threads: 1
  prefetch_queue_capacity: 1
}
train_input_reader {
  label_map_path: "/tensorflow/learn_pet/dataset/ag_label_map.pbtxt"
  shuffle: true
  tf_record_input_reader {
    input_path: "/tensorflow/learn_pet/dataset/train_dataset_bigger_bb.record"
  }
  max_number_of_boxes: 1
}
eval_config {
  metrics_set: "coco_detection_metrics"
  num_examples: 400
}
eval_input_reader {
  label_map_path: "/tensorflow/learn_pet/dataset/ag_label_map.pbtxt"
  shuffle: true
  num_readers: 1
  tf_record_input_reader {
    input_path: "/tensorflow/learn_pet/dataset/validate_dataset_bigger_bb.record"
  }
  max_number_of_boxes: 1
}
  3. Exported the trained checkpoint into an inference graph / SavedModel (saved_model.pb) by running:

    $ python ${OBJ_DET_DIR}/export_inference_graph.py --pipeline_config_path="${CKPT_DIR}/pipeline_tf1.15.config" \
    --trained_checkpoint_prefix="${TRAIN_DIR}/model.ckpt-$i" --output_directory="${OUTPUT_DIR}"
  4. Ran the NVIDIA Triton Inference Server container, version nvcr.io/nvidia/tritonserver:20.03.1-py3 (NGC docker pull command here), with:

    $ docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/example/model/repository:/models <docker image ID> 

    saved_model_cli shows this:

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_UINT8
        shape: (-1, -1, -1, 3)
        name: image_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['detection_boxes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3, 4)
        name: detection_boxes:0
    outputs['detection_classes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: detection_classes:0
    outputs['detection_features'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, -1, -1)
        name: detection_features:0
    outputs['detection_multiclass_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3, 4)
        name: detection_multiclass_scores:0
    outputs['detection_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: detection_scores:0
    outputs['num_detections'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: num_detections:0
    outputs['raw_detection_boxes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9, 4)
        name: raw_detection_boxes:0
    outputs['raw_detection_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9, 4)
        name: raw_detection_scores:0
  Method name is: tensorflow/serving/predict
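For reference, output of this form is what saved_model_cli prints for a command like the following (the exact model path is an assumption, taken from the converter script in step 5 below):

$ saved_model_cli show --dir /triton-inference-server/docs/examples/my_models/frcnn_incep_resnet_v2/5000/model.savedmodel --all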


5. Converted the TensorFlow model into a TensorFlow-TensorRT (TF-TRT) model by running the following command *(inside the 20.03.1-py3 docker container)*:

$ python3 trt_model_converter.py

*Here is the code in script file "trt_model_converter.py"*:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir='/triton-inference-server/docs/examples/my_models/frcnn_incep_resnet_v2/5000/model.savedmodel',
    minimum_segment_size=3,
    is_dynamic_op=True,
    maximum_cached_engines=5,
    precision_mode="FP16")  # max_workspace_size_bytes=4000000000

converted = converter.convert()

converter.save(output_saved_model_dir='/triton-inference-server/docs/examples/my_models/frcnn_incep_resnet_v2/5000_FP16')
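As a sanity check (a sketch, using the output path from the script above), the converted model's signature can be inspected the same way before handing it to Triton, to confirm the inputs/outputs survived conversion:

$ saved_model_cli show --dir /triton-inference-server/docs/examples/my_models/frcnn_incep_resnet_v2/5000_FP16 \
    --tag_set serve --signature_def serving_default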


6. Once the model was converted successfully, here is the model repository structure:

$ export MODEL_REPO="/triton-inference-server/docs/examples/my_models"

root@aa94996e622f:/triton-inference-server/docs/examples/my_models# ls -lR
.:
total 4
drwxr-xr-x 3 root root 4096 Jul  3 23:45 frcnn_incep_resnet_v2

./frcnn_incep_resnet_v2:
total 20
drwxr-xr-x 3 root root 4096 Jun 24 21:25 5001
-rw-r--r-- 1 root root  110 Jun 26 23:38 ag_label_map.txt
-rw-r--r-- 1 root root 1630 Jul  3 23:45 config.pbtxt
-rw-r--r-- 1 root root 4599 Jun 19 05:09 pipeline_tf1.15.config

./frcnn_incep_resnet_v2/5001:
total 4
drwxr-xr-x 2 root root 4096 Jun 24 21:25 model.savedmodel

./frcnn_incep_resnet_v2/5001/model.savedmodel:
total 233560
-rw-r--r-- 1 root root 239161986 Jun 24 19:51 saved_model.pb


*Here is the "config.pbtxt" file*:

name: "frcnn_incep_resnet_v2" platform: "tensorflow_savedmodel" max_batch_size: 0 version_policy: { all { }} # "latest" Only the latest 'n' versions of the model in the repository

are available for inferencing. Or "specific" Only the specifically listed versions of the model are

available for inferencing.

input [ { name: "inputs" data_type: TYPE_UINT8 dims: [-1, -1, -1, 3 ] } ] output [ { name: "detection_boxes" data_type: TYPE_FP32 dims: [-1, 3, 4] # the 3 here represents max_total_detections in pipeline.config }, { name: "detection_classes" data_type: TYPE_FP32 dims: [-1, 3] # the 3 here represents max_total_detections in pipeline.config }, { name: "detection_features" data_type: TYPE_FP32 dims: [-1, -1, -1, -1, -1] }, { name: "detection_multiclass_scores" data_type: TYPE_FP32 dims: [-1, 3, 4] # the 3 here represents max_total_detections in pipeline.config }, { name: "detection_scores" data_type: TYPE_FP32 dims: [-1, 3] # the 3 here represents max_total_detections in pipeline.config }, { name: "num_detections" data_type: TYPE_FP32 dims: [-1] }, { name: "raw_detection_boxes" data_type: TYPE_FP32 dims: [-1, 9, 4] # the 9 here represents first_stage_max_proposals in pipeline.config }, { name: "raw_detection_scores" data_type: TYPE_FP32 dims: [-1, 9, 4] # the 9 here represents first_stage_max_proposals in pipeline.config } ] instance_group [ { count: 5 kind: KIND_GPU gpus: [ 0 ] } ]


7. Loaded the converted model into tritonserver by running the following command *(inside the 20.03.1-py3 docker container)*:

$ tritonserver --strict-model-config=True --log-verbose=1 --model-repository=${MODEL_REPO} \
    --api-version=1 --allow-http=True --http-port=8080 --allow-grpc=True --grpc-port=8082 \
    --grpc-infer-allocation-pool-size=0 --trace-file=/tmp/tritonserver_trace.json --trace-rate=1000 \
    --trace-level=MAX --tf-allow-soft-placement=True


8. Ran the NVIDIA Triton Inference Server clientsdk container, version *nvcr.io/nvidia/tritonserver:20.03.1-py3-clientsdk* [NGC docker pull command here](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver/tags), with:

$ sudo docker run -it --name=tritonserver_client --net=host nvcr.io/nvidia/tritonserver:20.03.1-py3-clientsdk


*In the docker container (20.03.1-py3-clientsdk), run:*

$ python3 frcnn_grpc_v2_triton_infer_client.py

*Here is the code in script file "frcnn_grpc_v2_triton_infer_client.py"*:

import argparse
import numpy as np
import sys
import cv2
from PIL import Image
import PIL.Image
import PIL.ImageDraw
import PIL.ImageFont
import pdb

import tritongrpcclient

REDU_WIDTH_FRCNN = 960
REDU_HEIGHT_FRCNN = 640
REDU_WIDTH_SSD = 960
REDU_HEIGHT_SSD = 640
ORIG_WIDTH = 1920
ORIG_HEIGHT = 1280
ORIG_IMAGE_SIZE = (ORIG_WIDTH, ORIG_HEIGHT)
REDU_IMAGE_SIZE_FRCNN = (REDU_WIDTH_FRCNN, REDU_HEIGHT_FRCNN)
REDU_IMAGE_SIZE_SSD = (REDU_WIDTH_SSD, REDU_HEIGHT_SSD)
MIN_SCORE_THRESHOLD = 0.10

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v', '--verbose', action="store_true", required=False,
                        default=False, help='Enable verbose output')
    parser.add_argument('-u', '--url', type=str, required=False,
                        default='localhost:8082',
                        help='Inference server URL. Default is localhost:8082.')

    FLAGS = parser.parse_args()
    try:
        triton_client = tritongrpcclient.InferenceServerClient(url=FLAGS.url,
                                                               verbose=FLAGS.verbose)
    except Exception as e:
        print("channel creation failed: " + str(e))
        sys.exit()

    model_name = 'frcnn_incep_resnet_v2'

    # Infer
    inputs = []
    outputs = []
    #inputs.append(tritongrpcclient.InferInput('inputs', [-1, -1, -1, 3], "UINT8"))
    inputs.append(tritongrpcclient.InferInput('inputs', [640, 960, 3], "UINT8"))

    # Obtain the data for the input tensors.
    infer_image = cv2.imread('/workspace/install/python/IMG_4678.JPG')
    reduced_infer_image_frcnn = cv2.resize(infer_image, REDU_IMAGE_SIZE_FRCNN, interpolation=cv2.INTER_AREA)
    cv2.imwrite('/workspace/install/python/IMG_4678_resized.JPG', reduced_infer_image_frcnn)

    im = np.array(Image.open('/workspace/install/python/IMG_4678_resized.JPG'))

    # Initialize the data
    inputs[0].set_data_from_numpy(im)

    outputs.append(tritongrpcclient.InferRequestedOutput('detection_boxes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_classes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_features'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_multiclass_scores'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_scores'))
    outputs.append(tritongrpcclient.InferRequestedOutput('num_detections'))
    outputs.append(tritongrpcclient.InferRequestedOutput('raw_detection_boxes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('raw_detection_scores'))

    # Test with outputs
    results = triton_client.infer(model_name=model_name,
                                  inputs=inputs,
                                  outputs=outputs,
                                  headers={'test': '1'})

    statistics = triton_client.get_inference_statistics(model_name=model_name)
    print(statistics)
    if len(statistics.model_stats) != 1:
        print("FAILED: Inference Statistics")
        sys.exit(1)

    # Get the output arrays from the results
    detection_boxes_output = results.as_numpy('detection_boxes')
    detection_classes_output = results.as_numpy('detection_classes')
    detection_features_output = results.as_numpy('detection_features')
    detection_multiclass_scores_output = results.as_numpy('detection_multiclass_scores')
    detection_scores_output = results.as_numpy('detection_scores')
    num_detections_output = results.as_numpy('num_detections')
    raw_detection_boxes_output = results.as_numpy('raw_detection_boxes')
    raw_detection_scores_output = results.as_numpy('raw_detection_scores')

    # Test with no outputs
    results = triton_client.infer(model_name=model_name,
                                  inputs=inputs,
                                  outputs=None)

    # Get the output arrays from the results
    output0_data = results.as_numpy('OUTPUT0')
    output1_data = results.as_numpy('OUTPUT1')


**Expected behavior**
I expect the client to send an inference request and receive back the inference results: detected object class names, bounding boxes, confidence scores, and so on.
deadeyegoodwin commented 4 years ago

Unless I missed it, I don't see where you say what went wrong. Your config.pbtxt is a little strange in that it specifies 5 instances of the model... nothing wrong with that, but there is usually no need for 5 model instances. Also, the model.savedmodel directory usually (always?) has a variables sub-directory.

adfayed commented 4 years ago

@deadeyegoodwin My custom-retrained model.savedmodel ("faster_rcnn_inception_resnet_v2") does not generate any files in its variables sub-directory, so I omitted it. The 5 instances were just to test maximum inference throughput.

You are right, my mistake: I did not specify the error I'm getting. Here it is. With the 20.03.1-py3 server container running on this command (ingesting that config.pbtxt):

tritonserver --strict-model-config=True --log-verbose=1 --model-repository=${MODEL_REPO} \
--api-version=1 --allow-http=True --http-port=8080 --allow-grpc=True --grpc-port=8082 \ 
--grpc-infer-allocation-pool-size=0  --trace-file=/tmp/tritonserver_trace.json --trace-rate=1000 \ 
--trace-level=MAX --tf-allow-soft-placement=True

I run frcnn_grpc_v2_triton_infer_client.py in the 20.03.1-py3-clientsdk container:

python3 /workspace/install/python/frcnn_grpc_v2_triton_infer_client.py

I get:

Traceback (most recent call last):
  File "/workspace/install/python/frcnn_grpc_v2_triton_infer_client.py", line 115, in <module>
    headers={'test': '1'})
  File "/usr/local/lib/python3.6/dist-packages/tritongrpcclient/__init__.py", line 940, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritongrpcclient/__init__.py", line 49, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclientutils.utils.InferenceServerException: [StatusCode.UNIMPLEMENTED]

If I change the --grpc-port I get:

tritonclientutils.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Connection reset by peer

So some form of communication is clearly happening, but I am not getting the inference results back from the server.

deadeyegoodwin commented 4 years ago

From the naming it looks like you are running a V2 client (that is, a client that is using the V2 GRPC/HTTP protocols), but you launched the server with --api-version=1. Try --api-version=2.
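A quick sanity check for this kind of mismatch (a minimal sketch, assuming the GRPC port from the launch command above): the V2 client's liveness/readiness calls distinguish a protocol mismatch from a dead port.

import tritongrpcclient

client = tritongrpcclient.InferenceServerClient(url='localhost:8082')

# On a healthy V2 endpoint these return True. Against a server launched with
# --api-version=1 they raise [StatusCode.UNIMPLEMENTED], and against a closed
# port they raise [StatusCode.UNAVAILABLE] -- the two errors seen above.
print("live:", client.is_server_live())
print("ready:", client.is_server_ready())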

adfayed commented 4 years ago

@deadeyegoodwin Good catch! The 20.03 release has both the V1 and V2 protocols. Trying --api-version=2, I get an input shape mismatch:

tritonclientutils.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] unexpected 
shape for input 'inputs' for model 'frcnn_incep_resnet_v2'. Expected [-1,-1,-1,3], got [640,960,3]

even though that is the input reported from saved_model_cli (as you can see in the main issue details above).

I train with images resized down by TensorFlow to [640, 960], and then I resize with cv2 to the same size when making the client inference request.

deadeyegoodwin commented 4 years ago

The batch dimension is handled specially in the config file. Since you have a saved-model, I would let Triton generate the initial config.pbtxt for you; then you can copy it and add whatever enhancements you want. See https://docs.nvidia.com/deeplearning/triton-inference-server/master-user-guide/docs/model_configuration.html#generated-model-configuration
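For reference, the generated configuration can also be fetched programmatically with the V2 Python GRPC client (a minimal sketch, assuming the server was started with --strict-model-config=false and the model name used in this thread):

import tritongrpcclient

# Connect to the server's GRPC endpoint (port as launched above).
client = tritongrpcclient.InferenceServerClient(url='localhost:8082')

# The auto-generated model configuration: a good starting point to copy
# into a hand-written config.pbtxt.
print(client.get_model_config(model_name='frcnn_incep_resnet_v2'))

# Model metadata reports the input/output names, datatypes and shapes.
print(client.get_model_metadata(model_name='frcnn_incep_resnet_v2'))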

adfayed commented 4 years ago

@deadeyegoodwin Got it. I allowed Triton to generate its own config.pbtxt by setting --strict-model-config=false, and then read the configuration back through the metadata endpoint. Here it is:

name: "inference:0"
version: "1.13.0"
extensions: "classification"
extensions: "sequence"
extensions: "model_repository"
extensions: "schedule_policy"
extensions: "model_configuration"
extensions: "system_shared_memory"
extensions: "cuda_shared_memory"
extensions: "binary_tensor_data"
extensions: "statistics"

name: "frcnn_incep_resnet_v2"
versions: "5001"
platform: "tensorflow_savedmodel"
inputs {
  name: "inputs"
  datatype: "UINT8"
  shape: -1
  shape: -1
  shape: 3
}
outputs {
  name: "detection_boxes"
  datatype: "FP32"
  shape: 3
  shape: 4
}
outputs {
  name: "num_detections"
  datatype: "FP32"
  shape: 1
}
outputs {
  name: "detection_classes"
  datatype: "FP32"
  shape: 3
}
outputs {
  name: "detection_multiclass_scores"
  datatype: "FP32"
  shape: 3
  shape: 4
}
outputs {
  name: "raw_detection_scores"
  datatype: "FP32"
  shape: 9
  shape: 4
}
outputs {
  name: "detection_scores"
  datatype: "FP32"
  shape: 3
}
outputs {
  name: "detection_features"
  datatype: "FP32"
  shape: -1
  shape: -1
  shape: -1
  shape: -1
}
outputs {
  name: "raw_detection_boxes"
  datatype: "FP32"
  shape: 9
  shape: 4
}

I then tweaked the frcnn_grpc_v2_triton_infer_client.py script you saw above to use the Triton auto-generated config file's input dims of [-1, -1, 3]. Here is the new frcnn_grpc_v2_triton_infer_client.py (for completeness' sake):

#!/usr/bin/env python

import argparse
import numpy as np
import sys
import cv2
from PIL import Image
import PIL.Image
import PIL.ImageDraw
import PIL.ImageFont
import pdb

import tritongrpcclient

REDU_WIDTH_FRCNN = 960
REDU_HEIGHT_FRCNN = 640
REDU_WIDTH_SSD = 960
REDU_HEIGHT_SSD = 640
ORIG_WIDTH = 1920
ORIG_HEIGHT = 1280
ORIG_IMAGE_SIZE = (ORIG_WIDTH, ORIG_HEIGHT)
REDU_IMAGE_SIZE_FRCNN = (REDU_WIDTH_FRCNN, REDU_HEIGHT_FRCNN)
REDU_IMAGE_SIZE_SSD = (REDU_WIDTH_SSD, REDU_HEIGHT_SSD)
MIN_SCORE_THRESHOLD = 0.10

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v',
                        '--verbose',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Enable verbose output')
    parser.add_argument('-u',
                        '--url',
                        type=str,
                        required=False,
                        default='localhost:8082',
                        help='Inference server URL. Default is localhost:8082.')

    FLAGS = parser.parse_args()
    try:
        triton_client = tritongrpcclient.InferenceServerClient(url=FLAGS.url,
                                                         verbose=FLAGS.verbose)
    except Exception as e:
        print("channel creation failed: " + str(e))
        sys.exit()

    model_name = 'frcnn_incep_resnet_v2'

    # Infer
    inputs = []
    outputs = []
    inputs.append(tritongrpcclient.InferInput('inputs', [-1, -1, 3], "UINT8"))
    #inputs.append(tritongrpcclient.InferInput('inputs', [640, 960, 3], "UINT8"))

    # Obtain the data for the input tensors.
    infer_image = cv2.imread('/workspace/install/python/IMG_4678.JPG')
    reduced_infer_image_frcnn = cv2.resize(infer_image, REDU_IMAGE_SIZE_FRCNN, interpolation=cv2.INTER_AREA)
    cv2.imwrite('/workspace/install/python/IMG_4678_resized.JPG', reduced_infer_image_frcnn)

    im = np.array(Image.open('/workspace/install/python/IMG_4678_resized.JPG'))

    # Initialize the data
    inputs[0].set_data_from_numpy(im)

    outputs.append(tritongrpcclient.InferRequestedOutput('detection_boxes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_classes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_features'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_multiclass_scores'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_scores'))
    outputs.append(tritongrpcclient.InferRequestedOutput('num_detections'))
    outputs.append(tritongrpcclient.InferRequestedOutput('raw_detection_boxes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('raw_detection_scores'))

    # Test with outputs
    results = triton_client.infer(model_name=model_name,
                                  inputs=inputs,
                                  outputs=outputs,
                                  headers={'test': '1'})

    statistics = triton_client.get_inference_statistics(model_name=model_name)
    print(statistics)
    if len(statistics.model_stats) != 1:
        print("FAILED: Inference Statistics")
        sys.exit(1)

    # Get the output arrays from the results
    detection_boxes_output = results.as_numpy('detection_boxes')
    detection_classes_output = results.as_numpy('detection_classes')
    detection_features_output = results.as_numpy('detection_features')
    detection_multiclass_scores_output = results.as_numpy('detection_multiclass_scores')
    detection_scores_output = results.as_numpy('detection_scores')
    num_detections_output = results.as_numpy('num_detections')
    raw_detection_boxes_output = results.as_numpy('raw_detection_boxes')
    raw_detection_scores_output = results.as_numpy('raw_detection_scores')

    # Test with no outputs
    results = triton_client.infer(model_name=model_name,
                                  inputs=inputs,
                                  outputs=None)

    # Get the output arrays from the results
    output0_data = results.as_numpy('OUTPUT0')
    output1_data = results.as_numpy('OUTPUT1')

The error I get when sending the gRPC inference request is:

tritonclientutils.utils.InferenceServerException: got unexpected 
numpy array shape [640, 960, 3], expected [-1, -1, 3]

If I copy the Triton auto-generated config.pbtxt and deploy it as-is like you suggested, I get errors along the lines of:

failed to load 'frcnn_incep_resnet_v2' version 5001: Invalid argument: model 'frcnn_incep_resnet_v2', 
tensor 'inputs': the model expects 4 dimensions (shape [-1,-1,-1,3]) but the 
model configuration specifies 3 dimensions (shape [-1,-1,3])

This is because the model as retrained in TensorFlow does not have the shape that Triton is inferring. Or, if I tweak only the input back to dims: [-1, -1, -1, 3], I get:

failed to load 'frcnn_incep_resnet_v2' version 5001: Invalid argument: model 'frcnn_incep_resnet_v2', 
tensor 'detection_boxes': the model expects 3 dimensions (shape [-1,3,4]) but the 
model configuration specifies 2 dimensions (shape [3,4])

One by one, the remaining outputs then trigger the same dimension mismatch errors.

deadeyegoodwin commented 4 years ago

In the model configuration you can use dimensions of -1 to indicate that the dimension can take on any value. But for a given inference request the input has a specific size, and in the client you must indicate that specific size for the input (and the data you provide must match that specific size). You should also read this section carefully: https://docs.nvidia.com/deeplearning/triton-inference-server/master-user-guide/docs/model_configuration.html#inputs-and-outputs
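Concretely, a minimal sketch of a request against this model (assuming the config declares input dims [-1, -1, -1, 3] and the GRPC port used in this thread; the zero image stands in for real data):

import numpy as np
import tritongrpcclient

client = tritongrpcclient.InferenceServerClient(url='localhost:8082')

# The config keeps the wildcard (-1) dimensions, but each request must name
# the concrete shape of the tensor it actually sends.
im = np.zeros((640, 960, 3), dtype=np.uint8)   # placeholder 640x960 RGB image
batched = np.expand_dims(im, axis=0)           # add batch dim -> (1, 640, 960, 3)

inp = tritongrpcclient.InferInput('inputs', list(batched.shape), "UINT8")
inp.set_data_from_numpy(batched)

results = client.infer(model_name='frcnn_incep_resnet_v2', inputs=[inp])
print(results.as_numpy('detection_boxes'))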

deadeyegoodwin commented 4 years ago

Closing, reopen if issue continues.

adfayed commented 4 years ago

Yes @deadeyegoodwin, I missed essential details on the configuration vs. client outbound message formats. The gRPC message was sent successfully. I still need to figure out how to batch messages together and/or enable dynamic batching for video inference (a continuous stream of frames). If you can point me in the right direction, thanks!

deadeyegoodwin commented 4 years ago

You could look at deepstream, which is designed for processing video. The latest release has some Triton integration for inferencing. https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html
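On the dynamic batching part of the question: it is enabled in the model's config.pbtxt, but it requires max_batch_size >= 1, so the max_batch_size: 0 configuration used in this thread would first need the model exported with a batchable signature. A minimal, hypothetical sketch of the relevant config.pbtxt fragment:

max_batch_size: 8
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}

With this, Triton briefly queues individual requests and groups them into server-side batches, which is typically how a continuous stream of frames gets batched without client-side changes.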