Closed adfayed closed 4 years ago
Unless I missed it I don't see where you say what went wrong. Your config.pbtxt is a little strange in that it specifies 5 instances of the model... nothing wrong with that but there is usually not a need to have 5 model instances. Also, the model.savedmodel directory usually (always?) has a variables sub-directory.
@deadeyegoodwin My model.savedmodel, a custom-retrained "faster_rcnn_inception_resnet_v2", does not generate any files in its variables sub-directory, so I omitted it. The 5 instances were just to test max inference throughput.
You are right, my mistake, I did not specify the error I'm getting. Here it is:
With the 20.03.1-py3 container, the server is started with this command (loading that config.pbtxt):
tritonserver --strict-model-config=True --log-verbose=1 --model-repository=$(MODEL_REPO) \
--api-version=1 --allow-http=True --http-port=8080 --allow-grpc=True --grpc-port=8082 \
--grpc-infer-allocation-pool-size=0 --trace-file=/tmp/tritonserver_trace.json --trace-rate=1000 \
--trace-level=MAX --tf-allow-soft-placement=True
I run frcnn_grpc_v2_triton_infer_client.py in the 20.03.1-py3-clientsdk container:
python3 /workspace/install/python/frcnn_grpc_v2_triton_infer_client.py
I get:
Traceback (most recent call last):
  File "/workspace/install/python/frcnn_grpc_v2_triton_infer_client.py", line 115, in <module>
    headers={'test': '1'})
  File "/usr/local/lib/python3.6/dist-packages/tritongrpcclient/__init__.py", line 940, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritongrpcclient/__init__.py", line 49, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclientutils.utils.InferenceServerException: [StatusCode.UNIMPLEMENTED]
If I change the --grpc-port I get:
tritonclientutils.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Connection reset by peer
So obviously some form of communication is happening but I am not getting my (inferred results) message back from the server to the client.
From the naming it looks like you are running a V2 client (that is a client that is using the V2 grpc/http protocols) but you launched the server with --api-version=1. Try --api-version=2.
@deadeyegoodwin Good catch! The 20.03 release supports both the V1 and V2 protocols. Trying --api-version=2, I get an input mismatch:
tritonclientutils.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] unexpected
shape for input 'inputs' for model 'frcnn_incep_resnet_v2'. Expected [-1,-1,-1,3], got [640,960,3]
even though that is the input reported from saved_model_cli (as you can see in the main issue details above).
I train with images resized down by TensorFlow to [640, 960] and then I resize using cv2 to make the client inference request.
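One detail worth keeping straight when resizing with cv2: cv2.resize takes its target size as (width, height), while the resulting NumPy array is laid out as (height, width, channels). The sketch below uses a hypothetical helper, array_shape_after_resize, which is not part of the original script, to show the shape the resized image will have.

```python
from typing import Tuple


def array_shape_after_resize(dsize: Tuple[int, int], channels: int = 3) -> Tuple[int, int, int]:
    """Return the array shape of an image resized to dsize.

    dsize follows the cv2.resize convention of (width, height); image arrays
    are indexed as (rows, cols, channels), i.e. (height, width, channels).
    This helper is an illustration only and is not part of the client script.
    """
    width, height = dsize
    return (height, width, channels)


# Same value as REDU_IMAGE_SIZE_FRCNN in the client script.
print(array_shape_after_resize((960, 640)))
```

So a target size of (960, 640) gives an array of shape (640, 960, 3), which is the shape reported in the error message.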
The batch dimension is handled specially in the config file. Since you have a saved-model I would let triton generate the initial config.pbtxt for you and then you can copy it and add whatever enhancements you want. See https://docs.nvidia.com/deeplearning/triton-inference-server/master-user-guide/docs/model_configuration.html#generated-model-configuration
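As a rough illustration of that special handling, based on my reading of the model configuration documentation linked above: when max_batch_size is greater than 0, the batch dimension is implicit and is left out of dims; when max_batch_size is 0, dims must give the full shape. The helper name full_model_shape is hypothetical and only mirrors that rule.

```python
from typing import List


def full_model_shape(max_batch_size: int, dims: List[int]) -> List[int]:
    """Shape the model itself is expected to expose for one config entry.

    Assumption (from the model configuration docs): with max_batch_size > 0
    the batch dimension is implicit, so the model shape is [-1] + dims.
    With max_batch_size == 0 the dims are taken as the complete shape.
    """
    if max_batch_size > 0:
        return [-1] + list(dims)
    return list(dims)


# Batching config: dims omit the batch dimension.
print(full_model_shape(1, [-1, -1, 3]))
# Non-batching config: dims must already contain every dimension.
print(full_model_shape(0, [-1, -1, -1, 3]))
```

Both calls describe the same 4-dimensional input, [-1, -1, -1, 3], which is what saved_model_cli reports for 'inputs'.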
@deadeyegoodwin Got it. I let Triton generate its own config.pbtxt by setting --strict-model-config=false and then read the result back through the server's metadata endpoint. Here it is:
name: "inference:0"
version: "1.13.0"
extensions: "classification"
extensions: "sequence"
extensions: "model_repository"
extensions: "schedule_policy"
extensions: "model_configuration"
extensions: "system_shared_memory"
extensions: "cuda_shared_memory"
extensions: "binary_tensor_data"
extensions: "statistics"
name: "frcnn_incep_resnet_v2"
versions: "5001"
platform: "tensorflow_savedmodel"
inputs {
name: "inputs"
datatype: "UINT8"
shape: -1
shape: -1
shape: 3
}
outputs {
name: "detection_boxes"
datatype: "FP32"
shape: 3
shape: 4
}
outputs {
name: "num_detections"
datatype: "FP32"
shape: 1
}
outputs {
name: "detection_classes"
datatype: "FP32"
shape: 3
}
outputs {
name: "detection_multiclass_scores"
datatype: "FP32"
shape: 3
shape: 4
}
outputs {
name: "raw_detection_scores"
datatype: "FP32"
shape: 9
shape: 4
}
outputs {
name: "detection_scores"
datatype: "FP32"
shape: 3
}
outputs {
name: "detection_features"
datatype: "FP32"
shape: -1
shape: -1
shape: -1
shape: -1
}
outputs {
name: "raw_detection_boxes"
datatype: "FP32"
shape: 9
shape: 4
}
I then tweaked the frcnn_grpc_v2_triton_infer_client.py script you saw above to use the input dims of [-1, -1, 3] from the Triton auto-generated config file.
Here is the new frcnn_grpc_v2_triton_infer_client.py (for completeness' sake):
#!/usr/bin/env python
import argparse
import numpy as np
import sys
import cv2
from PIL import Image
import PIL.Image
import PIL.ImageDraw
import PIL.ImageFont
import pdb
import tritongrpcclient
REDU_WIDTH_FRCNN = 960
REDU_HEIGHT_FRCNN = 640
REDU_WIDTH_SSD = 960
REDU_HEIGHT_SSD = 640
ORIG_WIDTH = 1920
ORIG_HEIGHT = 1280
ORIG_IMAGE_SIZE = (ORIG_WIDTH, ORIG_HEIGHT)
REDU_IMAGE_SIZE_FRCNN = (REDU_WIDTH_FRCNN, REDU_HEIGHT_FRCNN)
REDU_IMAGE_SIZE_SSD = (REDU_WIDTH_SSD, REDU_HEIGHT_SSD)
MIN_SCORE_THRESHOLD = 0.10
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v',
                        '--verbose',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Enable verbose output')
    parser.add_argument('-u',
                        '--url',
                        type=str,
                        required=False,
                        default='localhost:8082',
                        help='Inference server URL. Default is localhost:8001.')
    FLAGS = parser.parse_args()
    try:
        triton_client = tritongrpcclient.InferenceServerClient(url=FLAGS.url,
                                                               verbose=FLAGS.verbose)
    except Exception as e:
        print("channel creation failed: " + str(e))
        sys.exit()
    model_name = 'frcnn_incep_resnet_v2'
    # Infer
    inputs = []
    outputs = []
    inputs.append(tritongrpcclient.InferInput('inputs', [-1, -1, 3], "UINT8"))
    #inputs.append(tritongrpcclient.InferInput('inputs', [640, 960, 3], "UINT8"))
    # Obtain the data for the input tensors.
    infer_image = cv2.imread('/workspace/install/python/IMG_4678.JPG')
    reduced_infer_image_frcnn = cv2.resize(infer_image, REDU_IMAGE_SIZE_FRCNN, interpolation=cv2.INTER_AREA)
    cv2.imwrite('/workspace/install/python/IMG_4678_resized.JPG', reduced_infer_image_frcnn)
    im = np.array(Image.open('/workspace/install/python/IMG_4678_resized.JPG'))
    # Initialize the data
    inputs[0].set_data_from_numpy(im)
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_boxes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_classes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_features'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_multiclass_scores'))
    outputs.append(tritongrpcclient.InferRequestedOutput('detection_scores'))
    outputs.append(tritongrpcclient.InferRequestedOutput('num_detections'))
    outputs.append(tritongrpcclient.InferRequestedOutput('raw_detection_boxes'))
    outputs.append(tritongrpcclient.InferRequestedOutput('raw_detection_scores'))
    # Test with outputs
    results = triton_client.infer(model_name=model_name,
                                  inputs=inputs,
                                  outputs=outputs,
                                  headers={'test': '1'})
    statistics = triton_client.get_inference_statistics(model_name=model_name)
    print(statistics)
    if len(statistics.model_stats) != 1:
        print("FAILED: Inference Statistics")
        sys.exit(1)
    # Get the output arrays from the results
    detection_boxes_output = results.as_numpy('detection_boxes')
    detection_classes_output = results.as_numpy('detection_classes')
    detection_features_output = results.as_numpy('detection_features')
    detection_multiclass_scores_output = results.as_numpy('detection_multiclass_scores')
    detection_scores_output = results.as_numpy('detection_scores')
    num_detections_output = results.as_numpy('num_detections')
    raw_detection_boxes_output = results.as_numpy('raw_detection_boxes')
    raw_detection_scores_output = results.as_numpy('raw_detection_scores')
    # Test with no outputs
    results = triton_client.infer(model_name=model_name,
                                  inputs=inputs,
                                  outputs=None)
    # Get the output arrays from the results
    output0_data = results.as_numpy('OUTPUT0')
    output1_data = results.as_numpy('OUTPUT1')
The error I get when sending the gRPC message for inference is:
tritonclientutils.utils.InferenceServerException: got unexpected
numpy array shape [640, 960, 3], expected [-1, -1, 3]
If I copy the Triton auto-generated config.pbtxt and deploy it as is, like you suggested, I start getting errors along the lines of:
failed to load 'frcnn_incep_resnet_v2' version 5001: Invalid argument: model 'frcnn_incep_resnet_v2',
tensor 'inputs': the model expects 4 dimensions (shape [-1,-1,-1,3]) but the
model configuration specifies 3 dimensions (shape [-1,-1,3])
This is because the model retrained in TensorFlow does not have the shape that Triton is inferring.
Or, if I tweak only the input back to dims: [-1, -1, -1, 3], I get:
failed to load 'frcnn_incep_resnet_v2' version 5001: Invalid argument: model 'frcnn_incep_resnet_v2',
tensor 'detection_boxes': the model expects 3 dimensions (shape [-1,3,4]) but the
model configuration specifies 2 dimensions (shape [3,4])
One by one the outputs will trigger dimension mismatch errors.
In the model configuration you can use dimensions of -1 to indicate that the dimension can take on any value. But for a given inference request the input has a specific size, and in the client you must indicate that specific size for the input (and the data you provide must match that specific size). You should also read this section carefully: https://docs.nvidia.com/deeplearning/triton-inference-server/master-user-guide/docs/model_configuration.html#inputs-and-outputs
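A minimal sketch of that rule, using only plain Python lists: a -1 in the configuration matches any size, but the shape sent with a request must be concrete and have the same number of dimensions. The helper names shape_matches and with_batch_dim are hypothetical and are not part of the Triton client library.

```python
from typing import List, Sequence


def shape_matches(config_dims: Sequence[int], request_shape: Sequence[int]) -> bool:
    """True if a concrete request shape is compatible with the config dims.

    A -1 in config_dims is a wildcard. The request shape may not contain -1,
    because a given request always carries data of one specific size.
    """
    if len(config_dims) != len(request_shape):
        return False
    return all(r >= 0 and (c == -1 or c == r)
               for c, r in zip(config_dims, request_shape))


def with_batch_dim(image_shape: Sequence[int], batch: int = 1) -> List[int]:
    """Prepend an explicit batch dimension to a single image's shape."""
    return [batch] + list(image_shape)


config_dims = [-1, -1, -1, 3]      # dims from the config with max_batch_size: 0
image_shape = [640, 960, 3]        # shape of the resized image array

print(shape_matches(config_dims, image_shape))                  # rank mismatch
print(shape_matches(config_dims, with_batch_dim(image_shape)))  # [1, 640, 960, 3]
```

Under these assumptions, the client would declare the input with the concrete shape [1, 640, 960, 3] and supply an array of exactly that shape, rather than passing [-1, -1, 3].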
Closing, reopen if issue continues.
Yes @deadeyegoodwin, I missed essential details on configuration vs. client outbound message formats. The gRPC message was sent successfully. I still need to figure out how to batch messages together and/or enable dynamic batching for video inference (a continuous stream of frames). Could you point me in the right direction? Thanks.
You could look at deepstream, which is designed for processing video. The latest release has some Triton integration for inferencing. https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html
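For the "batch messages together" part of the question, one simple client-side option is to group consecutive frames into fixed-size chunks before sending them. The sketch below is only an illustration of that grouping, not Triton's server-side dynamic batcher and not DeepStream; batch_frames is a hypothetical helper, and it assumes all frames in a batch share the same height and width so they could be stacked into one input.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_frames(frames: Iterable[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to max_batch_size consecutive frames.

    The final batch may be smaller when the stream length is not a multiple
    of max_batch_size. Works on any iterable, including a live generator.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[T] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Frame indices stand in for decoded video frames here.
for group in batch_frames(range(5), max_batch_size=2):
    print(group)
```

Each yielded group would then be stacked into a single array whose leading dimension is the group size and sent as one inference request.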
Description
Hello, I am trying to run object detection inference on the GPU (RTX 2070 Super) with my own custom-trained TensorFlow neural network, based on the Faster R-CNN Inception ResNet v2 COCO (pre-trained) network provided in the Model Zoo. (here is the direct download link)
Triton Information
What version of Triton are you using? nvcr.io/nvidia/tritonserver 20.03.1-py3 AND nvcr.io/nvidia/tritonserver 20.03.1-py3-clientsdk
Nvidia driver version: nvidia-driver-440 | 440.100-0ubuntu0.18.04.1
Are you using the Triton container or did you build it yourself? I am using the container, versions 20.03.1-py3 and 20.03.1-py3-clientsdk.
To Reproduce
To walk you step by step through what I did (to replicate this):
Here is the config file "pipeline_tf1.15.config":
Exported the model into a frozen inference graph (savedmodel.pb) by running cmd:
Ran Nvidia Triton inference server container version nvcr.io/nvidia/tritonserver 20.03.1-py3 NGC docker pull command here by cmd:
saved_model_cli shows this:
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_UINT8
        shape: (-1, -1, -1, 3)
        name: image_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['detection_boxes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3, 4)
        name: detection_boxes:0
    outputs['detection_classes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: detection_classes:0
    outputs['detection_features'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, -1, -1)
        name: detection_features:0
    outputs['detection_multiclass_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3, 4)
        name: detection_multiclass_scores:0
    outputs['detection_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: detection_scores:0
    outputs['num_detections'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: num_detections:0
    outputs['raw_detection_boxes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9, 4)
        name: raw_detection_boxes:0
    outputs['raw_detection_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 9, 4)
        name: raw_detection_scores:0
  Method name is: tensorflow/serving/predict
$ python3 trt_model_converter.py
from tensorflow.python.compiler.tensorrt import trt_convert as trt
converter = trt.TrtGraphConverter(input_saved_model_dir='/triton-inference-server/docs/examples/my_models/frcnn_incep_resnet_v2/5000/model.savedmodel', minimum_segment_size=3,is_dynamic_op=True,maximum_cached_engines=5,precision_mode="FP16") # max_workspace_size_bytes=4000000000
converted = converter.convert()
converter.save(output_saved_model_dir='/triton-inference-server/docs/examples/my_models/frcnn_incep_resnet_v2/5000_FP16')
$ export MODEL_REPO="/triton-inference-server/docs/examples/my_models"
root@aa94996e622f:/triton-inference-server/docs/examples/my_models# $ lsar
.:
total 4
drwxr-xr-x 3 root root 4096 Jul 3 23:45 frcnn_incep_resnet_v2

./frcnn_incep_resnet_v2:
total 20
drwxr-xr-x 3 root root 4096 Jun 24 21:25 5001
-rw-r--r-- 1 root root 110 Jun 26 23:38 ag_label_map.txt
-rw-r--r-- 1 root root 1630 Jul 3 23:45 config.pbtxt
-rw-r--r-- 1 root root 4599 Jun 19 05:09 pipeline_tf1.15.config

./frcnn_incep_resnet_v2/5001:
total 4
drwxr-xr-x 2 root root 4096 Jun 24 21:25 model.savedmodel

./frcnn_incep_resnet_v2/5001/model.savedmodel:
total 233560
-rw-r--r-- 1 root root 239161986 Jun 24 19:51 saved_model.pb
name: "frcnn_incep_resnet_v2"
platform: "tensorflow_savedmodel"
max_batch_size: 0
version_policy: { all { }} # "latest" Only the latest 'n' versions of the model in the repository
# are available for inferencing. Or "specific" Only the specifically listed versions of the model are
# available for inferencing.
input [
  {
    name: "inputs"
    data_type: TYPE_UINT8
    dims: [-1, -1, -1, 3 ]
  }
]
output [
  {
    name: "detection_boxes"
    data_type: TYPE_FP32
    dims: [-1, 3, 4] # the 3 here represents max_total_detections in pipeline.config
  },
  {
    name: "detection_classes"
    data_type: TYPE_FP32
    dims: [-1, 3] # the 3 here represents max_total_detections in pipeline.config
  },
  {
    name: "detection_features"
    data_type: TYPE_FP32
    dims: [-1, -1, -1, -1, -1]
  },
  {
    name: "detection_multiclass_scores"
    data_type: TYPE_FP32
    dims: [-1, 3, 4] # the 3 here represents max_total_detections in pipeline.config
  },
  {
    name: "detection_scores"
    data_type: TYPE_FP32
    dims: [-1, 3] # the 3 here represents max_total_detections in pipeline.config
  },
  {
    name: "num_detections"
    data_type: TYPE_FP32
    dims: [-1]
  },
  {
    name: "raw_detection_boxes"
    data_type: TYPE_FP32
    dims: [-1, 9, 4] # the 9 here represents first_stage_max_proposals in pipeline.config
  },
  {
    name: "raw_detection_scores"
    data_type: TYPE_FP32
    dims: [-1, 9, 4] # the 9 here represents first_stage_max_proposals in pipeline.config
  }
]
instance_group [
  {
    count: 5
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
$ tritonserver --strict-model-config=True --log-verbose=1 --model-repository=$(MODEL_REPO) \
--api-version=1 --allow-http=True --http-port=8080 --allow-grpc=True --grpc-port=8082 \
--grpc-infer-allocation-pool-size=0 --trace-file=/tmp/tritonserver_trace.json --trace-rate=1000 \
--trace-level=MAX --tf-allow-soft-placement=True
$ sudo docker run -it --name=tritonserver_client --net=host nvcr.io/nvidia/tritonserver:20.03.1-py3-clientsdk
$ python3 frcnn_grpc_v2_triton_infer_client.py
import argparse
import numpy as np
import sys
import cv2
from PIL import Image
import PIL.Image
import PIL.ImageDraw
import PIL.ImageFont
import pdb
import tritongrpcclient
REDU_WIDTH_FRCNN = 960
REDU_HEIGHT_FRCNN = 640
REDU_WIDTH_SSD = 960
REDU_HEIGHT_SSD = 640
ORIG_WIDTH = 1920
ORIG_HEIGHT = 1280
ORIG_IMAGE_SIZE = (ORIG_WIDTH, ORIG_HEIGHT)
REDU_IMAGE_SIZE_FRCNN = (REDU_WIDTH_FRCNN, REDU_HEIGHT_FRCNN)
REDU_IMAGE_SIZE_SSD = (REDU_WIDTH_SSD, REDU_HEIGHT_SSD)
MIN_SCORE_THRESHOLD = 0.10
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v',
                        '--verbose',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Enable verbose output')
    parser.add_argument('-u',
                        '--url',
                        type=str,
                        required=False,
                        default='localhost:8082',
                        help='Inference server URL. Default is localhost:8001.')