onnx / onnx-tensorrt

ONNX-TensorRT: TensorRT backend for ONNX
Apache License 2.0

Unsupported ONNX data type: UINT8 (2) #400

Open bnascimento opened 4 years ago

bnascimento commented 4 years ago

Following the tutorial in the notebook https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/ConvertingSSDMobilenetToONNX.ipynb, I am trying to convert MobileNet v2 and v3 frozen models from TensorFlow (frozen_inference_graph.pb or saved_model.pb) to ONNX and then to TensorRT files. Under the NGC dockers 20.01-tf1-py3 and 19.05-py3 I am using both this project and tensorflow-onnx. I always get different issues; the furthest I got was under 20.01-tf1-py3 with both onnx-tensorrt and tensorflow-onnx on their master branches, installed from source. I was able to create the .onnx file, but when I try to create the .trt file I get the following:

onnx2trt /media/bnascimento/project/frozen_inference_graph.onnx -o /media/bnascimento/project/frozen_inference_graph.trt
----------------------------------------------------------------
Input filename:   /media/bnascimento/project/frozen_inference_graph.onnx
ONNX IR version:  0.0.6
Opset version:    10
Producer name:    tf2onnx
Producer version: 1.6.0
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
Parsing model
Unsupported ONNX data type: UINT8 (2)
ERROR: image_tensor:0:190 In function importInput:
[8] Assertion failed: convertDtype(onnxDtype.elem_type(), &trtDtype)

I suspect this has to do with the input tensor for the image, but I don't know how to avoid this issue. Has anyone run into similar issues before?

Cheers Bruno

qraleq commented 4 years ago

@bnascimento I get the same error when parsing a model. Did you manage to resolve your issue?

aif2017 commented 4 years ago

Input filename:   model.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    tf2onnx
Producer version: 1.5.5
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
Writing ONNX model (without weights) as text to my_engine.txt
Parsing model
Unsupported ONNX data type: UINT8 (2)
ERROR: image_tensor:0:190 In function importInput:
[8] Assertion failed: convertDtype(onnxDtype.elem_type(), &trtDtype)

aif2017 commented 4 years ago

@bnascimento did you find a way to escape this?

aif2017 commented 4 years ago

????????????????????????????

aif2017 commented 4 years ago

??????????????????????????????????

chiehpower commented 4 years ago

TRT cannot support the UINT8 data type. It means your model already uses the uint8 data type (here, for its input). Check here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/FoundationalTypes/DataType.html
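
If you want to confirm where the uint8 comes from, one quick check (a minimal sketch; the file name is a placeholder) is to print the element type of each graph input with the onnx Python package:

import onnx

model = onnx.load("frozen_inference_graph.onnx")  # placeholder path
for inp in model.graph.input:
    # elem_type follows onnx.TensorProto: 1 = FLOAT, 2 = UINT8, ...
    print(inp.name, onnx.TensorProto.DataType.Name(inp.type.tensor_type.elem_type))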

ai-vip2020 commented 4 years ago

TRT cannot support UINT8 datatype. It means your model already used the uint8 datatype. Check here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/FoundationalTypes/DataType.html

Thanks, but this node is the input image, which is converted to float32 in the next step!

ai-vip2020 commented 4 years ago

TRT cannot support UINT8 datatype. It means your model already used the uint8 datatype. Check here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/FoundationalTypes/DataType.html

s

WhaSukGO commented 4 years ago

Any update?

hfinger commented 4 years ago

Same problem. Any update?

turowicz commented 4 years ago

Same problem as here https://forums.developer.nvidia.com/t/unsupported-onnx-data-type-uint8-2/75044/10

Ram-Godavarthi commented 4 years ago

Any solutions to this problem??

WARNING: ONNX model has a newer ir_version (0.0.5) than this parser was built against (0.0.3).
Unsupported ONNX data type: UINT8 (2)
ERROR: ModelImporter.cpp:54 In function importInput:
[8] Assertion failed: convert_dtype(onnx_tensor_type.elem_type(), &trt_dtype)
[05/29/2020-10:13:46] [E] Failed to parse onnx file
[05/29/2020-10:13:46] [E] Parsing model failed
[05/29/2020-10:13:46] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # ./trtexec --onnx=inception_standard.onnx

Guneetkaur03 commented 4 years ago

Hey! Even I have the same problem. Any solutions?

Unsupported ONNX data type: UINT8 (2)
ERROR: batch:1:191 In function importInput:
[8] Assertion failed: convertDtype(onnxDtype.elem_type(), &trtDtype)
[06/29/2020-16:30:09] [E] Failed to parse onnx file
[06/29/2020-16:30:09] [E] Parsing model failed
[06/29/2020-16:30:09] [E] Engine creation failed
[06/29/2020-16:30:09] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # trtexec --onnx=/home/xyz/Downloads/train_batch_shape.onnx --shapes=input_3:1x200x200x3

pranavk2050 commented 3 years ago

Unsupported ONNX data type: UINT8 (2)

Has anyone solved this?

douglasrizzo commented 3 years ago

First, I have to say that I haven't had this janky experience with software in years. Working with this ONNX and TensorRT ecosystem is a complete nightmare.

Second, I was able to solve the UINT8 problem by using the code from this NVIDIA Developers forum post: https://forums.developer.nvidia.com/t/problem-converting-onnx-model-to-tensorrt-engine-for-ssd-mobilenet-v2/139337/16

This fixes the original frozen_inference_graph.pb file, which then needs to be converted to ONNX and then to TensorRT.

douglasrizzo commented 3 years ago

Here are the steps I took, but they ended up failing anyway.

Step 1: fix UINT8 error

Here is a script that generates a new frozen inference graph with float inputs from one with int inputs:

Suppose it's called fix_uint8.py. Its usage is: python fix_uint8.py frozen_inference_graph.pb fixed_inference_graph.pb

import tensorflow as tf
import graphsurgeon as gs
import sys

graph = gs.DynamicGraph(sys.argv[1])
image_tensor = graph.find_nodes_by_name('image_tensor')

print('Found Input: ', image_tensor)

cast_node = graph.find_nodes_by_name('Cast')[0] #Replace Cast with ToFloat if using tensorflow <1.15
print('Old field', cast_node.attr['SrcT'])

cast_node.attr['SrcT'].type=1 #Changing Expected type to float
print('New field', cast_node.attr['SrcT'])

input_node = gs.create_plugin_node(name='InputNode', op='Placeholder', shape=(-1, -1, -1, 3), dtype=tf.float32)
namespace_plugin_map = {'image_tensor': input_node}
graph.collapse_namespaces(namespace_plugin_map)
graph.write(sys.argv[2])

Step 2: generate ONNX file from fixed .pb file

Let's say I fixed a file and called it mobilenet_v2_0.35_128.pb. I then call tf2onnx on this file:

python -m tf2onnx.convert --input mobilenet_v2_0.35_128.pb --inputs InputNode:0 --output mobilenet_v2_0.35_128.onnx --opset 11 --outputs detection_boxes:0,detection_scores:0,detection_multiclass_scores:0,detection_classes:0,num_detections:0,raw_detection_boxes:0,raw_detection_scores:0

2020-08-31 05:32:04,426 - INFO - Using tensorflow=1.15.0, onnx=1.7.0, tf2onnx=1.6.3/d4abc8
2020-08-31 05:32:04,426 - INFO - Using opset <onnx, 11>
2020-08-31 05:32:10,228 - INFO - Optimizing ONNX model
2020-08-31 05:32:28,812 - INFO - After optimization: BatchNormalization -53 (60->7), Cast -34 (131->97), Const -578 (916->338), Gather +6 (29->35), Identity -129 (130->1), Less -2 (10->8), Mul -2 (37->35), Reshape -15 (45->30), Shape -8 (33->25), Slice -7 (56->49), Squeeze -22 (73->51), Transpose -272 (291->19), Unsqueeze -63 (102->39)
2020-08-31 05:32:28,896 - INFO -
2020-08-31 05:32:28,896 - INFO - Successfully converted TensorFlow model mobilenet_v2_0.35_128.pb to ONNX
2020-08-31 05:32:28,925 - INFO - ONNX model is saved at mobilenet_v2_0.35_128.onnx

Step 3: generate TensorRT "engine" from ONNX file

Lastly, I call onnx2trt:

onnx2trt mobilenet_v2_0.35_128.onnx -o mobilenet_v2_0.35_128_engine.trt
----------------------------------------------------------------
Input filename:   mobilenet_v2_0.35_128.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    tf2onnx
Producer version: 1.6.3
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
Parsing model
[2020-08-31 08:27:24 WARNING] [TRT]/home/user/Code/onnx-tensorrt/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[2020-08-31 08:27:24 WARNING] [TRT]/home/user/Code/onnx-tensorrt/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[2020-08-31 08:27:24 WARNING] [TRT]/home/user/Code/onnx-tensorrt/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[2020-08-31 08:27:24   ERROR] INVALID_ARGUMENT: getPluginCreator could not find plugin NonMaxSuppression version 1
While parsing node number 306 [Loop -> "unused_loop_output___73"]:
ERROR: /home/user/Code/onnx-tensorrt/builtin_op_importers.cpp:3713 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

I've trained my network using TF 1.15, but I get this error even when I execute these steps with either TF 2.3 or 1.15.

turowicz commented 3 years ago

@douglasrizzo which model are you training?

douglasrizzo commented 3 years ago

@turowicz I am training MobileNet v2 and v3 models from the TensorFlow Object Detection API. I get pre-trained models from here and train them on a custom dataset (for object detection, not classification).

The "non-max suppression" operation that seems to be giving trouble to TensorRT is specific to object detection tasks. It basically consists of removing multiple bounding boxes that may be predicted on top of the same object, returning only the one with highest confidence.

cognitiveRobot commented 3 years ago

@douglasrizzo did you find any solution? Can you please share? Thanks.

douglasrizzo commented 3 years ago

@cognitiveRobot I ditched TensorRT and the Jetson and did inference in an Intel NUC, directly in the CPU.

cognitiveRobot commented 3 years ago

@douglasrizzo thanks a lot. That could be a solution for us too.

How many FPS do you get for your models on the Intel NUC?

What is the size of your input images?

Can you please share these? It will really help us make our final decision.

douglasrizzo commented 3 years ago

@cognitiveRobot oh boy oh boy, do I have answers for you. I trained all MobileNetV2 and V3 models from this page with a width multiplier of 1 or less to detect a single class (soccer balls). I then collected the mean inference time for a single frame of a 30 second video, both on a Tesla V100 GPU and on an Intel i5-4210U. You can see the results below.

The i5 is between 1.3 and 1.5 times slower than the V100, but you have to be aware that this depends a lot on the implementation. The TF Object Detection API is pretty fast for inference on CPUs. On the other hand, the official YOLOv4 has an inference time of 50 ms on the V100 and a whopping 5 seconds on our feeble CPU.

[image: table of mean single-frame inference times for each model on the Tesla V100 and the Intel i5]

As for the inference time when processing images of different sizes:

Just bear in mind that the MobileNets already scale down images before processing them, so it may be a good idea for you to configure your camera/input feed to have low resolutions too. It should matter little for the network.

cognitiveRobot commented 3 years ago

@douglasrizzo thanks a lot again. It will be really helpful.

bnascimento commented 3 years ago

Regarding the error @douglasrizzo posted above:

INVALID_ARGUMENT: getPluginCreator could not find plugin NonMaxSuppression version 1

Has anyone else stumbled into this issue? We also need a solution.

douglasrizzo commented 3 years ago

@bnascimento non-max suppression is an "operation" that seems to be implemented in TensorFlow and ONNX, but not in TensorRT, so converting any model that uses non-max suppression in its architecture to TensorRT is going to fail.

I believe the solution would be to implement it in TensorRT...

bnascimento commented 3 years ago

Hi @douglasrizzo, I've been looking into this issue and it seems that TensorRT has this operation among its plugins. See https://github.com/NVIDIA/TensorRT/tree/master/plugin/batchedNMSPlugin or https://github.com/NVIDIA/TensorRT/tree/master/plugin/nmsPlugin. The reason might be that these are very specific operations, mostly used in object detection, for example.

There are other people with a similar issue who have laid out different approaches, but so far I've been unsuccessful. See https://github.com/NVIDIA/TensorRT/issues/795

qin2294096 commented 3 years ago

@bnascimento Try splitting the TensorFlow graph at the position just before NMS. You will then get two graphs: 'network_forward' and 'postprocess'. Convert only the 'network_forward' part to TensorRT.

This link might be helpful. realtime_object_detection
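
A minimal sketch of that graph-splitting idea with NVIDIA's ONNX GraphSurgeon; the tensor names are taken from the tf2onnx command above and are assumptions, so substitute the real pre-NMS tensors of your own model:

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("mobilenet_v2_0.35_128.onnx"))
tensors = graph.tensors()

# Cut the graph at the tensors that feed NMS (names are model-specific assumptions)
graph.outputs = [tensors["raw_detection_boxes:0"], tensors["raw_detection_scores:0"]]

# Drop everything that no longer contributes to the new outputs (the NMS / Loop subgraph)
graph.cleanup()
onnx.save(gs.export_onnx(graph), "mobilenet_v2_0.35_128_no_nms.onnx")

The NMS post-processing then has to run outside TensorRT, e.g. on the CPU, using the raw boxes and scores.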

absolution747 commented 3 years ago

Hey guys, I had this same problem too, and maybe this script can help, as it helped me:

import onnx

def change_input_datatype(model, typeNdx):
    # values for typeNdx:
    # 1 = float32
    # 2 = uint8
    # 3 = int8
    # 4 = uint16
    # 5 = int16
    # 6 = int32
    # 7 = int64
    inputs = model.graph.input
    for input in inputs:
        input.type.tensor_type.elem_type = typeNdx

def change_input_batchsize(model, batchSize):
    inputs = model.graph.input
    for input in inputs:
        dim1 = input.type.tensor_type.shape.dim[0]
        dim1.dim_value = batchSize
        # print("input: ", input)  # uncomment to see input layer details

def change_output_batchsize(model, batchSize):
    outputs = model.graph.output
    for output in outputs:
        dim1 = output.type.tensor_type.shape.dim[0]
        dim1.dim_value = batchSize
        # print("output: ", output)  # uncomment to see output layer details

onnx_model = onnx.load("model.onnx")  # path to your ONNX model

change_input_datatype(onnx_model, 1)  # 1 = float32
change_input_batchsize(onnx_model, 1)
change_output_batchsize(onnx_model, 1)

onnx.save(onnx_model, "updated_model.onnx")  # path for the modified model

Here we can change the data type of the input tensor. Resource: https://forums.developer.nvidia.com/t/unsupported-onnx-data-type-uint8-2/75044/16?u=karanprojectx

mihajenko commented 3 years ago

Very similar problem with the CumSum operator on a PyTorch RoBERTa implementation, exported at ONNX opset 11:

import onnx
import onnxruntime
import onnx_tensorrt.backend as backend
model = onnx.load('/workspace/models/onnx-my-32.model')
engine = backend.prepare(model)
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 1115620964
[libprotobuf WARNING /workspace/TensorRT/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /workspace/TensorRT/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1115620964
[TensorRT] WARNING: /workspace/TensorRT/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting
to cast down to INT32.
[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin CumSum version 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/TensorRT/parsers/onnx/onnx_tensorrt/backend.py", line 254, in prepare
    return TensorRTBackendRep(model, device, **kwargs)
  File "/workspace/TensorRT/parsers/onnx/onnx_tensorrt/backend.py", line 92, in __init__
    raise RuntimeError(msg)

The error: [TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin CumSum version 1

Reassigning elem_type like @absolution747 pointed out does not solve this, only removes the INT64 warning.

CMangoDH commented 3 years ago

I have the same error with my code. I found a tool that can solve the problem, and the way to use it, here.

  1. Install the ONNX GraphSurgeon API

    $ sudo apt-get install python3-pip libprotobuf-dev protobuf-compiler
    $ git clone https://github.com/NVIDIA/TensorRT.git
    $ cd TensorRT/tools/onnx-graphsurgeon/
    $ make install

  2. Modify your model

    import onnx_graphsurgeon as gs
    import onnx
    import numpy as np

    graph = gs.import_onnx(onnx.load("model.onnx"))
    for inp in graph.inputs:
        inp.dtype = np.float32

    onnx.save(gs.export_onnx(graph), "updated_model.onnx")

DJT777 commented 3 years ago

Has anyone been able to solve this issue:

[01/07/2021-14:03:44] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/07/2021-14:03:44] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[01/07/2021-14:03:44] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: NonMaxSuppression. Attempting to import as plugin.
[01/07/2021-14:03:44] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: NonMaxSuppression, plugin_version: 1, plugin_namespace: 
[01/07/2021-14:03:44] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin NonMaxSuppression version 1
ERROR: builtin_op_importers.cpp:3661 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[01/07/2021-14:03:44] [E] Failed to parse onnx file
[01/07/2021-14:03:44] [E] Parsing model failed
[01/07/2021-14:03:44] [E] Engine creation failed
[01/07/2021-14:03:44] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # ./trtexec --onnx=/media/AF68-D504/Tf2TRT/SSD_trained_alpha_trt/updated_SSD_tf2.onnx --saveEngine=/media/AF68-D504/Tf2TRT/SSD_train_alpha_trt/engine.trt

I understand that the .cpp file needs to be rewritten. If someone has solved the builtin importers problem, please share what code they have changed in that file.

turowicz commented 3 years ago

Would be nice if NVIDIA made this easier. Many people are using TF Object Detection. I'm trying to run it on Jetson.

cc @deadeyegoodwin

turowicz commented 3 years ago

Guys, perhaps the way is to train the model with FP16 weights from the get-go.

https://github.com/tensorflow/models/issues/3706

https://www.analyticsvidhya.com/blog/2020/09/tensorflow-object-detection-1-0-2-0-train-export-optimize-tensorrt-infer-jetson-nano/

These guys seem to be successful on paper.

t-T-s commented 3 years ago

I have the same error with my code. I found a tool that can solve the problem, and the way to use it, here.

1. Install the ONNX GraphSurgeon API
$ sudo apt-get install python3-pip libprotobuf-dev protobuf-compiler
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/onnx-graphsurgeon/
$ make install
2. Modify your model
import onnx_graphsurgeon as gs
import onnx
import numpy as np

graph = gs.import_onnx(onnx.load("model.onnx"))
for inp in graph.inputs:
    inp.dtype = np.float32

onnx.save(gs.export_onnx(graph), "updated_model.onnx")

Wow, this worked nicely!! Thanks a bunch @CMangoDH

Now I am having trouble with replacing the NMS (NonMaxSuppression) layer... Is there a way to replace it with the NMS implementation available in TensorRT?

quancq commented 3 years ago

@mihajenko I met the same problem. This is my solution: when the forward method of RoBERTa is called with position_ids set to None, HuggingFace calls the create_position_ids_from_input_ids method in the RoBERTa modeling code to generate position_ids, and inside that function they use torch.cumsum. To work around TensorRT not supporting the CumSum operator, you need to generate position_ids yourself and pass them in explicitly.
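
A minimal sketch of that workaround, assuming a HuggingFace RoBERTa with its default padding index of 1; the export call is illustrative only:

import torch

def make_position_ids(input_ids: torch.Tensor, padding_idx: int = 1) -> torch.Tensor:
    # Same positions HuggingFace would compute internally with torch.cumsum:
    # padding tokens keep padding_idx, real tokens get padding_idx + 1, + 2, ...
    mask = input_ids.ne(padding_idx).int()
    return (torch.cumsum(mask, dim=1) * mask).long() + padding_idx

# Computing position_ids on the host and feeding them in as a model input keeps
# the CumSum operator out of the exported ONNX graph, for example:
# position_ids = make_position_ids(input_ids)
# torch.onnx.export(model, (input_ids, attention_mask, None, position_ids), "roberta.onnx", opset_version=11)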

kevinch-nv commented 3 years ago

Late to this thread, but it looks like there are a few issues:

  1. UINT8 support - we do not natively support this datatype in TensorRT. It looks like in the attached models above the input is cast to a different type right away, meaning that the potential WAR of just casting the input type to float may be the correct one in this case.
  2. NMS - we currently do not support the ONNX definition of this operator in TensorRT. We are working on getting an official implementation in.
  3. CumSum - this operator has been added and is available on the master branch of the onnx-tensorrt repo. Try building on the latest commit and importing your model again.

turowicz commented 3 years ago

@kevinch-nv thank you for taking part in this conversation. Is it safe to assume that one day we will be able to make the conversion and this workflow will work?

drewm1980 commented 3 years ago

+1 for natively supporting UINT8. It's really bizarre that the format used in almost all image source data is not supported.

simunovic-antonio commented 3 years ago

Regarding the UINT8 support: consider the performance aspect of casting the input from UINT8 to FLOAT32 on the host before transferring it to the device. It will increase the transferred volume by 4 times.

Many models will convert the UINT8 input to FLOAT32 for data normalization anyway, and it is more efficient to do that on the device.
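
As a rough back-of-the-envelope check (the frame size is only an example):

h, w, c = 720, 1280, 3              # one HD RGB frame
uint8_bytes = h * w * c             # 1 byte per value  -> ~2.8 MB
float32_bytes = h * w * c * 4       # 4 bytes per value -> ~11.1 MB
print(float32_bytes / uint8_bytes)  # exactly 4.0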

WangFengtu1996 commented 2 years ago

Hi all,

Any update?

richjjj commented 2 years ago

There is a simple solution that only needs onnx:

import onnx
from onnx import helper, TensorProto

om = onnx.load("model.onnx")
graph = om.graph
# Rebuild the input with a float32 element type; fill in your model's real
# input name and its n, c, h, w dimensions.
new_input = helper.make_tensor_value_info("input_name", TensorProto.FLOAT, [n, c, h, w])
del graph.input[0]
graph.input.append(new_input)
onnx.save(om, "out.onnx")

dmenig commented 2 years ago

I don't get why UINT8 input data type isn't already a feature of TensorRT. This is what made my business switch to NUC as well.

SthPhoenix commented 2 years ago

Interesting fact: TensorRT won't let you build an engine with uint8 input, BUT if you have created an engine with fp32 input, it accepts uint8 input without any warnings. Hope NVIDIA won't fix this bug until they implement official support for uint8 inputs.

dmenig commented 2 years ago

@SthPhoenix I can't make this work. I only get "inf" in my outputs when I feed uint8 to my TensorRT engine that has fp32 inputs.

Could you tell us more?

SthPhoenix commented 2 years ago

I have a model whose first layers are Sub and Mul, which perform the actual image normalization and conversion from uint8 to float32. It looks like in this case TRT silently casts the uint8 input to float values in the 0.0-255.0 range.

dmenig commented 2 years ago

Oh right, my test isn't on a model with internalized normalization, I'll test your thing right away.

dmenig commented 2 years ago

I implemented what you describe with Batch Normalization, but couldn't get the desired results. When you say your layer performs "conversion from uint8 to float32", does that mean I need to explicitly write a conversion layer (one that has a type-casting operation in the specific deep learning framework I use? Mine is PyTorch)?

dmenig commented 2 years ago

With a conversion layer, I couldn't make it work either. This is my Python TensorRT forward pass. Do you think there is a problem here?

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import os
import sys
import numpy as np

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem
    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
    def __repr__(self):
        return self.__str__()

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

def get_engine(engine_file_path=""):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    assert os.path.exists(engine_file_path), "Engine file doesn't exist"
    # If a serialized engine exists, use it instead of building an engine.
    print("Reading engine from file {}".format(engine_file_path))
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

TRT_LOGGER = trt.Logger()

engine_file_path="resnet.trt"
with get_engine(
    engine_file_path
) as engine, engine.create_execution_context() as context:
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    # Do inference
    # Set host input to the image. The do_inference function will copy the input to the GPU before executing.
    np.random.seed(17)
    inputs[0].host = np.array(
        np.random.choice(range(256), (1, 3, 35, 224, 224)),
        dtype=np.uint8,
    )
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    [
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
        for out in outputs
    ]
    stream.synchronize()
    print([out.host for out in outputs])

Do you have another forward pass mechanism that allows you to get the desired outputs ?

SthPhoenix commented 2 years ago

I implemented what you describe with Batch Normalization, but couldn't get the desired results. When you say your layer performs "conversion from uint8 to float32", does that mean I need to explicitly write a conversion layer (one that has a type-casting operation in the specific deep learning framework I use? Mine is PyTorch)?

No, I just added one line to the PyTorch forward method, right before any actual processing. In my case it was the following: x = (x - 127.5) * (1/128)
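
For anyone wanting to replicate this, a minimal PyTorch sketch of that idea, wrapping an arbitrary backbone so the normalization happens inside the exported graph (the wrapper and names are illustrative, not my exact code):

import torch.nn as nn

class NormalizedModel(nn.Module):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):
        # Normalization folded into the model, as described above
        x = (x - 127.5) * (1 / 128)
        return self.backbone(x)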

dmenig commented 2 years ago

I can't reproduce this with TensorRT 8.0.3 :/ I always get NaNs. I guess they have fixed this bug.