Open · laggui opened this issue 1 year ago
We have a more generic way of supporting TRT plugins that does not require contrib op schema updates; see https://github.com/microsoft/onnxruntime/pull/13847. +@chilo-ms to provide more details.
Sweet, that went under my radar! Will wait for further details.
Hi laggui,
As George mentioned, please use the new generic way of supporting TRT plugins.
All you need to do is change the domain of the EfficientNMS_TRT node in the ONNX model to trt.plugins (you can use Python to easily modify the ONNX node) instead of the default ONNX domain, which is the empty string. The old way of using TRT plugins goes through the contrib op, which checks for the domain kOnnxDomain (the empty string), and, as you already found out, its schema doesn't include the new attribute. That's why ORT failed to run the model.
Thanks for pointing out this issue; we can think of better ways to let people use the newer way of supporting TRT plugins. Let me know if you encounter any issues.
That seems pretty straightforward, thanks!
Just tried this very quickly and I am getting an error, though. When I simply change the domain to "trt.plugins" as below, I get:
Error No opset import for domain 'trt.plugins'.
# Make node attributes
score_threshold = helper.make_attribute("score_threshold", score_thresh)
iou_threshold = helper.make_attribute("iou_threshold", iou_thresh)
max_output_boxes = helper.make_attribute("max_output_boxes", detections_per_img)
background_class = helper.make_attribute("background_class", background_class)
score_activation = helper.make_attribute("score_activation", 0) # False
box_coding = helper.make_attribute("box_coding", 0)
plugin_version = helper.make_attribute("plugin_version", "1")
class_agnostic = helper.make_attribute("class_agnostic", class_agnostic)
# Create NMS node
nms_node = helper.make_node(
    "EfficientNMS_TRT",
    inputs=inputs,
    outputs=outputs,
    name="NMS_Op",
    domain="trt.plugins",  # new TRT plugins domain
)
nms_node.attribute.extend(
    [
        background_class,
        box_coding,
        iou_threshold,
        max_output_boxes,
        plugin_version,
        score_activation,
        score_threshold,
        class_agnostic,
    ]
)
I thought maybe it was because I forgot to export the ModelProto with the trt.plugins opset import, even though this wasn't mentioned anywhere:
opsets = [helper.make_opsetid("", self.opset_version), helper.make_opsetid("trt.plugins", 1)]
helper.make_model(
    graph,
    opset_imports=opsets,
)
But that didn't seem to help. Instead, I got the following error:
[ONNXRuntimeError] : 1 : FAIL : Fatal error: trt.plugins: EfficientNMS_TRT(-1) is not a registered function/op
Gonna double check my node definition to make sure I didn't miss anything and will report back later.
@chilo-ms I've managed to narrow it down to an MWE with a single-node graph. It works with the old way of using EfficientNMS_TRT via contrib ops, but not with the new trt.plugins domain.
I double checked that the EfficientNMS definition hasn't changed for TensorRT 8.6. Everything seems to be in accordance w.r.t the node definition (inputs, outputs and attributes).
Code
from typing import Sequence
import numpy as np
import onnxruntime as ort
from onnx import ModelProto, NodeProto, TensorProto, helper

def make_nms_node_trt(
    inputs: Sequence[str],
    outputs: Sequence[str],
    score_thresh: float = 0.5,
    iou_thresh: float = 0.45,
    detections_per_img: int = 200,
    background_class: int = -1,
    class_agnostic: bool = False,
) -> NodeProto:
    """
    Create EfficientNMS_TRT plugin node.

    Inputs:
        - boxes (float): [batch_size, num_boxes, 4]
        - scores (float): [batch_size, num_boxes, num_classes]
    Outputs:
        - num_detections (int32): [batch_size, 1]
        - detection_boxes (float): [batch_size, max_output_boxes, 4]
        - detection_scores (float): [batch_size, max_output_boxes]
        - detection_classes (int32): [batch_size, max_output_boxes]
    """
    # Ref: https://github.com/NVIDIA/TensorRT/tree/main/plugin/efficientNMSPlugin
    assert len(inputs) == 2, "EfficientNMS_TRT expects two inputs"
    assert len(outputs) == 4, "EfficientNMS_TRT expects four outputs"

    # Make node attributes
    score_threshold = helper.make_attribute("score_threshold", score_thresh)
    iou_threshold = helper.make_attribute("iou_threshold", iou_thresh)
    max_output_boxes = helper.make_attribute("max_output_boxes", detections_per_img)
    background_class = helper.make_attribute("background_class", background_class)
    score_activation = helper.make_attribute("score_activation", 0)  # False
    box_coding = helper.make_attribute("box_coding", 0)
    plugin_version = helper.make_attribute("plugin_version", "1")

    # Create NMS node: new "trt.plugins" domain for the class-agnostic case,
    # default domain "" (old contrib op) otherwise
    nms_node = helper.make_node(
        "EfficientNMS_TRT",
        inputs=inputs,
        outputs=outputs,
        name="NMS_Op",
        domain="trt.plugins" if class_agnostic else "",
    )
    nms_node.attribute.extend(
        [
            background_class,
            box_coding,
            iou_threshold,
            max_output_boxes,
            plugin_version,
            score_activation,
            score_threshold,
        ]
    )

    # Only add the class_agnostic attribute w/ the "trt.plugins" domain
    # (the old contrib op schema does not know this attribute)
    if class_agnostic:
        class_agnostic_attr = helper.make_attribute("class_agnostic", class_agnostic)
        nms_node.attribute.extend([class_agnostic_attr])

    return nms_node

def make_nms_model(
    batch_size: int = 1,
    num_boxes: int = 4,
    num_classes: int = 2,
    max_output_boxes: int = 200,
    class_agnostic: bool = False,
) -> ModelProto:
    """Create ONNX model with EfficientNMS_TRT plugin node."""
    # Inputs
    boxes = helper.make_tensor_value_info(
        "boxes", TensorProto.FLOAT, [batch_size, num_boxes, 4]
    )
    scores = helper.make_tensor_value_info(
        "scores", TensorProto.FLOAT, [batch_size, num_boxes, num_classes]
    )

    # Outputs
    num_detections = helper.make_tensor_value_info(
        "num_detections", TensorProto.INT32, [batch_size, 1]
    )
    detection_boxes = helper.make_tensor_value_info(
        "detection_boxes", TensorProto.FLOAT, [batch_size, max_output_boxes, 4]
    )
    detection_scores = helper.make_tensor_value_info(
        "detection_scores", TensorProto.FLOAT, [batch_size, max_output_boxes]
    )
    detection_classes = helper.make_tensor_value_info(
        "detection_classes", TensorProto.INT32, [batch_size, max_output_boxes]
    )

    # ONNX graph
    node = make_nms_node_trt(
        [boxes.name, scores.name],
        [
            num_detections.name,
            detection_boxes.name,
            detection_scores.name,
            detection_classes.name,
        ],
        score_thresh=0.1,
        iou_thresh=0.5,
        detections_per_img=max_output_boxes,
        class_agnostic=class_agnostic,
    )
    graph = helper.make_graph(
        [node],
        "NMS_Graph_TRT",
        [boxes, scores],
        [num_detections, detection_boxes, detection_scores, detection_classes],
    )

    # NOTE: To register "trt.plugins" opset domain or not?
    m = helper.make_model(graph)
    # m = helper.make_model(
    #     graph, opset_imports=[helper.make_opsetid("trt.plugins", 1)]
    # )
    return m

class ORTModel:
    def __init__(
        self,
        model: ModelProto,
        providers: Sequence[str] = (
            "TensorrtExecutionProvider",
            "CUDAExecutionProvider",
        ),
    ) -> None:
        # Initialize ORT session
        self.model = model
        self.outputs = [
            "num_detections",
            "detection_boxes",
            "detection_scores",
            "detection_classes",
        ]
        self.session = ort.InferenceSession(
            self.model.SerializeToString(), providers=providers
        )

    def __call__(self, boxes: np.ndarray, scores: np.ndarray) -> Sequence[np.ndarray]:
        ndet, nms_boxes, nms_scores, nms_classes = self.session.run(
            self.outputs,
            {"boxes": boxes, "scores": scores},
        )
        # NOTE: ndet = 2 if not class_agnostic else 1
        print(f"Num. boxes: {ndet.item()}")
        return ndet, nms_boxes, nms_scores, nms_classes

def test_efficient_nms(class_agnostic: bool = False) -> None:
    # Boxes [1, 4, 4]
    boxes = np.array(
        [[[1, 3, 3, 0.95], [1, 3, 4, 0.93], [0.9, 3.6, 3, 0.98], [0.9, 3.5, 3, 0.97]]],
        dtype=np.float32,
    )
    # Scores [1, 4, 2]
    scores = np.array(
        [
            [
                [0.80, 0.20],
                [0.70, 0.30],
                [0.40, 0.60],
                [0.75, 0.25],
            ]
        ],
        dtype=np.float32,
    )

    print("-" * 32)
    op_domain = "trt.plugins" if class_agnostic else "contrib_ops"
    print(f"EfficientNMS_TRT w/ {op_domain}")
    model = make_nms_model(class_agnostic=class_agnostic)
    model = ORTModel(model)
    model(boxes, scores)
    print("[OK] Inference test passed")

if __name__ == "__main__":
    # Mute anything below error level (3)
    ort.set_default_logger_severity(3)
    # NOTE: uncomment `class_agnostic=False` to run the model w/ old contrib ops
    # test_efficient_nms(class_agnostic=False)
    test_efficient_nms(class_agnostic=True)
Console output w/ error
Old contrib ops model runs correctly
--------------------------------
EfficientNMS_TRT w/ contrib_ops
Num. boxes: 2
[OK] Inference test passed
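To see why the contrib-op run reports "Num. boxes: 2" (and why the MWE's note expects 1 in the class-agnostic case), here is a pure-Python sketch of greedy NMS on the same four boxes and scores. It is a reference illustration, not the plugin's CUDA implementation. It assumes box_coding=0 corner boxes, the IoU threshold of 0.5 and score threshold of 0.1 passed in make_nms_model, and, as an assumption, normalizes each box with min/max because several test boxes have y2 < y1.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x1, y1, x2, y2): any pair of opposite corners


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-coded boxes."""
    # Sort each coordinate pair so flipped corners still give positive extents.
    ax1, ax2 = sorted((a[0], a[2]))
    ay1, ay2 = sorted((a[1], a[3]))
    bx1, bx2 = sorted((b[0], b[2]))
    by1, by2 = sorted((b[1], b[3]))
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms_count(
    boxes: Sequence[Box],
    scores: Sequence[Sequence[float]],
    iou_thresh: float = 0.5,
    score_thresh: float = 0.1,
    class_agnostic: bool = False,
) -> int:
    """Number of detections kept by greedy NMS (per class or class-agnostic)."""
    if class_agnostic:
        # One candidate per box (its best score); all boxes share one pool.
        candidates = [(max(s), 0, i) for i, s in enumerate(scores)]
    else:
        # One candidate per (box, class) pair; suppression only within a class.
        candidates = [(sc, c, i) for i, s in enumerate(scores) for c, sc in enumerate(s)]
    # Highest score first, after dropping low-score candidates.
    candidates = sorted((c for c in candidates if c[0] >= score_thresh), reverse=True)
    kept: List[Tuple[float, int, int]] = []
    for score, cls, idx in candidates:
        suppressed = any(
            k_cls == cls and iou(boxes[idx], boxes[k_idx]) > iou_thresh
            for _, k_cls, k_idx in kept
        )
        if not suppressed:
            kept.append((score, cls, idx))
    return len(kept)


# Same data as test_efficient_nms, without the batch dimension
BOXES = [(1, 3, 3, 0.95), (1, 3, 4, 0.93), (0.9, 3.6, 3, 0.98), (0.9, 3.5, 3, 0.97)]
SCORES = [(0.80, 0.20), (0.70, 0.30), (0.40, 0.60), (0.75, 0.25)]

print(nms_count(BOXES, SCORES))                       # 2 (one box per class)
print(nms_count(BOXES, SCORES, class_agnostic=True))  # 1
```

Per class, the first box wins class 0 and the third box wins class 1, while every other candidate overlaps its class winner with IoU above 0.5. When classes are ignored, the first box (score 0.80) suppresses all three others.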
New trt.plugins model crashes
--------------------------------
EfficientNMS_TRT w/ trt.plugins
Traceback (most recent call last):
  File "/workspace/ort_inference/python/nms_mwe.py", line 193, in <module>
    test_efficient_nms(class_agnostic=True)
  File "/workspace/ort_inference/python/nms_mwe.py", line 184, in test_efficient_nms
    model = ORTModel(model)
  File "/workspace/ort_inference/python/nms_mwe.py", line 147, in __init__
    self.session = ort.InferenceSession(
  File "/home/prime/miniconda3/envs/ort115/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/prime/miniconda3/envs/ort115/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 426, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("NMS_Op", EfficientNMS_TRT, "trt.plugins", -1) : ("boxes": tensor(float),"scores": tensor(float),) -> ("num_detections": tensor(int32),"detection_boxes": tensor(float),"detection_scores": tensor(float),"detection_classes": tensor(int32),) , Error No opset import for domain 'trt.plugins'
And if I add the domain to the ModelProto opset imports w/ opset_imports=[helper.make_opsetid("trt.plugins", 1)], I get this instead:
--------------------------------
EfficientNMS_TRT w/ trt.plugins
Traceback (most recent call last):
  File "/workspace/ort_inference/python/nms_mwe.py", line 193, in <module>
    test_efficient_nms(class_agnostic=True)
  File "/workspace/ort_inference/python/nms_mwe.py", line 184, in test_efficient_nms
    model = ORTModel(model)
  File "/workspace/ort_inference/python/nms_mwe.py", line 147, in __init__
    self.session = ort.InferenceSession(
  File "/home/prime/miniconda3/envs/ort115/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/prime/miniconda3/envs/ort115/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 426, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Fatal error: trt.plugins: EfficientNMS_TRT(-1) is not a registered function/op
Hi @laggui,
Thanks for sharing the test python code.
I found the cause of the issue when using the ORT Python bindings to run the TRT EP with the new way of supporting TRT plugins. The issue is in the ORT Python bindings, not in your Python code. Note that the ORT C++ API doesn't have this issue: I tested the single-node model from your Python code with onnxruntime_perf_test, and it runs successfully.
The Python binding issue happens because:
First, the registration of the custom op (in your case, the EfficientNMS_TRT node) happens during TRT EP initialization.
This custom op registration has to happen before ORT performs graph validation. If you look at how the ORT session is initialized from Python, you will see that it first does a kind of preliminary initialization (including graph validation), and the real inference session initialization (including TRT EP initialization) doesn't happen until here. That's why you were seeing the error message saying the graph is invalid:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("NMS_Op", EfficientNMS_TRT, "trt.plugins", -1)
The C++ inference session is initialized slightly differently from the Python one, so it doesn't have this issue.
I will work on a fix for this Python issue. Thanks for raising it.
Hi @chilo-ms, good catch! The final application will be in C++ anyway, only the current prototype is written w/ the python bindings.
I guess I'll move along to the C++ interface until this is fixed :)
Hi @chilo-ms, any chance this will be fixed in the near future?
I just recently implemented a custom TensorRT plugin that I'd like to test w/ my prototype app in Python, but I stumbled upon the same error: No opset import for domain 'trt.plugins'.
@laggui This issue is in our backlog. I'm taking a closer look now and might need to discuss it internally. I will give you an update once I have some progress.
@laggui I have a draft PR that adds Python API support for using TRT plugins. Could you help test it? Simply building from source and pip installing the wheel should be enough. Thank you!
@chilo-ms Absolutely! Kinda busy this week but will get back to you as soon as I test it.
@chilo-ms I can confirm that loading a model w/ a supported plugin such as EfficientNMS_TRT from the trt.plugins domain works with the changes on your branch. I haven't tried with my custom op yet, but I would assume it works just as well.
Thanks for confirming!
I just recently implemented a custom TensorRT plugin that I'd like to test w/ my prototype app in Python, but I stumbled upon the same error: No opset import for domain 'trt.plugins'.
@laggui what was your solution to fix this?
It's been a while, but in my case I managed to use the branch from the draft PR mentioned in a previous comment. It looks like the PR has since landed, so I'm not sure which released version you'll need for the changes.
The final app was in C++ anyway so after that I moved away from the python interface.
@laggui, thanks for the reply. The fix that worked for you is only for the C++ API, right?
Also, just to confirm, you tested this PR with your own custom TRT plugin, right?
I got the python bindings to work by building them locally with the draft PR I linked in my last comment. But after that I moved to the C++ API anyway so I didn't look back on this.
Not sure why the PR was merged but the issue is still open (and still seems to be present according to your comment).
@laggui, thanks for the reply. The fix that worked for you is only for the C++ API, right?
Also, just to confirm, you tested this PR with your own custom TRT plugin, right?
@gcunhase, can you provide some more details/repro if you are still facing an issue? @chilo-ms can assist.
@jywu-msft @chilo-ms
I'm getting the following error:
Traceback (most recent call last):
  File "/mnt/python/main_ort.py", line 78, in <module>
    main()
  File "/mnt/python/main_ort.py", line 49, in main
    ort_session = ort.InferenceSession(onnx_file_path, sess_options=session_opts, providers=EP)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for IdentityConv(1) node with name 'Conv-2'
To reproduce:
1. git clone https://github.com/gcunhase/TensorRT-Custom-Plugin-Example.git -b ort_inference
2. ./TensorRT-Custom-Plugin-Example/downloads
3. Follow ../TensorRT-Custom-Plugin-Example/README.md until "Build ONNX Model". After that, you should have the plugin in ./build/src/libidentity_conv.so and the ONNX file in ./data/identity_neural_network.onnx.
4. cd TensorRT-Custom-Plugin-Example/python
5. python main_ort.py
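main_ort.py itself isn't shown in the thread, so the following is not its code. As a sketch, assuming an ONNX Runtime release whose TensorRT EP supports the trt_extra_plugin_lib_paths provider option, a custom plugin library such as libidentity_conv.so can be handed to the TensorRT EP through the providers argument of InferenceSession. The helper name trt_providers_with_plugins is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

# A provider entry is either a name or a (name, options) pair, the two forms
# accepted by onnxruntime.InferenceSession(..., providers=...).
ProviderEntry = Union[str, Tuple[str, Dict[str, str]]]


def trt_providers_with_plugins(plugin_libs: Sequence[str]) -> List[ProviderEntry]:
    """Build a providers list asking the TensorRT EP to load extra plugin libraries."""
    trt_options = {
        # Assumption: the installed ORT's TensorRT EP supports this option;
        # multiple libraries are separated by ';'.
        "trt_extra_plugin_lib_paths": ";".join(plugin_libs),
    }
    # TensorRT first, CUDA as fallback for nodes TensorRT does not take.
    return [("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"]
```

For example, run from the python/ directory (paths are assumptions based on the steps above): ort.InferenceSession("../data/identity_neural_network.onnx", providers=trt_providers_with_plugins(["../build/src/libidentity_conv.so"])). Check the TensorRT EP documentation for the installed ORT version before relying on this option.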
@chilo-ms ping, thanks.
I can repro and took a quick look. libidentity_conv.so can be successfully loaded and the TRT EP does register IdentityConv as a custom op, but for some reason the TRT parser reports that it doesn't support this IdentityConv node in identity_neural_network.onnx. Will investigate more.
Hi @chilo-ms were you able to further investigate this issue by any chance? Thanks for looking into it!
Sorry for the late reply (I was OOF last week).
I investigated further and found that the TRT built-in parser and the TRT OSS parser behave slightly differently when parsing identity_neural_network.onnx.
The built-in parser didn't report the IdentityConv node as TRT-eligible, even though libidentity_conv.so was successfully loaded and IdentityConv was recognized by TRT through the TRT plugin, whereas the OSS parser did report the IdentityConv node as TRT-eligible.
The following two approaches can help unblock you:
1. Build ORT 1.17.0 with the TensorRT OSS parser (see doc here); remember to add one more build arg: --use_tensorrt_oss_parser
2. Use ORT 1.18.0 with TRT 10 GA.
I'm reporting this issue to Nvidia and will reply here once i get some updates.
Thank you for your reply!
remember to add one more build arg: --use_tensorrt_oss_parser
Should I replace --use_tensorrt with --use_tensorrt_oss_parser, or just add it?
Use ORT 1.18.0 with TRT 10 GA.
So this issue has been fixed in TRT 10 GA?
remember to add one more build arg: --use_tensorrt_oss_parser
Should I replace --use_tensorrt with --use_tensorrt_oss_parser, or just add it?
Just add it.
Use ORT 1.18.0 with TRT 10 GA.
So this issue has been fixed in TRT 10 GA?
Checking with Nvidia, will get back to you
At least, I tried the following combinations and they work on my side: ORT 1.17.0 + TRT 8.6 + TRT OSS parser, and ORT 1.18.0 + TRT 10 (note: the ORT release package uses the TRT built-in parser).
TRT built-in parser and TRT oss parser are slightly different when parsing the identity_neural_network.onnx.
Does this have anything to do with the ONNX custom layer domain being trt.plugins? Should this domain be set to something else?
TRT built-in parser and TRT oss parser are slightly different when parsing the identity_neural_network.onnx.
Does this have anything to do with the ONNX custom layer domain being trt.plugins? Should this domain be set to something else?
No, the usage is still the same, which means you don't need to modify main_ort.py.
@gcunhase Update from Nvidia: there is a restriction in the TRT 8.6 built-in parser where non-registered plugins are rejected, even though ORT automatically registers all of them. Since there are not going to be any more TRT 8.6 source releases, the recommendation is to use the OSS parser or upgrade to 10.0, where this restriction is lifted.
Thank you @chilo-ms! I'll experiment with ORT 1.18 and TRT 10 as suggested.
Describe the issue
With the latest 1.15 release the support for TensorRT 8.6 was added (thanks btw!), but it seems that the EfficientNMS plugin changes did not come with it.
In TensorRT 8.6 the support for class-agnostic NMS was added via a new attribute (see this commit) but it seems that it is missing from the ONNX Runtime operator definition here.
When trying to run a model exported with the EfficientNMS_TRT node that contains this attribute, we get the following error:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("NMS_Op_sg344", EfficientNMS_TRT, "", -1) : ("/_wrapper/postprocess/Slice_19_output_0": tensor(float),"/_wrapper/postprocess/Mul_42_output_0": tensor(float),) -> ("num_boxes": tensor(int32),"/_wrapper/postprocess/CustomNMS_output_1": tensor(float),"/_wrapper/postprocess/CustomNMS_output_2":tensor(float),"/_wrapper/postprocess/CustomNMS_output_3": tensor(int32),) , Error Unrecognized attribute: class_agnostic for operator EfficientNMS_TRT
The same model executes correctly if I use TensorRT 8.6 without ONNX Runtime, or if I don't specify the class_agnostic attribute when creating the node for the exported model.
To reproduce
I can't distribute the model at this time but if required I could take some time to produce a MWE.
Urgency
Project deadline is around mid-June, and it would greatly benefit from this change.
Platform
Linux
OS Version
Ubuntu 20.04.2 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
TensorRT 8.6