microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

EfficientNMS_TRT missing attribute class_agnostic w/ TensorRT 8.6 #16121

Open laggui opened 1 year ago

laggui commented 1 year ago

Describe the issue

With the latest 1.15 release the support for TensorRT 8.6 was added (thanks btw!), but it seems that the EfficientNMS plugin changes did not come with it.

In TensorRT 8.6 the support for class-agnostic NMS was added via a new attribute (see this commit) but it seems that it is missing from the ONNX Runtime operator definition here.

When trying to run a model exported with the EfficientNMS_TRT node that contains this attribute, we get the following error: onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("NMS_Op_sg344", EfficientNMS_TRT, "", -1) : ("/_wrapper/postprocess/Slice_19_output_0": tensor(float),"/_wrapper/postprocess/Mul_42_output_0": tensor(float),) -> ("num_boxes": tensor(int32),"/_wrapper/postprocess/CustomNMS_output_1": tensor(float),"/_wrapper/postprocess/CustomNMS_output_2":tensor(float),"/_wrapper/postprocess/CustomNMS_output_3": tensor(int32),) , Error Unrecognized attribute: class_agnostic for operator EfficientNMS_TRT

The same model executes correctly if I use TensorRT 8.6 without ONNX Runtime or if I don't specify the class_agnostic attribute when creating the node for the exported model.

To reproduce

I can't distribute the model at this time but if required I could take some time to produce a MWE.

Urgency

Project deadline around mid-June; it would greatly benefit from this change.

Platform

Linux

OS Version

Ubuntu 20.04.2 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15

ONNX Runtime API

Python

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

TensorRT 8.6

jywu-msft commented 1 year ago

We have a more generic way of supporting TRT plugins which does not require contrib op schema updates; see https://github.com/microsoft/onnxruntime/pull/13847. +@chilo-ms to provide more details.

laggui commented 1 year ago

Sweet, that went under my radar! Will wait for further details.

chilo-ms commented 1 year ago

Hi laggui,

As George mentioned, please use the new generic way of supporting TRT plugins. All you need to do is change the EfficientNMS_TRT node in the ONNX model to use the trt.plugins domain (you can use Python to easily modify the ONNX node) instead of the default ONNX domain, which is the null string. The old way of using TRT plugins is through the contrib op, which checks the domain kOnnxDomain (the null string) and, as you already found out, does not include the new attribute; that's why ORT failed to run the model.
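
For illustration, a minimal sketch of that modification in Python (file names are placeholders; registering the opset import for the custom domain, which comes up later in this thread, is included for completeness):

import onnx
from onnx import helper

model = onnx.load("model.onnx")  # placeholder path to the exported model
for node in model.graph.node:
    if node.op_type == "EfficientNMS_TRT":
        node.domain = "trt.plugins"  # switch from the default (null string) domain
# Also register the custom domain in the model's opset imports
model.opset_import.append(helper.make_opsetid("trt.plugins", 1))
onnx.save(model, "model_trt_plugins.onnx")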

Thanks for pointing out this issue; we can think of better ways to let people use the newer way of supporting TRT plugins. Let me know if you encounter any issues.

laggui commented 1 year ago

That seems pretty straightforward, thanks!

Just tried this very quickly and I am getting an error though. When I simply change the domain to "trt.plugins" as below, I get the error No opset import for domain 'trt.plugins'.

Full error message: onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("NMS_Op_sg344", EfficientNMS_TRT, "trt.plugins", -1) : ("/_wrapper/postprocess/Slice_19_output_0": tensor(float),"/_wrapper/postprocess/Mul_42_output_0": tensor(float),) -> ("num_boxes": tensor(int32),"/_wrapper/postprocess/CustomNMS_output_1": tensor(float),"/_wrapper/postprocess/CustomNMS_output_2": tensor(float),"/_wrapper/postprocess/CustomNMS_output_3": tensor(int32),) , Error No opset import for domain 'trt.plugins'

# Make node attributes
score_threshold = helper.make_attribute("score_threshold", score_thresh)
iou_threshold = helper.make_attribute("iou_threshold", iou_thresh)
max_output_boxes = helper.make_attribute("max_output_boxes", detections_per_img)
background_class = helper.make_attribute("background_class", background_class)
score_activation = helper.make_attribute("score_activation", 0)  # False
box_coding = helper.make_attribute("box_coding", 0)
plugin_version = helper.make_attribute("plugin_version", "1")
class_agnostic = helper.make_attribute("class_agnostic", class_agnostic)

# Create NMS node
nms_node = helper.make_node(
    "EfficientNMS_TRT",
    inputs=inputs,
    outputs=outputs,
    name="NMS_Op",
    domain="trt.plugins", # new TRT plugins
)
nms_node.attribute.extend(
    [
        background_class,
        box_coding,
        iou_threshold,
        max_output_boxes,
        plugin_version,
        score_activation,
        score_threshold,
        class_agnostic,
    ]
)

I thought maybe it's because I forgot to export the ModelProto with the trt.plugins opset, even though it wasn't mentioned anywhere:

opsets = [helper.make_opsetid("", self.opset_version), helper.make_opsetid("trt.plugins", 1)]
helper.make_model(
    graph,
    opset_imports=opsets,
)

but that didn't seem to help. Instead, I got the following error: [ONNXRuntimeError] : 1 : FAIL : Fatal error: trt.plugins: EfficientNMS_TRT(-1) is not a registered function/op

Gonna double check my node definition to make sure I didn't miss anything and will report back later.

laggui commented 1 year ago

@chilo-ms I've managed to narrow it down to an MWE for a graph with a single node. It works the old way, using EfficientNMS_TRT via contrib ops, but not with the new trt.plugins domain.

I double-checked that the EfficientNMS definition hasn't changed for TensorRT 8.6. Everything seems consistent with the node definition (inputs, outputs and attributes).

Code

from typing import Sequence

import numpy as np
import onnxruntime as ort
from onnx import ModelProto, NodeProto, TensorProto, helper

def make_nms_node_trt(
    inputs: Sequence[str],
    outputs: Sequence[str],
    score_thresh: float = 0.5,
    iou_thresh: float = 0.45,
    detections_per_img: int = 200,
    background_class: int = -1,
    class_agnostic: bool = False,
) -> NodeProto:
    """
    Create EfficientNMS_TRT plugin node.

    Inputs:
      - boxes (float): [batch_size, num_boxes, 4]
      - scores (float): [batch_size, num_boxes, num_classes]

    Outputs:
      - num_detections (int32): [batch_size, 1]
      - detection_boxes (float): [batch_size, max_output_boxes, 4]
      - detection_scores (float): [batch_size, max_output_boxes]
      - detection_classes (int32): [batch_size, max_output_boxes]

    """
    # Ref: https://github.com/NVIDIA/TensorRT/tree/main/plugin/efficientNMSPlugin
    assert len(inputs) == 2, "EfficientNMS_TRT expects two inputs"
    assert len(outputs) == 4, "EfficientNMS_TRT expects four outputs"

    # Make node attributes
    score_threshold = helper.make_attribute("score_threshold", score_thresh)
    iou_threshold = helper.make_attribute("iou_threshold", iou_thresh)
    max_output_boxes = helper.make_attribute("max_output_boxes", detections_per_img)
    background_class = helper.make_attribute("background_class", background_class)
    score_activation = helper.make_attribute("score_activation", 0)  # False
    box_coding = helper.make_attribute("box_coding", 0)
    plugin_version = helper.make_attribute("plugin_version", "1")

    # Create NMS node
    nms_node = helper.make_node(
        "EfficientNMS_TRT",
        inputs=inputs,
        outputs=outputs,
        name="NMS_Op",
        domain="trt.plugins" if class_agnostic else "",  # new trt.plugins domain
    )
    nms_node.attribute.extend(
        [
            background_class,
            box_coding,
            iou_threshold,
            max_output_boxes,
            plugin_version,
            score_activation,
            score_threshold,
        ]
    )

    # Only add class_agnostic attribute w/ "trt.plugins" domain
    if class_agnostic:
        class_agnostic = helper.make_attribute("class_agnostic", class_agnostic)
        nms_node.attribute.extend([class_agnostic])

    return nms_node

def make_nms_model(
    batch_size: int = 1,
    num_boxes: int = 4,
    num_classes: int = 2,
    max_output_boxes: int = 200,
    class_agnostic: bool = False,
) -> ModelProto:
    """Create ONNX model with EfficientNMS_TRT plugin node."""
    # Inputs
    boxes = helper.make_tensor_value_info("boxes", TensorProto.FLOAT, [batch_size, num_boxes, 4])
    scores = helper.make_tensor_value_info(
        "scores", TensorProto.FLOAT, [batch_size, num_boxes, num_classes]
    )

    # Outputs
    num_detections = helper.make_tensor_value_info(
        "num_detections", TensorProto.INT32, [batch_size, 1]
    )
    detection_boxes = helper.make_tensor_value_info(
        "detection_boxes", TensorProto.FLOAT, [batch_size, max_output_boxes, 4]
    )
    detection_scores = helper.make_tensor_value_info(
        "detection_scores", TensorProto.FLOAT, [batch_size, max_output_boxes]
    )
    detection_classes = helper.make_tensor_value_info(
        "detection_classes", TensorProto.INT32, [batch_size, max_output_boxes]
    )

    # ONNX graph
    node = make_nms_node_trt(
        [boxes.name, scores.name],
        [
            num_detections.name,
            detection_boxes.name,
            detection_scores.name,
            detection_classes.name,
        ],
        score_thresh=0.1,
        iou_thresh=0.5,
        detections_per_img=max_output_boxes,
        class_agnostic=class_agnostic,
    )

    graph = helper.make_graph(
        [node],
        "NMS_Graph_TRT",
        [boxes, scores],
        [num_detections, detection_boxes, detection_scores, detection_classes],
    )

    # NOTE: To register "trt.plugins" opset domain or not?
    m = helper.make_model(graph)
    # m = helper.make_model(
    #     graph, opset_imports=[helper.make_opsetid("trt.plugins", 1)]
    # )
    return m

class ORTModel:
    def __init__(
        self,
        model: ModelProto,
        providers: Sequence[str] = (
            "TensorrtExecutionProvider",
            "CUDAExecutionProvider",
        ),
    ) -> None:
        # Initialize ORT session
        self.model = model
        self.outputs = [
            "num_detections",
            "detection_boxes",
            "detection_scores",
            "detection_classes",
        ]
        self.session = ort.InferenceSession(
            self.model.SerializeToString(), providers=providers
        )

    def __call__(self, boxes: np.ndarray, scores: np.ndarray) -> Sequence[np.ndarray]:
        ndet, nms_boxes, nms_scores, nms_classes = self.session.run(
            self.outputs,
            {"boxes": boxes, "scores": scores},
        )

        # NOTE: ndet = 2 if not class_agnostic else 1
        print(f"Num. boxes: {ndet.item()}")
        return ndet, nms_boxes, nms_scores, nms_classes

def test_efficient_nms(class_agnostic: bool = False) -> None:
    # Boxes [1, 4, 4]
    boxes = np.array(
        [[[1, 3, 3, 0.95], [1, 3, 4, 0.93], [0.9, 3.6, 3, 0.98], [0.9, 3.5, 3, 0.97]]],
        dtype=np.float32,
    )
    # Scores [1, 4, 2]
    scores = np.array(
        [
            [
                [0.80, 0.20],
                [0.70, 0.30],
                [0.40, 0.60],
                [0.75, 0.25],
            ]
        ],
        dtype=np.float32,
    )

    print("-" * 32)
    op_domain = "trt.plugins" if class_agnostic else "contrib_ops"
    print(f"EfficientNMS_TRT w/ {op_domain}")
    model = make_nms_model(class_agnostic=class_agnostic)
    model = ORTModel(model)
    model(boxes, scores)
    print("[OK] Inference test passed")

if __name__ == "__main__":
    # Mute anything below error level (3)
    ort.set_default_logger_severity(3)
    # NOTE: uncomment `class_agnostic=False` to run the model w/ old contrib ops
    # test_efficient_nms(class_agnostic=False)
    test_efficient_nms(class_agnostic=True)

Console output w/ error

Old contrib ops model runs correctly

--------------------------------
EfficientNMS_TRT w/ contrib_ops
Num. boxes: 2
[OK] Inference test passed

New trt.plugins model crashes

--------------------------------
EfficientNMS_TRT w/ trt.plugins
Traceback (most recent call last):
  File "/workspace/ort_inference/python/nms_mwe.py", line 193, in <module>
    test_efficient_nms(class_agnostic=True)
  File "/workspace/ort_inference/python/nms_mwe.py", line 184, in test_efficient_nms
    model = ORTModel(model)
  File "/workspace/ort_inference/python/nms_mwe.py", line 147, in __init__
    self.session = ort.InferenceSession(
  File "/home/prime/miniconda3/envs/ort115/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/prime/miniconda3/envs/ort115/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 426, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("NMS_Op", EfficientNMS_TRT, "trt.plugins", -1) : ("boxes": tensor(float),"scores": tensor(float),) -> ("num_detections": tensor(int32),"detection_boxes": tensor(float),"detection_scores": tensor(float),"detection_classes": tensor(int32),) , Error No opset import for domain 'trt.plugins'

And if I add the domain to the ModelProto opset imports w/ opset_imports=[helper.make_opsetid("trt.plugins", 1)], I get this instead.

--------------------------------
EfficientNMS_TRT w/ trt.plugins
Traceback (most recent call last):
  File "/workspace/ort_inference/python/nms_mwe.py", line 193, in <module>
    test_efficient_nms(class_agnostic=True)
  File "/workspace/ort_inference/python/nms_mwe.py", line 184, in test_efficient_nms
    model = ORTModel(model)
  File "/workspace/ort_inference/python/nms_mwe.py", line 147, in __init__
    self.session = ort.InferenceSession(
  File "/home/prime/miniconda3/envs/ort115/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/prime/miniconda3/envs/ort115/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 426, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Fatal error: trt.plugins: EfficientNMS_TRT(-1) is not a registered function/op

chilo-ms commented 1 year ago

Hi @laggui,

Thanks for sharing the test Python code. I found the issue when using the ORT Python binding to run the TRT EP with the new way of supporting TRT plugins. The issue is in the ORT Python binding, not your Python code. Please note that when using the ORT C++ binding, there is no such issue. I tested the single-node model from your Python code with onnxruntime_perf_test, and it runs successfully.

The ORT Python binding issue arises because the registration of the custom op (in your case the EfficientNMS_TRT node) happens during TRT EP initialization, and this custom op registration needs to happen before ORT performs graph validation. In the ORT session initialization from Python, you will see it does a kind of preliminary initialization (including graph validation), and the real inference session initialization (including TRT EP initialization) won't happen until here. That's why you were seeing the error message indicating the graph is invalid: onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("NMS_Op", EfficientNMS_TRT, "trt.plugins", -1)

The C++ inference session is slightly different from the Python one, so it won't have this issue.

I will work on fixing this Python issue; thanks for raising it.

laggui commented 1 year ago

Hi @chilo-ms, good catch! The final application will be in C++ anyway; only the current prototype is written w/ the Python bindings.

I guess I'll move along to the C++ interface until this is fixed :)

laggui commented 9 months ago

Hi @chilo-ms, any chance this will be fixed in the near future?

Just recently implemented a custom TensorRT plugin I'd like to test w/ my prototype app in Python, but stumbled upon the same error: No opset import for domain 'trt.plugins'.

chilo-ms commented 9 months ago

@laggui This issue is in our backlog. I'm taking a closer look now and might need to discuss internally. Will give you an update once I have some progress.

chilo-ms commented 9 months ago

@laggui I have a draft PR to support the Python API for using TRT plugins. Could you help test it? Simply building from source and pip installing the wheel should be good. Thank you!

laggui commented 9 months ago

@chilo-ms Absolutely! Kinda busy this week but will get back to you as soon as I test it.

laggui commented 9 months ago

@chilo-ms I confirm that loading a model w/ a supported plugin such as EfficientNMS_TRT from the trt.plugins domain works with the changes on your branch. I haven't tried with my custom op yet but would assume it works just as well.

chilo-ms commented 9 months ago

Thanks for confirming!

gcunhase commented 1 month ago

Just recently implemented a custom TensorRT plugin I'd like to test w/ my prototype app in python but stumbled upon the same error No opset import for domain 'trt.plugins'..

@laggui what was your solution to fix this?

laggui commented 1 month ago

It's been a while, but in my case I managed to use the branch from the draft PR mentioned in a previous comment. Looks like the PR has landed since, so I'm not sure which released version you'll need for the changes.

The final app was in C++ anyway, so after that I moved away from the Python interface.

gcunhase commented 1 month ago

@laggui, thanks for the reply. The fix that worked for you is only for the C++ API, right?

Also, just to confirm, you tested this PR with your own custom TRT plugin, right?

laggui commented 1 month ago

I got the Python bindings to work by building them locally with the draft PR I linked in my last comment. But after that I moved to the C++ API anyway, so I didn't look back on this.

Not sure why the PR was merged but the issue is still open (and the problem still seems to be present, according to your comment).

jywu-msft commented 1 month ago

@laggui, thanks for the reply. The fix that worked for you is only for the C++ API, right?

Also, just to confirm, you tested this PR with your own custom TRT plugin, right?

@gcunhase, can you provide some more details/repro if you are still facing an issue? @chilo-ms can assist.

gcunhase commented 1 month ago

@jywu-msft @chilo-ms

I'm getting the following error:

Traceback (most recent call last):
  File "/mnt/python/main_ort.py", line 78, in <module>
    main()
  File "/mnt/python/main_ort.py", line 49, in main
    ort_session = ort.InferenceSession(onnx_file_path, sess_options=session_opts, providers=EP)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for IdentityConv(1) node with name 'Conv-2'

Steps to reproduce

  1. Download the TensorRT 8.6-GA tar from the TensorRT website.
  2. Clone repo:
    git clone https://github.com/gcunhase/TensorRT-Custom-Plugin-Example.git -b ort_inference
  3. Copy the TensorRT tar downloaded in the first step into ./TensorRT-Custom-Plugin-Example/downloads.
  4. Follow steps in ./TensorRT-Custom-Plugin-Example/README.md until "Build ONNX Model". After that, you should have the plugin in ./build/src/libidentity_conv.so and the ONNX file in ./data/identity_neural_network.onnx.
  5. Run ORT inference:
    cd TensorRT-Custom-Plugin-Example/python
    python main_ort.py

System info

Following requirements here and here.

gcunhase commented 1 month ago

@chilo-ms ping, thanks.

chilo-ms commented 1 month ago

I can repro and took a quick look. The libidentity_conv.so is loaded successfully and the TRT EP did register IdentityConv as a custom op, but for some reason the TRT parser reports that it does not support this IdentityConv node given the identity_neural_network.onnx. Will investigate more.

gcunhase commented 1 month ago

Hi @chilo-ms, were you able to investigate this issue further by any chance? Thanks for looking into it!

chilo-ms commented 1 month ago

Sorry for the late reply. (I was OOF last week.)

I investigated further and found that the behavior of the TRT built-in parser and the TRT OSS parser is slightly different when parsing identity_neural_network.onnx. The built-in parser didn't report the IdentityConv node as TRT-eligible, even though libidentity_conv.so was successfully loaded and IdentityConv was recognized by TRT through the plugin, whereas the OSS parser did report the IdentityConv node as TRT-eligible.

The following two approaches can help unblock you:

  1. Build ORT from source with the TRT OSS parser (remember to add one more build arg: --use_tensorrt_oss_parser).
  2. Use ORT 1.18.0 with TRT 10 GA.

I'm reporting this issue to Nvidia and will reply here once I get some updates.

gcunhase commented 1 month ago

Thank you for your reply!

remember to add one more build arg: --use_tensorrt_oss_parser

Should I replace --use_tensorrt with --use_tensorrt_oss_parser or just add it?

Use ORT 1.18.0 with TRT 10 GA.

So this issue has been fixed in TRT 10 GA?

chilo-ms commented 1 month ago

remember to add one more build arg: --use_tensorrt_oss_parser

Should I replace --use_tensorrt with --use_tensorrt_oss_parser or just add it?

just add it

Use ORT 1.18.0 with TRT 10 GA.

So this issue has been fixed in TRT 10 GA?

Checking with Nvidia, will get back to you

At least I tried the following combinations and they work on my side:

  ORT 1.17.0 + TRT 8.6 + TRT OSS parser
  ORT 1.18.0 + TRT 10 (Note: the ORT release package uses the TRT built-in parser)

gcunhase commented 1 month ago

TRT built-in parser and TRT oss parser are slightly different when parsing the identity_neural_network.onnx.

Does this have anything to do with the ONNX custom layer domain being trt.plugins? Should this domain be set to something else?

chilo-ms commented 1 month ago

TRT built-in parser and TRT oss parser are slightly different when parsing the identity_neural_network.onnx.

Does this have anything to do with the ONNX custom layer domain being trt.plugins? Should this domain be set as something else?

No, the usage is still the same, which means you don't need to modify main_ort.py.
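
For reference, a session setup with a TRT plugin library typically looks something like the sketch below. This is not the actual main_ort.py; the plugin and model paths are taken from the repro steps above, and the trt_extra_plugin_lib_paths TensorRT EP provider option is assumed to be available in your ORT build:

import onnxruntime as ort

# Point the TensorRT EP at the custom plugin library so the trt.plugins op can be resolved
EP = [
    (
        "TensorrtExecutionProvider",
        {"trt_extra_plugin_lib_paths": "./build/src/libidentity_conv.so"},
    ),
    "CUDAExecutionProvider",
]
session = ort.InferenceSession("./data/identity_neural_network.onnx", providers=EP)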

chilo-ms commented 1 month ago

@gcunhase Update from Nvidia: there is a restriction in the TRT 8.6 built-in parser where non-registered plugins are rejected, even though ORT automatically registers all of them. Since there aren't going to be any more TRT 8.6 source releases, the recommendation is to use the OSS parser or upgrade to 10.0, where this restriction has been lifted.

gcunhase commented 1 month ago

Thank you @chilo-ms! I'll experiment with ORT 1.18 and TRT 10 as suggested.