I need to run Triton Server with an ONNX model so that the TensorRT engine is generated on the fly. I'm aware that I could use the trtexec utility to build the TensorRT engine ahead of time, but I have multiple types of GPUs and would need to run trtexec separately on each host. Generating the TensorRT engine on the fly through ONNX Runtime is exactly what I need.
I have an ONNX model with grid, the EfficientNMS plugin, and dynamic batch size.
I0504 16:36:16.981021 1 server.cc:610]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0504 16:36:16.981056 1 server.cc:653]
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| yolov7          | 1       | UNAVAILABLE: Internal: onnx runtime error 1: Load model from /models/yolov7/1/model.onnx failed:Fatal error: TRT:EfficientNMS_TRT(-1) is not a registered function/op |
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I'm encountering a fatal error when running my YOLOv7 model with TensorRT optimization. Specifically, the error message states that "TRT:EfficientNMS_TRT(-1) is not a registered function/op".
Steps to reproduce
Run the model with the following configuration:
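For context, EfficientNMS_TRT is a TensorRT plugin op rather than a standard ONNX op, so ONNX Runtime can only resolve it when its TensorRT execution provider is the one building the engine. In Triton, that provider is enabled per model in config.pbtxt. A minimal sketch of what I have in mind (the model name comes from the log above, max_batch_size matches the maxShapes I pass to trtexec, and the precision/workspace values are my assumptions):

```
name: "yolov7"
platform: "onnxruntime_onnx"
max_batch_size: 8
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "max_workspace_size_bytes" value: "4294967296" }
      }
    ]
  }
}
```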
Building the engine directly with trtexec works fine:
./tensorrt/bin/trtexec --onnx=yolov7.onnx --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --workspace=4096 --saveEngine=yolov7-fp16-1x8x8.engine --timingCacheFile=timing.cache
Docker Run: