I need to run Triton Server with an ONNX model so that the TensorRT engine is generated on the fly. I'm aware that I could use the trtexec utility to build the TensorRT engine ahead of time, but I have multiple types of GPUs and would need to run trtexec separately on each host. Generating the TensorRT engine on the fly through ONNX Runtime is exactly what I need.
I have an ONNX model with grid, the EfficientNMS plugin, and dynamic batch size.
I0504 16:36:16.981021 1 server.cc:610]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0504 16:36:16.981056 1 server.cc:653]
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| yolov7          | 1       | UNAVAILABLE: Internal: onnx runtime error 1: Load model from /models/yolov7/1/model.onnx failed:Fatal error: TRT:EfficientNMS_TRT(-1) is not a registered function/op |
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I'm encountering a fatal error when running my YOLOv7 model with TensorRT optimization. Specifically, the error message states that "TRT:EfficientNMS_TRT(-1) is not a registered function/op".
Steps to reproduce
Run the model with the following configuration:
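For context, EfficientNMS_TRT is a TensorRT plugin op rather than a standard ONNX op, so ONNX Runtime can only resolve it when its TensorRT execution provider is the one building the engine. In Triton, that provider is enabled per model in config.pbtxt. A minimal sketch of what I have in mind (the model name comes from the log above, max_batch_size matches the maxShapes I pass to trtexec, and the precision/workspace values are my assumptions):

```
name: "yolov7"
platform: "onnxruntime_onnx"
max_batch_size: 8
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "max_workspace_size_bytes" value: "4294967296" }
      }
    ]
  }
}
```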
Building the engine directly with trtexec works fine:
./tensorrt/bin/trtexec --onnx=yolov7.onnx --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --workspace=4096 --saveEngine=yolov7-fp16-1x8x8.engine --timingCacheFile=timing.cache
Docker Run: