Closed by MiroPsota.
Hi @MiroPsota, thanks for bringing up this issue! Does the model with the NMS op run on previous versions of ONNX Runtime + TRT? Also, could you share the standard model (without NMS) that you tested on the CPU/GPU/OpenVINO EPs?
Zip with all the models and the updated run.py.
1.18.0 and 1.18.1: the OOM problem occurs.
For the 1.17.1 tests (ORT GPU PyPI package) I used TensorRT 8.6.1.6 from here. The OOM does not occur, but a different error does (op not implemented), and ORT inserts unwanted copy operations to the host and back. See the log. I will investigate further.
The ONNX model for TensorRT runs without problems with mmdeploy, which uses TensorRT directly. One difference is that `TRTBatchedNMS` originally had a different domain, which I changed to `trt.plugins` according to the docs so that it can be run in ORT.
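For anyone reproducing the domain change, here is a minimal sketch of how it could be done on an `onnx.ModelProto`, assuming the `onnx` Python package. `move_op_to_domain` and `fix_file` are hypothetical helper names (not part of mmdeploy or ORT), and registering `trt.plugins` with opset version 1 is an assumption. Only top-level graph nodes are visited; nodes inside `If`/`Loop` subgraphs are left alone.

```python
def move_op_to_domain(model, op_type="TRTBatchedNMS", new_domain="trt.plugins",
                      opset_version=1):
    """Move every top-level node of `op_type` into `new_domain`.

    `model` is expected to be an onnx.ModelProto. Subgraphs are not visited.
    Returns the number of nodes whose domain was changed.
    """
    changed = 0
    for node in model.graph.node:
        if node.op_type == op_type and node.domain != new_domain:
            node.domain = new_domain
            changed += 1
    # Each domain used by a node needs a matching opset import, otherwise
    # ONNX/ORT reject the model. The version (1) is an assumption.
    uses_domain = any(n.domain == new_domain for n in model.graph.node)
    if uses_domain and not any(o.domain == new_domain for o in model.opset_import):
        entry = model.opset_import.add()
        entry.domain = new_domain
        entry.version = opset_version
    return changed


def fix_file(src, dst, **kwargs):
    """Load an ONNX file, move the op's domain, and save the result."""
    import onnx  # imported lazily so move_op_to_domain has no hard dependency

    model = onnx.load(str(src))
    n = move_op_to_domain(model, **kwargs)
    onnx.save(model, str(dst))
    return n
```

The helper is idempotent: running it a second time changes nothing and does not add a duplicate opset import.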
Describe the issue
Loading the model runs out of host memory (RAM); 50 GiB is not enough.
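To compare memory use between ORT versions (for example 1.17.1, where loading completes, against 1.18.x), one option is to record the process's peak resident set size around session creation. This is a sketch assuming Linux, where `ru_maxrss` is reported in KiB; `measure_peak` and `peak_rss_gib` are hypothetical helpers. It only reports something if the process survives; if the kernel OOM killer terminates it, memory has to be watched from outside the process.

```python
import resource
import sys


def peak_rss_gib():
    """Peak resident set size of this process so far, in GiB.

    Assumes Linux, where ru_maxrss is in KiB (macOS reports bytes).
    """
    kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return kib / (1024 ** 2)


def measure_peak(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and report the RSS high-water mark after it."""
    before = peak_rss_gib()
    result = fn(*args, **kwargs)
    after = peak_rss_gib()
    print(f"peak RSS: {before:.2f} GiB before, {after:.2f} GiB after", file=sys.stderr)
    return result, after
```

For example, `measure_peak(onnxruntime.InferenceSession, path, providers=...)` would wrap the model load. Note that the value is a process-wide high-water mark, so it also includes memory used before the call.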
To reproduce
The model is RTMDet from here. The exported ONNX model contains a TensorRT-specific NonMaximumSuppression node from here (probably slightly changed).
The mmdetection model without the TensorRT-specific NMS op runs without problems on the CPU EP, CUDA EP and OpenVINO EP (OpenVINO tested with the 1.17.0 PyPI package).
- CUDA from https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
- cuDNN from https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.2/local_installers/11.x/cudnn-linux-x86_64-8.9.2.26_cuda11-archive.tar.xz/
- TensorRT from https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.0.1/tars/TensorRT-10.0.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
The code and model are included in the zip file, together with the compiled custom op (Ubuntu 24.04, GCC 11.4). Tested with Python 3.10. Run `run.py` with `LD_LIBRARY_PATH` set to the correct paths (the `ld_library.py` script can help with that). If needed, I could try to make a Docker image that reproduces the bug.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 24.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8.0, CUDNN 8.9.2.26, TensorRT 10.0.1.6
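For reference, a sketch of a provider list for this setup, assuming the TensorRT plugin library is loaded through the TensorRT EP option `trt_extra_plugin_lib_paths` (a semicolon-separated list of plugin libraries not contained in TensorRT's own plugin library). Whether the compiled op in the zip should be passed this way, or registered with `SessionOptions.register_custom_ops_library`, depends on how it was built; `tensorrt_providers` is a hypothetical helper.

```python
def tensorrt_providers(plugin_libs, device_id=0):
    """Build an ORT providers list: TensorRT first, then CUDA, then CPU fallback."""
    trt_options = {"device_id": device_id}
    if plugin_libs:
        # TensorRT EP expects multiple plugin paths joined with ';'.
        trt_options["trt_extra_plugin_lib_paths"] = ";".join(plugin_libs)
    return [
        ("TensorrtExecutionProvider", trt_options),
        ("CUDAExecutionProvider", {"device_id": device_id}),
        "CPUExecutionProvider",
    ]
```

The returned list can be passed as `providers=` to `onnxruntime.InferenceSession`; nodes that TensorRT cannot take are assigned to the next provider in the list.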