jcdatin opened 7 months ago
How can we make progress here?
I tried on Turing and Ada GPUs, and with different versions of cuDNN: 1.17.1 inference on the faster-rcnn is always slower by at least 30%. Note that the TRT version used is the same between the two ONNXRT versions. I guess I need to build a demonstrator I can share, but is there any guidance on determining which ONNX operator is taking longer in 1.17.1 with my production model?
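One way to find the slow operator without sharing the model is ONNX Runtime's built-in profiler (`SessionOptions.enable_profiling = True`; `session.end_profiling()` returns the path of a JSON trace). A small script can then aggregate time per operator type. A minimal sketch; the `sample` trace fragment below is hand-made for illustration, real traces come from the profiler:

```python
import json
from collections import defaultdict

def aggregate_op_times(events):
    """Sum the 'dur' field (microseconds) per operator type over Node events."""
    totals = defaultdict(int)
    for ev in events:
        if ev.get("cat") == "Node":
            op = ev.get("args", {}).get("op_name", "unknown")
            totals[op] += ev.get("dur", 0)
    # Sort so the most expensive op type comes first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# Hand-made trace fragment mimicking the profiler's JSON event list:
sample = [
    {"cat": "Node", "dur": 120, "args": {"op_name": "Conv"}},
    {"cat": "Node", "dur": 40, "args": {"op_name": "Relu"}},
    {"cat": "Node", "dur": 300, "args": {"op_name": "Conv"}},
    {"cat": "Session", "dur": 500, "args": {}},  # non-Node events are skipped
]
print(aggregate_op_times(sample))  # {'Conv': 420, 'Relu': 40}
```

For a real run, load the trace with `json.load(open(profile_path))` and pass the resulting list in.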
I did not succeed in executing inference of the torchvision fasterrcnn_resnet50_fpn open-source model with the onnxruntime TensorRT Execution Provider (I used torchvision's https://github.com/pytorch/vision/blob/main/test/test_onnx.py, TestONNXExporter::test_faster_rcnn).
I did succeed in running it with the onnxruntime CUDA Execution Provider, but not with the TensorRT Execution Provider.
The torchvision test falls back to the CUDA Execution Provider because the exported fasterRCNN model is missing the inferred dimensions that TensorRT requires.
Unfortunately, the onnxruntime/tools/symbolic_shape_infer.py tool also crashes when converting the exported onnx fasterRCNN model for TensorRT (I needed to run this tool on my model as well in order to run it on the TensorRT Execution Provider).
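For reference, this is how I invoke the tool; the model file names are placeholders, and `--auto_merge` is worth trying since it merges conflicting symbolic dimensions instead of failing on them:

```shell
# Run ORT's symbolic shape inference on the exported model so TensorRT
# gets the inferred dimensions it requires; --auto_merge resolves
# conflicting symbolic dims instead of raising.
python -m onnxruntime.tools.symbolic_shape_infer \
    --input fasterrcnn.onnx \
    --output fasterrcnn_shaped.onnx \
    --auto_merge
```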
So I am stuck on providing a non-IP demonstration of faster_rcnn for the onnxruntime TensorRT Execution Provider, and hence on helping to identify which onnx operator regresses in onnxruntime 1.17.1 versus 1.16.3.
I need instructions for determining myself which onnx operator regresses in my model, so I can report it to the onnxruntime team: any help?
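Assuming you can get per-op timing totals (in microseconds) from ONNX Runtime's profiler trace under both 1.16.3 and 1.17.1, diffing the two dictionaries points at the regressing operator directly. A sketch; the two timing dicts below are hypothetical numbers for illustration only:

```python
def diff_op_times(old, new):
    """Return (op, delta_us) pairs between two runs, largest regression first."""
    ops = set(old) | set(new)
    deltas = {op: new.get(op, 0) - old.get(op, 0) for op in ops}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-op totals (us) aggregated from ORT 1.16.3 vs 1.17.1 traces:
ort_1163 = {"Conv": 900_000, "NonMaxSuppression": 150_000, "Relu": 50_000}
ort_1171 = {"Conv": 1_400_000, "NonMaxSuppression": 160_000, "Relu": 50_000}

for op, delta in diff_op_times(ort_1163, ort_1171):
    print(op, delta)  # Conv 500000 / NonMaxSuppression 10000 / Relu 0
```

The op at the top of the list is the first candidate to reproduce in isolation with a single-node ONNX model.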
Describe the issue
I observed that performance has decreased by at least 25% with the TensorRT EP and 18% with the CUDA EP on my Faster-RCNN model inference: from 2860 ms to 3785 ms, and from 3015 ms to 3680 ms, respectively. (Interestingly, with ORT 1.17.1 the CUDA EP becomes faster than the TRT EP!)
To reproduce
My model is private, but https://pytorch.org/vision/main/models/faster_rcnn.html should do since my model is also based on torchvision.
Urgency
medium: need to fix within 3 months
Platform
Linux
OS Version
SLES15 SP4 and SLES15 SP5
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.17.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA, TensorRT
Execution Provider Library Version
CUDA 12.2 and TRT 8.6.1.6
Model File
Can't share. Note that I am using an RTX 8000 and inference is done with NVIDIA TF32 on both the TRT and CUDA EPs (can't use FP16). ONNXRT and its CUDA modules were compiled with gcc 11.
Is this a quantized model?
No