Open YuriGao opened 2 months ago
To run fast on the CUDA EP, I have to use RoiAlign (opset 10 version) and insert a Sub op before RoiAlign's rois input. Note that the Sub value depends on RoiAlign's spatial_scale attribute: it should be 0.5 / spatial_scale. It would be good for everyone if someone could upgrade the current RoiAlign CUDA EP implementation.
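A rough sketch of that graph edit with the onnx Python API (the model path and the generated node/initializer names below are placeholders, not from the actual export): it inserts Sub(rois, 0.5 / spatial_scale) in front of each RoiAlign and switches the node to output_half_pixel coordinates, which is what the opset-10 RoiAlign computes, so the numerics match the exported half_pixel version.

```python
# Sketch only: subtract 0.5 / spatial_scale from the rois input of every
# RoiAlign node and set coordinate_transformation_mode to output_half_pixel
# (the opset-10 behaviour). "cascade_mask_rcnn.onnx" is a placeholder path.
import numpy as np
import onnx
from onnx import helper, numpy_helper

model = onnx.load("cascade_mask_rcnn.onnx")
graph = model.graph

pending = []  # (index in graph.node, Sub node to insert before that index)
for idx, node in enumerate(graph.node):
    if node.op_type != "RoiAlign":
        continue
    base = node.name or f"RoiAlign_{idx}"

    # spatial_scale defaults to 1.0 when the attribute is absent.
    spatial_scale = next(
        (a.f for a in node.attribute if a.name == "spatial_scale"), 1.0
    )

    # Constant initializer holding 0.5 / spatial_scale.
    offset_name = f"{base}_half_pixel_offset"
    graph.initializer.append(
        numpy_helper.from_array(
            np.array(0.5 / spatial_scale, dtype=np.float32), offset_name
        )
    )

    # shifted_rois = rois - 0.5 / spatial_scale
    shifted_rois = f"{base}_rois_shifted"
    pending.append(
        (idx,
         helper.make_node("Sub", [node.input[1], offset_name],
                          [shifted_rois], name=f"{base}_rois_sub"))
    )
    node.input[1] = shifted_rois

    # Use output_half_pixel coordinates, matching the opset-10 RoiAlign.
    ctm = next((a for a in node.attribute
                if a.name == "coordinate_transformation_mode"), None)
    if ctm is not None:
        ctm.s = b"output_half_pixel"
    else:
        node.attribute.append(
            helper.make_attribute("coordinate_transformation_mode",
                                  "output_half_pixel"))

# Insert the Sub nodes in front of their RoiAlign nodes so the graph
# stays topologically sorted.
for idx, sub_node in reversed(pending):
    graph.node.insert(idx, sub_node)

onnx.checker.check_model(model)
onnx.save(model, "cascade_mask_rcnn_half_pixel.onnx")
```

This only covers the arithmetic part of the workaround; to actually get the node onto the CUDA EP it still has to end up as the opset-10 RoiAlign (e.g. by exporting at a lower opset), as described above.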
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
I'm using a Cascade Mask R-CNN model from detectron2. When I export it to ONNX, the model file contains RoiAlign (opset 16 version). When running on ONNX Runtime with the CUDA EP, inference is too slow because RoiAlign falls back to the CPU EP. Could anyone provide a RoiAlign (opset 16 version) implementation for the CUDA EP?
To reproduce
1. Export Cascade Mask R-CNN from detectron2.
2. Run the model in ONNX Runtime with the CUDA EP (see the sketch below for checking node placement).
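A minimal way to confirm the fallback, assuming onnxruntime-gpu is installed; the model path is a placeholder for the exported Cascade Mask R-CNN:

```python
# Sketch: create a session with the CUDA EP and turn on verbose logging so
# ONNX Runtime prints the per-node execution-provider placements. In the
# reported setup the RoiAlign nodes show up under CPUExecutionProvider.
import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0  # VERBOSE: logs node placement per EP

sess = ort.InferenceSession(
    "cascade_mask_rcnn.onnx",  # placeholder path
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())
```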
Urgency
No response
Platform
Windows
OS Version
Win10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8 and CUDA 12.2
Model File
No response
Is this a quantized model?
No