Open tikr7 opened 7 months ago
This PR should solve it: https://github.com/microsoft/onnxruntime/pull/19540.
In the meantime, we converted the ONNX model to TensorRT, which is even 2x faster than pure PyTorch.
I am still seeing the warning "CUDA kernel not found in registries for Op type: GridSample" and experiencing severe performance degradation. Further information: pytorch 2.4.0.dev20240515+cu121, onnx 1.16.0, onnxruntime-gpu 1.18.0, cuda 12.2 (also tried 11.8), python 3.8, opset 20. I also noticed that a GridSample CUDA kernel has already been implemented and is waiting to be merged (see #18958). May I ask when this PR will be merged? @xadupre
Is there any estimate of the timeline? @xadupre
Describe the issue
We exported the Hugging Face transformer model OneFormer to ONNX.
Opset 20 failed with the error:
With opset 19 we were able to export to ONNX, but onnxruntime places the ScatterND / GridSample operators on the CPU instead of the GPU (CUDA). This reduces performance by a factor of at least 4.
The first screenshot shows an Nvidia Nsight trace of the pure PyTorch model in Python:
The second screenshot shows the same trace with onnxruntime in Python:
With PyTorch, GPU utilization is high and inference is fast. onnxruntime, by contrast, uses the CPU heavily and has to copy data between VRAM and RAM frequently, which lowers GPU utilization and slows down model inference.
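One way to confirm which operators fall back to the CPU is to create the session with verbose logging, since onnxruntime's verbose output typically includes node placement details. The following is only a minimal sketch: the helper names `provider_priority` and `make_verbose_session` and the model path `oneformer.onnx` are hypothetical and not part of the original setup.

```python
def provider_priority(prefer_cuda=True):
    """Return the execution provider list in priority order.

    onnxruntime tries providers in the order given, so listing CUDA
    first and CPU second lets unsupported nodes fall back to the CPU.
    """
    if prefer_cuda:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


def make_verbose_session(model_path, prefer_cuda=True):
    """Create an InferenceSession that logs at the most verbose level."""
    # Imported lazily so that provider_priority() can be used without
    # onnxruntime being installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    options.log_severity_level = 0  # 0 = VERBOSE
    return ort.InferenceSession(
        model_path,
        sess_options=options,
        providers=provider_priority(prefer_cuda),
    )


# Hypothetical usage (the file name is an assumption):
# session = make_verbose_session("oneformer.onnx")
# print(session.get_providers())
```

`session.get_providers()` only reports which providers are registered for the session; the per-node assignment has to be read from the log output.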
Relevant logs from the onnxruntime:
According to the documentation, the ScatterND / GridSample operators should be supported on CUDA since opset 18+.
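To get a quick overview of how many nodes are affected, the captured log text can be scanned for the warning quoted in this thread ("CUDA kernel not found in registries for Op type: ..."). This is a rough sketch that assumes only that message text; the rest of the log line format is not assumed, the function name `count_cpu_fallbacks` is hypothetical, and the sample input is made up for illustration.

```python
import re
from collections import Counter

# Matches the warning text quoted in this thread and captures the op type.
# Anything before or after the message on the same line is ignored.
FALLBACK_RE = re.compile(
    r"CUDA kernel not found in registries for Op type:\s*(\w+)"
)


def count_cpu_fallbacks(log_text):
    """Count fallback warnings per operator type in captured log text."""
    return Counter(FALLBACK_RE.findall(log_text))


if __name__ == "__main__":
    # Hypothetical sample input, not real log output from the model.
    sample = "\n".join(
        [
            "CUDA kernel not found in registries for Op type: GridSample",
            "CUDA kernel not found in registries for Op type: GridSample",
            "CUDA kernel not found in registries for Op type: ScatterND",
        ]
    )
    for op_type, count in count_cpu_fallbacks(sample).most_common():
        print(op_type, count)
```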
Further information
pytorch 2.2.2, onnx 1.16.0, onnxruntime-gpu 1.17.1, cuda 11.8 (also tried 12.3), python 3.9, opset 19
To reproduce
If you need more details on how to reproduce this, we can provide the model and everything else required.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.17.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
cuda 11.8 (also tried 12.3)
Model File
At 179 MB zipped, the model file is too big for GitHub.
Is this a quantized model?
Unknown