I also hit problems with the MatMul node and a cuBLAS error 14 when running swin-t.onnx with onnxruntime on CUDA; with the CPU execution provider there is no error. Do you have any idea what causes this?
Describe the issue
I have tested a Swin Transformer model with both PyTorch and onnxruntime-gpu, and found that onnxruntime-gpu has no speed advantage over inference with the PyTorch model.
To reproduce
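A minimal timing sketch of the kind of comparison described above. The warm-up loop matters: without it, one-time CUDA kernel selection and memory allocation get counted against onnxruntime-gpu. The model path, input shape, and provider list in the commented usage are assumptions, not taken from the report.

```python
import time

def bench(run, warmup=10, iters=100):
    """Average latency of run() in ms, excluding warm-up iterations."""
    for _ in range(warmup):
        run()
    start = time.perf_counter()
    for _ in range(iters):
        run()
    return (time.perf_counter() - start) / iters * 1e3

# Hypothetical usage against onnxruntime-gpu (file name and shape assumed):
# import numpy as np
# import onnxruntime as ort
# sess = ort.InferenceSession(
#     "swin-t.onnx",
#     providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
# )
# x = np.random.rand(1, 3, 224, 224).astype(np.float32)
# name = sess.get_inputs()[0].name
# print(f"{bench(lambda: sess.run(None, {name: x})):.2f} ms")
```

It is also worth confirming via `sess.get_providers()` that `CUDAExecutionProvider` is actually active, since a silent fallback to CPU would explain the missing speedup.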
Urgency
Yes
Platform
Linux
OS Version
18.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.13.1
ONNX Runtime API
Python
Architecture
Other / Unknown
Execution Provider
CUDA
Execution Provider Library Version
No response