I also hit problems with the MatMul node and a cuBLAS error 14 when running swin-t.onnx with onnxruntime on CUDA; with the CPU execution provider there is no error. Do you have any idea what causes this?
Describe the issue
I have tested a Swin Transformer model with both PyTorch and onnxruntime-gpu, and found that onnxruntime-gpu has no speed advantage over inference with the PyTorch model.
To reproduce
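A minimal timing sketch of the kind of comparison described above. The warm-up loop matters: without it, one-time CUDA kernel selection and memory allocation get counted against onnxruntime-gpu. The model path, input shape, and provider list in the commented usage are assumptions, not taken from the report.

```python
import time

def bench(run, warmup=10, iters=100):
    """Average latency of run() in ms, excluding warm-up iterations."""
    for _ in range(warmup):
        run()
    start = time.perf_counter()
    for _ in range(iters):
        run()
    return (time.perf_counter() - start) / iters * 1e3

# Hypothetical usage against onnxruntime-gpu (file name and shape assumed):
# import numpy as np
# import onnxruntime as ort
# sess = ort.InferenceSession(
#     "swin-t.onnx",
#     providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
# )
# x = np.random.rand(1, 3, 224, 224).astype(np.float32)
# name = sess.get_inputs()[0].name
# print(f"{bench(lambda: sess.run(None, {name: x})):.2f} ms")
```

It is also worth confirming via `sess.get_providers()` that `CUDAExecutionProvider` is actually active, since a silent fallback to CPU would explain the missing speedup.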
Urgency
Yes
Platform
Linux
OS Version
18.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.13.1
ONNX Runtime API
Python
Architecture
Other / Unknown
Execution Provider
CUDA
Execution Provider Library Version
No response