microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] DML dynamic axes performance regression. #16424

Open tom-huntington opened 1 year ago

tom-huntington commented 1 year ago

Describe the issue

This is the decoder model for openai-whisper (Whisper requires dynamic axes). Inference times:

- C++ `Microsoft::AI::MachineLearning` v1.15.1: ~700 ms dynamic, ~75 ms with all dimensions fixed.
- C++ `Windows::AI::MachineLearning` v1.8.1: ~130 ms dynamic, ~60 ms with all dimensions fixed.
- Python onnxruntime-directml v1.15.1: ~260 ms dynamic.
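The timings above can be reproduced with a harness along these lines (a minimal sketch; `run_fn` is a stand-in for the actual `session.Run` / `LearningModelSession::Evaluate` call, and the warm-up count is an assumption to exclude first-run graph compilation):

```python
import time

def time_inference(run_fn, warmup=3, iters=20):
    """Return the mean wall-clock time of run_fn in milliseconds,
    discarding warm-up iterations (session/graph compilation)."""
    for _ in range(warmup):
        run_fn()
    start = time.perf_counter()
    for _ in range(iters):
        run_fn()
    return (time.perf_counter() - start) / iters * 1000.0
```

With DirectML in particular, the first few calls after a shape change can retrigger compilation, so warm-up runs matter when comparing dynamic vs. fixed shapes.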

Note that DML dynamic axes have never performed well in the first place: https://github.com/microsoft/onnxruntime/issues/14550

Also, `LearningModelSession::Evaluate` throws hundreds of exceptions when using `Microsoft::AI::MachineLearning`:

Exception thrown at 0x00007FFF0C022BAC in app.exe: Microsoft C++ exception: _com_error at memory location 0x000000ECEC930428.

Platform

Windows

OS Version

22H2

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

No response

Model File

https://huggingface.co/tom-huntington/whisper/tree/main

Is this a quantized model?

No

nums11 commented 1 year ago

Thanks for posting. We're currently looking into this.

tom-huntington commented 1 year ago

> Thanks for posting. We're currently looking into this.

The default PyTorch tracing doesn't optimize the shape calls at all, and I think this is what's slowing down DML. All shape calls are pure functions of the input shapes. In the future, I may try to optimize the resulting ONNX graph by refactoring the PyTorch code.
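Since every shape call is a pure function of the input shapes, those subgraphs could in principle be constant-folded once the input dimensions are known. A toy illustration of the idea in plain Python (not ONNX Runtime's actual optimizer; the Shape → Gather → Mul chain is a hypothetical example of what tracing emits):

```python
def fold_shape_ops(input_shape):
    """Toy constant folding: once the input shape is fixed, a traced
    Shape -> Gather -> Mul chain collapses to plain constants instead
    of ops the execution provider must evaluate every run."""
    shape = list(input_shape)   # Shape op: materialize the dims
    seq_len = shape[1]          # Gather op: select one dimension
    flat = shape[0] * seq_len   # Mul op: combine dimensions
    return {"shape": shape, "seq_len": seq_len, "flat": flat}
```

This is essentially what fixing all dimensions achieves, which would explain why the fixed-shape model runs so much faster on DML.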

Feel free to give some guidance on solving this problem.

andrea-cimatoribus-pix4d commented 3 months ago

Interesting, this might be worth investigating and keeping an eye on for us too.