Open tom-huntington opened 1 year ago
Thanks for posting. We're currently looking into this.
The default PyTorch tracing doesn't optimize the shape calls at all, and this is what's slowing down DML (I think). All shape calls are a pure function of the input shapes, hmmm. In the future, I may try to optimize the resulting ONNX graph by refactoring the PyTorch code.
Feel free to give some guidance on solving this problem.
Interesting, this might be worth investigating and keeping an eye on for us too.
Describe the issue
This is the decoder model for openai-whisper (whisper requires dynamic axes). Inference times:

- C++ `Microsoft::AI::MachineLearning` v1.15.1: ~700 ms dynamic, 75 ms with all dimensions fixed.
- C++ `Windows::AI::MachineLearning` v1.8.1: ~130 ms dynamic, 60 ms with all dimensions fixed.
- Python `onnxruntime-directml` v1.15.1: ~260 ms dynamic.

Although DML dynamic axes never performed properly in the first place: https://github.com/microsoft/onnxruntime/issues/14550

Also, `LearningModelSession::Evaluate` throws hundreds of exceptions when using `Microsoft::AI::MachineLearning`.
Platform
Windows
OS Version
22H2
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
No response
Model File
https://huggingface.co/tom-huntington/whisper/tree/main
Is this a quantized model?
No