microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

DirectML EP gives wrong results on both Integrated and Discrete GPUs #19837

Open AdarshAcharya5 opened 6 months ago

AdarshAcharya5 commented 6 months ago

I'm trying to run inference on a model in C++. With the DirectML EP I get completely wrong results, while running the same model on the CPU works just fine. Sample outputs:

CPU (Correct):

7.9641705378890038e-03 5.1060058176517487e-03 1.6217185184359550e-03 3.2245237380266190e-03 1.9704271107912064e-03 -2.6769749820232391e-03 -3.4305416047573090e-03 -5.9413574635982513e-03 -8.1969611346721649e-03 -5.6667793542146683e-03 -4.0454901754856110e-03 -4.5984257012605667e-03 -2.4918280541896820e-03 -5.0726905465126038e-04 -1.3493187725543976e-03

GPU | DirectML EP (Incorrect):

7.63121 2.10706 -15.587 -8.67914 -20.8112 -37.6199 -15.6217 -13.4909 45.5337 82.2263 95.6195 45.5667 124.225 188.378 167.306 142.281
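To put a number on how far the two runs diverge, the output buffers can be compared element by element. The following is a minimal sketch, not part of the original report: the helper names `MaxAbsDiff` and `AllClose` and the default tolerances are assumptions chosen for illustration, and the outputs are assumed to have already been copied into `std::vector<double>`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference over the common prefix of a and b.
double MaxAbsDiff(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// True when both vectors have the same length and every pair satisfies
// |a[i] - b[i]| <= atol + rtol * |b[i]|. A NaN on either side counts as a mismatch.
// The default tolerances are illustrative assumptions, not ONNX Runtime values.
bool AllClose(const std::vector<double>& a, const std::vector<double>& b,
              double rtol = 1e-3, double atol = 1e-5) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = std::fabs(a[i] - b[i]);
        if (!(diff <= atol + rtol * std::fabs(b[i]))) {
            return false;
        }
    }
    return true;
}
```

Applied to the first few values printed above, `AllClose` would return false and `MaxAbsDiff` would be well above 1, which is far outside ordinary floating-point noise between providers.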

Init Code for reference

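        // Verbose logging so that node placement per execution provider is printed.
        mEnv = new Ort::Env(ORT_LOGGING_LEVEL_VERBOSE, "test");
        mSessionOptions = new Ort::SessionOptions;
        // The DirectML EP requires sequential execution with memory patterns disabled.
        mSessionOptions->SetExecutionMode(ORT_SEQUENTIAL);
        mSessionOptions->DisableMemPattern();
        // Register DirectML on device 0 (the default adapter).
        Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_DML(*mSessionOptions, 0));
        mSession = new Ort::Session(*mEnv, inModelPath, *mSessionOptions);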

When I run inference, it logs this warning:

2024-03-09 17:31:07.4370295 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-09 17:31:07.4474067 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

I reran with verbose logging, and it shows that some nodes are placed on the CPU:

2024-03-09 17:19:38.1964077 [V:onnxruntime:, session_state.cc:1146 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node placements
2024-03-09 17:19:38.1987859 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp]  Node(s) placed on [DmlExecutionProvider]. Number of nodes: 77
2024-03-09 17:19:38.2023258 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   LSTM (/encoder.4/dconv/layers.0/layers.0.3/lstm/LSTM)
2024-03-09 17:19:38.2054732 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   LSTM (/encoder.4/dconv/layers.0/layers.0.3/lstm/LSTM_1)
2024-03-09 17:19:38.2087406 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.4/dconv/layers.0/layers.0.4/Einsum)
2024-03-09 17:19:38.2130963 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Div (/encoder.4/dconv/layers.0/layers.0.4/Div)
2024-03-09 17:19:38.2163550 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Add (/encoder.4/dconv/layers.0/layers.0.4/Add)
2024-03-09 17:19:38.2193638 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Where (/encoder.4/dconv/layers.0/layers.0.4/Where_3)
2024-03-09 17:19:38.2225705 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Softmax (/encoder.4/dconv/layers.0/layers.0.4/Softmax)
2024-03-09 17:19:38.2268574 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Reshape (/encoder.4/dconv/layers.0/layers.0.4/Reshape_5)
2024-03-09 17:19:38.2329546 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Conv (/encoder.4/dconv/layers.0/layers.0.4/proj/Conv)
2024-03-09 17:19:38.2371909 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   LSTM (/encoder.4/dconv/layers.1/layers.1.3/lstm/LSTM)
2024-03-09 17:19:38.2404647 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   LSTM (/encoder.4/dconv/layers.1/layers.1.3/lstm/LSTM_1)
2024-03-09 17:19:38.2437037 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.4/dconv/layers.1/layers.1.4/Einsum)
2024-03-09 17:19:38.2497929 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Div (/encoder.4/dconv/layers.1/layers.1.4/Div)
2024-03-09 17:19:38.2528811 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Add (/encoder.4/dconv/layers.1/layers.1.4/Add)
2024-03-09 17:19:38.2569731 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Where (/encoder.4/dconv/layers.1/layers.1.4/Where_3)
2024-03-09 17:19:38.2604761 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Softmax (/encoder.4/dconv/layers.1/layers.1.4/Softmax)
2024-03-09 17:19:38.2638485 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Reshape (/encoder.4/dconv/layers.1/layers.1.4/Reshape_5)
2024-03-09 17:19:38.2670311 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Conv (/encoder.4/dconv/layers.1/layers.1.4/proj/Conv)
2024-03-09 17:19:38.2702370 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   LSTM (/encoder.5/dconv/layers.0/layers.0.3/lstm/LSTM)
2024-03-09 17:19:38.2733169 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   LSTM (/encoder.5/dconv/layers.0/layers.0.3/lstm/LSTM_1)
2024-03-09 17:19:38.2770412 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.5/dconv/layers.0/layers.0.4/Einsum)
2024-03-09 17:19:38.2805019 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Div (/encoder.5/dconv/layers.0/layers.0.4/Div)
2024-03-09 17:19:38.2834022 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Add (/encoder.5/dconv/layers.0/layers.0.4/Add)
2024-03-09 17:19:38.2946036 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Where (/encoder.5/dconv/layers.0/layers.0.4/Where_3)
2024-03-09 17:19:38.2978596 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Softmax (/encoder.5/dconv/layers.0/layers.0.4/Softmax)
2024-03-09 17:19:38.3011489 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Reshape (/encoder.5/dconv/layers.0/layers.0.4/Reshape_5)
2024-03-09 17:19:38.3043391 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Conv (/encoder.5/dconv/layers.0/layers.0.4/proj/Conv)
2024-03-09 17:19:38.3097470 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   LSTM (/encoder.5/dconv/layers.1/layers.1.3/lstm/LSTM)
2024-03-09 17:19:38.3128849 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   LSTM (/encoder.5/dconv/layers.1/layers.1.3/lstm/LSTM_1)
2024-03-09 17:19:38.3161082 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.5/dconv/layers.1/layers.1.4/Einsum)
2024-03-09 17:19:38.3191046 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Div (/encoder.5/dconv/layers.1/layers.1.4/Div)
2024-03-09 17:19:38.3219691 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Add (/encoder.5/dconv/layers.1/layers.1.4/Add)
2024-03-09 17:19:38.3285357 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Where (/encoder.5/dconv/layers.1/layers.1.4/Where_3)
2024-03-09 17:19:38.3314117 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Softmax (/encoder.5/dconv/layers.1/layers.1.4/Softmax)
2024-03-09 17:19:38.3355328 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Reshape (/encoder.5/dconv/layers.1/layers.1.4/Reshape_5)
2024-03-09 17:19:38.3397122 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Conv (/encoder.5/dconv/layers.1/layers.1.4/proj/Conv)
2024-03-09 17:19:38.3429634 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_0_3 (DmlFusedNode_0_3)
2024-03-09 17:19:38.3465694 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_1_5 (DmlFusedNode_1_5)
2024-03-09 17:19:38.3491821 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_2_7 (DmlFusedNode_2_7)
2024-03-09 17:19:38.3522404 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_3_9 (DmlFusedNode_3_9)
2024-03-09 17:19:38.3549790 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_4_10 (DmlFusedNode_4_10)
2024-03-09 17:19:38.3590378 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_5_19 (DmlFusedNode_5_19)
2024-03-09 17:19:38.3619231 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_6_21 (DmlFusedNode_6_21)
2024-03-09 17:19:38.3647269 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_7_23 (DmlFusedNode_7_23)
2024-03-09 17:19:38.3675955 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_8_25 (DmlFusedNode_8_25)
2024-03-09 17:19:38.3711477 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_9_26 (DmlFusedNode_9_26)
2024-03-09 17:19:38.3737687 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_10_35 (DmlFusedNode_10_35)
2024-03-09 17:19:38.3775225 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_11_37 (DmlFusedNode_11_37)
2024-03-09 17:19:38.3801207 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_12_39 (DmlFusedNode_12_39)
2024-03-09 17:19:38.3833644 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_13_41 (DmlFusedNode_13_41)
2024-03-09 17:19:38.3930719 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_14_42 (DmlFusedNode_14_42)
2024-03-09 17:19:38.3961912 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_15_51 (DmlFusedNode_15_51)
2024-03-09 17:19:38.3993214 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_16_53 (DmlFusedNode_16_53)
2024-03-09 17:19:38.4025940 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_17_55 (DmlFusedNode_17_55)
2024-03-09 17:19:38.4061956 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_18_57 (DmlFusedNode_18_57)
2024-03-09 17:19:38.4088969 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_19_58 (DmlFusedNode_19_58)
2024-03-09 17:19:38.4125093 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   DmlFusedNode_20_67 (DmlFusedNode_20_67)
2024-03-09 17:19:38.4153944 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyFromHost (Memcpy)
2024-03-09 17:19:38.4187650 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyFromHost (Memcpy_token_752)
2024-03-09 17:19:38.4218997 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyFromHost (Memcpy_token_753)
2024-03-09 17:19:38.4247973 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyFromHost (Memcpy_token_754)
2024-03-09 17:19:38.4277472 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyFromHost (Memcpy_token_755)
2024-03-09 17:19:38.4313953 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyFromHost (Memcpy_token_756)
2024-03-09 17:19:38.4667697 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyFromHost (Memcpy_token_757)
2024-03-09 17:19:38.5304254 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyFromHost (Memcpy_token_758)
2024-03-09 17:19:38.5347031 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_759)
2024-03-09 17:19:38.5379085 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_760)
2024-03-09 17:19:38.5421093 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_761)
2024-03-09 17:19:38.5465824 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_762)
2024-03-09 17:19:38.5495877 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_763)
2024-03-09 17:19:38.5537753 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_764)
2024-03-09 17:19:38.5569797 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_765)
2024-03-09 17:19:38.5606784 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_766)
2024-03-09 17:19:38.5636080 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_767)
2024-03-09 17:19:38.5666540 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_768)
2024-03-09 17:19:38.5707909 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_769)
2024-03-09 17:19:38.5738488 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   MemcpyToHost (Memcpy_token_770)
2024-03-09 17:19:38.5775146 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp]  Node(s) placed on [CPUExecutionProvider]. Number of nodes: 8
2024-03-09 17:19:38.5821310 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.4/dconv/layers.0/layers.0.4/Einsum_1)
2024-03-09 17:19:38.5856312 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.4/dconv/layers.0/layers.0.4/Einsum_2)
2024-03-09 17:19:38.5928337 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.4/dconv/layers.1/layers.1.4/Einsum_1)
2024-03-09 17:19:38.5973647 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.4/dconv/layers.1/layers.1.4/Einsum_2)
2024-03-09 17:19:38.6101708 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.5/dconv/layers.0/layers.0.4/Einsum_1)
2024-03-09 17:19:38.6148915 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.5/dconv/layers.0/layers.0.4/Einsum_2)
2024-03-09 17:19:38.6182724 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.5/dconv/layers.1/layers.1.4/Einsum_1)
2024-03-09 17:19:38.6216925 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp]   Einsum (/encoder.5/dconv/layers.1/layers.1.4/Einsum_2)
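A long placement dump like this is easier to read when tallied per provider and operator type. The sketch below assumes the verbose log format shown above (a `VerifyEachNodeIsAssignedToAnEp]` marker, a `Node(s) placed on [...]` header, then one `OpType (node name)` line per node); `CountNodePlacements` and `PlacementCounts` are hypothetical names, not part of ONNX Runtime.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// provider name -> (operator type -> number of nodes)
using PlacementCounts = std::map<std::string, std::map<std::string, int>>;

// Tallies node placements from a verbose ONNX Runtime log stream.
// Lines that do not match the assumed format are skipped.
PlacementCounts CountNodePlacements(std::istream& log) {
    const std::string marker = "VerifyEachNodeIsAssignedToAnEp]";
    const std::string header = "Node(s) placed on [";
    PlacementCounts counts;
    std::string current_ep;
    std::string line;
    while (std::getline(log, line)) {
        const std::size_t pos = line.find(marker);
        if (pos == std::string::npos) {
            continue;
        }
        // Keep only the message part and strip its leading whitespace.
        std::string rest = line.substr(pos + marker.size());
        const std::size_t first = rest.find_first_not_of(" \t");
        if (first == std::string::npos) {
            continue;
        }
        rest = rest.substr(first);
        // A header line switches the provider that following nodes belong to.
        if (rest.compare(0, header.size(), header) == 0) {
            const std::size_t close = rest.find(']', header.size());
            current_ep = (close == std::string::npos)
                             ? std::string()
                             : rest.substr(header.size(), close - header.size());
            continue;
        }
        // A node line looks like "OpType (node name)".
        const std::size_t paren = rest.find(" (");
        if (current_ep.empty() || paren == std::string::npos) {
            continue;
        }
        ++counts[current_ep][rest.substr(0, paren)];
    }
    return counts;
}
```

Fed the log above (for example through a `std::ifstream` or `std::istringstream`), the result would show that the only nodes placed on `CPUExecutionProvider` are `Einsum` nodes, while other `Einsum` nodes run on `DmlExecutionProvider`.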

Additional Info:
CPU: Intel i9-12900H
Integrated GPU: Intel(R) Iris(R) Xe Graphics
GPU: RTX 3080Ti 16GB

To reproduce

Unfortunately I can't reveal the model. :(

Urgency

It is kinda urgent due to a project deadline.

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

1.13.1

Model File

:(

Is this a quantized model?

No

pranavsharma commented 5 months ago

@fdwr can you take a look?

fdwr commented 5 months ago

@smk2007 @martinb35 (who are focused on this more so than I am now)

> Unfortunately I can't reveal the model. :(

Without the model, the problem will be hard to isolate, so we can only offer ideas on how you might track it down yourself. Some approaches I take include:

(updated 2024-03-20)

AdarshAcharya5 commented 5 months ago

Hi @fdwr. Thanks for the reply! How can I unregister ops in OperatorRegistration.cpp? Do I just set DmlGraphSupport::Supported to DmlGraphSupport::NotSupported?

fdwr commented 5 months ago

> Hi @fdwr. Thanks for the reply! How can I unregister ops in OperatorRegistration.cpp? Do I just set DmlGraphSupport::Supported to DmlGraphSupport::NotSupported?

The easiest way is to comment out the suspect registration lines with // (you know which operators are found in your model).