DirectML Exception 80070057 "The parameter is incorrect"

TBrenetNV commented 2 months ago

Describe the issue

I exported the PyTorch GoogLeNet model (https://pytorch.org/hub/pytorch_vision_googlenet) to ONNX using TorchDynamo; the resulting ONNX model is attached in the next section. Inference works with ONNX Runtime 1.17.3 on the CPU and CUDA providers but fails with the DirectML provider. The full exception is:

C:\build\ort-1.17.3\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(451)\onnxruntime.dll!00007FFE11451461: (caller: 00007FFE114311D1) Exception(1) tid(ae54) 80070057 The parameter is incorrect.

I enabled the DirectML debug layers, but they did not provide any more insight:

C:\__w\1\s\SharedValidation\GraphDescValidator.h(34)\DirectML.dll!00007FFE0FB010A5: (caller: 00007FFE0FAD9CC5) Exception(1) tid(ae54) 80070057 The parameter is incorrect.
Exception thrown at 0x00007FFF5CE4AB89 in onnxruntime_perf_test.exe: Microsoft C++ exception: wil::ResultException at memory location 0x0000004F310FB920.
Exception thrown at 0x00007FFF5CE4AB89 in onnxruntime_perf_test.exe: Microsoft C++ exception: [rethrow] at memory location 0x0000000000000000.
C:\__w\1\s\Product\DmlDevice.cpp(782)\DirectML.dll!00007FFE0FD5069E: (caller: 00007FFE11450F65) ReturnHr(1) tid(ae54) 80070057 The parameter is incorrect.
    Msg:[C:\__w\1\s\SharedValidation\GraphDescValidator.h(34)\DirectML.dll!00007FFE0FB010A5: (caller: 00007FFE0FAD9CC5) Exception(1) tid(ae54) 80070057 The parameter is incorrect.
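
For triage, the log lines above can be split into fields and the status code decoded. The following is a minimal sketch, not part of ONNX Runtime or DirectML: the helper names `parse_wil_line` and `decode_hresult` are hypothetical, and the regular expression only assumes the line layout seen in this report. The bit layout used is the standard HRESULT one (severity bit, 11-bit facility, 16-bit code), under which 80070057 is E_INVALIDARG, that is, the Win32 error 87 (ERROR_INVALID_PARAMETER) wrapped with facility 7 (FACILITY_WIN32).

```python
import re

# Layout assumed from the log lines in this report:
# <source>(<line>)\<module>!<address>: (caller: <address>) <kind>(<n>) tid(<id>) <hresult> <message>
WIL_LINE = re.compile(
    r"^(?P<source>.+?)\((?P<line>\d+)\)\\(?P<module>[^!\\]+)!(?P<address>[0-9A-Fa-f]+): "
    r"\(caller: (?P<caller>[0-9A-Fa-f]+)\) "
    r"(?P<kind>\w+)\((?P<count>\d+)\) tid\((?P<tid>[0-9A-Fa-f]+)\) "
    r"(?P<hresult>[0-9A-Fa-f]{8}) (?P<message>.*)$"
)


def parse_wil_line(text):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = WIL_LINE.match(text.strip())
    return match.groupdict() if match else None


def decode_hresult(hex_text):
    """Split an HRESULT given as hex text into (severity, facility, code)."""
    value = int(hex_text, 16)
    severity = (value >> 31) & 0x1    # 1 means failure
    facility = (value >> 16) & 0x7FF  # 7 is FACILITY_WIN32
    code = value & 0xFFFF             # 87 is ERROR_INVALID_PARAMETER
    return severity, facility, code
```

Applied to the first DirectML line above, `parse_wil_line` reports `GraphDescValidator.h` line 34 as the source, and `decode_hresult("80070057")` yields a failure with facility 7 and code 87. In other words, the code only says that graph description validation rejected an argument, not which one.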

To reproduce

1. Unzip the attached ONNX model: gnet_dynamo.zip
2. Run the onnxruntime_perf_test binary (built from source from the 1.17.3 branch) with the following arguments: onnxruntime_perf_test.exe -e dml -m times -r 5 -p profile_gnet_dynamo_dml.json -I gnet_dynamo.onnx
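
To confirm that only DirectML fails, the same command can be run once per provider. This is a rough sketch under stated assumptions: `onnxruntime_perf_test.exe` is reachable from the working directory or PATH, a failing run exits with a non-zero code, and the per-provider profile file names are made up for illustration. The helpers `build_command` and `compare_providers` are hypothetical wrappers, not part of ONNX Runtime.

```python
import subprocess

# Assumed to be on PATH or in the current working directory.
PERF_TEST = "onnxruntime_perf_test.exe"


def build_command(provider, model="gnet_dynamo.onnx", runs=5):
    """Return the perf-test argument list for one execution provider."""
    # Hypothetical naming scheme: one profile file per provider.
    profile = f"profile_gnet_dynamo_{provider}.json"
    return [PERF_TEST, "-e", provider, "-m", "times", "-r", str(runs),
            "-p", profile, "-I", model]


def compare_providers(providers=("cpu", "cuda", "dml"), model="gnet_dynamo.onnx"):
    """Run the same test once per provider and map each provider to its exit code."""
    results = {}
    for provider in providers:
        completed = subprocess.run(build_command(provider, model),
                                   capture_output=True, text=True)
        results[provider] = completed.returncode
    return results
```

Calling `compare_providers()` from the directory that holds gnet_dynamo.onnx returns a dict of exit codes keyed by provider. With the behaviour described in this report, the expectation would be a non-zero entry for `dml` only.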

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.17.3

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU, CUDA, DirectML

Execution Provider Library Version

No response

whyb commented 2 months ago

I have the same problem on some older GPUs (NVIDIA GeForce GTX 780). I do not encounter it on some graphics cards newer than the NVIDIA GeForce GTX 980.

Error Code:

RUNTIME_EXCEPTION

Error Message:

Non-zero status code returned while running Mul node. Name:'/G/encoder_level1/encoder_level1.0/norm1/body/Mul' Status Message: 
D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2449)\onnxruntime.dll!00007FFA2C65ADD5: (caller: 00007FFA2C65A468) Exception(6) tid(4764) 80070057 The parameter is incorrect.

Problem GPU Device: NVIDIA GeForce GTX 780
Driver Version: 471.11 (2021.6.23)

Platform: Windows
OS Version: 10
ONNX Runtime Installation: Download from github release page
ONNX Runtime Version or Commit ID: 1.16.23.1119
ONNX Runtime API: C++
Architecture: X64
Execution Provider: Default CPU, DirectML
Execution Provider Library Version: DirectML.dll version: 1.13.1.0

siegelaaron94 commented 1 month ago

https://github.com/microsoft/onnxruntime/issues/20742
