microsoft / DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
MIT License

A few ONNX models fail to run via DxDispatch #276

Open ankan-ban opened 2 years ago

ankan-ban commented 2 years ago

The new DxDispatch feature for quickly benchmarking ONNX models is very useful. However, it doesn't seem to work with some models.

Here are some examples from the ONNX Model Zoo:

Model path: https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/yolov4/model

>dxdispatch.exe -i 1000  -f unk__2104:1 yolov4.onnx
Running on 'NVIDIA GeForce RTX 3080 '
2022-08-09 15:50:35.6481373 [E:onnxruntime:, sequential_executor.cc:364 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running Concat node. Name:'StatefulPartitionedCall/model/tf_op_layer_concat_10/concat_10' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1758)\onnxruntime.dll!00007FFF60FA6064: (caller: 00007FFF613F655E) Exception(2) tid(61ac) 8007023E {Application Error}
The exception %s (0x
Failed to execute dispatchable: Non-zero status code returned while running Concat node. Name:'StatefulPartitionedCall/model/tf_op_layer_concat_10/concat_10' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1758)\onnxruntime.dll!00007FFF60FA6064: (caller: 00007FFF613F655E) Exception(2) tid(61ac) 8007023E {Application Error}
The exception %s (0x
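All three failures in this report end with HRESULT 8007023E. As a hedged aside, splitting that value into its standard HRESULT fields shows severity 1 (failure), facility 7 (FACILITY_WIN32) and code 0x23E (574). To my understanding, Win32 error 574 is ERROR_UNHANDLED_EXCEPTION, whose message template begins "{Application Error} The exception %s (0x%08lx) ...", which would explain the unformatted "%s" in the log. The helper name `decode_hresult` is illustrative only and is not part of DirectML or ONNX Runtime.

```python
# Illustrative helper (not a DirectML/ONNX Runtime API): split a 32-bit
# HRESULT into its severity, facility and code fields.

def decode_hresult(hr: int) -> tuple[int, int, int]:
    severity = (hr >> 31) & 0x1     # bit 31: 1 means failure
    facility = (hr >> 16) & 0x7FF   # bits 16-26: 7 is FACILITY_WIN32
    code = hr & 0xFFFF              # low 16 bits: Win32 error code when facility is 7
    return severity, facility, code


if __name__ == "__main__":
    sev, fac, code = decode_hresult(0x8007023E)
    print(f"severity={sev} facility={fac} code={code} (0x{code:X})")
```

Mapping code 574 to its symbolic name relies on the Win32 system error code table, not on this helper.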

Model path: https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/ssd/model

>dxdispatch.exe -i 1000  ssd-12.onnx
...
2022-08-09 16:05:18.4763276 [W:onnxruntime:, graph.cc:3559 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'backbone.model.layer2.0.4.bn1.num_batches_tracked'. It is not used by any node and should be removed from the model.
2022-08-09 16:05:18.4793944 [W:onnxruntime:, graph.cc:3559 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'backbone.model.layer2.0.4.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.
2022-08-09 16:05:18.6571438 [E:onnxruntime:, sequential_executor.cc:364 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running Unsqueeze node. Name:'Unsqueeze_scores' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1758)\onnxruntime.dll!00007FFF5FAC6064: (caller: 00007FFF5FF1655E) Exception(2) tid(5364) 8007023E {Application Error}
The exception %s (0x
Failed to execute dispatchable: Non-zero status code returned while running Unsqueeze node. Name:'Unsqueeze_scores' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1758)\onnxruntime.dll!00007FFF5FAC6064: (caller: 00007FFF5FF1655E) Exception(2) tid(5364) 8007023E {Application Error}
The exception %s (0x

Model path: https://github.com/onnx/models/tree/main/vision/classification/mobilenet/model

For this one, batch_size:1 works, but batch_size:2 fails:

>dxdispatch.exe -i 1000 -f batch_size:2 E:\Work\FullCustomerModels\onnxzoo\fp32\mobilenetv2-12.onnx
Running on 'NVIDIA GeForce RTX 3080 '
2022-08-09 16:11:21.0327210 [E:onnxruntime:, sequential_executor.cc:364 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running Gemm node. Name:'Gemm_104' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1758)\onnxruntime.dll!00007FFF5F6B6064: (caller: 00007FFF5FB0655E) Exception(2) tid(5704) 8007023E {Application Error}
The exception %s (0x
Failed to execute dispatchable: Non-zero status code returned while running Gemm node. Name:'Gemm_104' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1758)\onnxruntime.dll!00007FFF5F6B6064: (caller: 00007FFF5FB0655E) Exception(2) tid(5704) 8007023E {Application Error}
The exception %s (0x
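In the commands above, `-f` appears to take a `name:value` pair that pins a named free (symbolic) input dimension, such as `batch_size:2` or `unk__2104:1`. The sketch below only illustrates that argument shape under this assumption; `parse_free_dim_overrides` is a hypothetical name and is not DxDispatch's actual parser.

```python
# Hypothetical parser for "-f name:value" free-dimension overrides as used in
# the commands above (e.g. "batch_size:2"). Not DxDispatch's implementation.

def parse_free_dim_overrides(args: list[str]) -> dict[str, int]:
    """Turn ["batch_size:2", "unk__2104:1"] into {"batch_size": 2, "unk__2104": 1}."""
    overrides: dict[str, int] = {}
    for arg in args:
        # Split on the last colon so the value is always the trailing integer.
        name, sep, value = arg.rpartition(":")
        if not sep or not name:
            raise ValueError(f"expected name:value, got {arg!r}")
        size = int(value)  # raises ValueError for non-integer values
        if size <= 0:
            raise ValueError(f"dimension size must be positive, got {size}")
        overrides[name] = size
    return overrides


if __name__ == "__main__":
    print(parse_free_dim_overrides(["batch_size:2", "unk__2104:1"]))
```

ONNX Runtime exposes a comparable session option, `SessionOptions.add_free_dimension_override_by_name(name, value)`, so each parsed pair could be passed there to check whether a failure reproduces outside DxDispatch; whether DxDispatch itself uses that option is an assumption.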
jstoecker commented 2 years ago

Thanks, Ankan. I'll file a bug and see what's going on here.

jstoecker commented 2 years ago

Should be fixed in latest version!