WinML Implementation of ONNX Permute does not function

GreatA1exander commented 5 years ago

I'm submitting a…

Bug report (I searched for similar issues and did not find one)

Current behavior

When performing a permutation that is allowed under the ONNX guidelines, WinML returns an incorrect result on CPU and fails to run at all on GPU. Specifically, permutations with only one axis swap (akin to a simple transpose) work fine, while permutations with more than one axis swap fail.

Expected behavior

The expected behavior would be that WinML is able to perform permutations as described in the ONNX guidelines.
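For reference, the distinction between a single axis swap and a permutation with more than one swap can be sketched in plain Python. This is only an illustration of the ONNX Transpose rule (output axis `i` is input axis `perm[i]`); the helper name `transpose`, the 2x3x4 shape, and the `perm` values are assumptions for the example and are not taken from the attached models.

```python
from itertools import product

def transpose(data, shape, perm):
    """Reference ONNX-style Transpose on a packed row-major list.

    Output axis i is input axis perm[i], so out_shape[i] = shape[perm[i]].
    """
    rank = len(shape)
    if sorted(perm) != list(range(rank)):
        raise ValueError("perm must be a permutation of range(rank)")
    out_shape = [shape[p] for p in perm]
    out = []
    for out_idx in product(*map(range, out_shape)):
        # Rebuild the input multi-index: input axis perm[i] takes out_idx[i].
        in_idx = [0] * rank
        for i, p in enumerate(perm):
            in_idx[p] = out_idx[i]
        # Flatten the input multi-index in row-major order.
        flat = 0
        for axis in range(rank):
            flat = flat * shape[axis] + in_idx[axis]
        out.append(data[flat])
    return out, out_shape

x = list(range(24))  # hypothetical 2x3x4 tensor holding 0..23

# One axis swap (axes 1 and 2), akin to a simple transpose.
single, single_shape = transpose(x, [2, 3, 4], [0, 2, 1])

# A cyclic permutation, which needs more than one axis swap.
cyclic, cyclic_shape = transpose(x, [2, 3, 4], [2, 0, 1])

# The same cyclic permutation expressed as two successive single swaps.
step, step_shape = transpose(x, [2, 3, 4], [0, 2, 1])
two_swaps, two_swaps_shape = transpose(step, step_shape, [1, 0, 2])
```

A backend that handles `perm` correctly should produce identical data for `cyclic` and `two_swaps`.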

Minimal reproduction of the problem with instructions

I have attached two ONNX files, permute_net.onnx and transpose_net.onnx. Run each network in the WinML dashboard and observe that transpose_net.onnx runs fine on both CPU and GPU, while permute_net.onnx "succeeds" on CPU and fails outright on GPU. Even the CPU "success" produces faulty output.

onnx_files.zip

Here is the message received on a GPU run of permute_net.onnx:

Creating session [FAILED]
Exception during initialization: onecoreuap\windows\windowsai\winml\dll\mloperatorauthorimpl.cpp(1260)\Windows.AI.MachineLearning.dll!00007FFAF52E80A4: (caller: 00007FFAF52F4EBE) Exception(3) tid(3234) 80070057 The parameter is incorrect.

I have also run these networks through the actual WinML process in both a C++ and a C# app as described on the windows webpage, with similar failing results.

Environment

Windows Build Number:17763.379

App min and target version: Universal Windows, Windows 10, Version 1809.

OS Version (Server, IoT Core, Desktop, etc): Desktop

Graphics Driver version: 419.67 on Titan V.

DxDiag:

WinMLTools specific:

Source training framework: Torch (the network is untrained).
- WinMLTools version: Not relevant.

Visual Studio

[ ] 2017 (version: )
[ ] 2017 Preview (version: )
[x] 2019 Preview (version: 16.0.0 Preview 4.3)

ryanlai2 commented 5 years ago

Hi @GreatA1exander thanks for reporting this issue. Can I get more clarity on your issue?

When you run "permute_net.onnx" on the CPU, what is the false result that you are obtaining?

Also, can you try running your model(s) with the latest version of the WinMLRunner tool?

GreatA1exander commented 5 years ago

Hi @ryanlai2 ,

The incorrect result is related to a subpixel convolution layer. When run on the CPU, instead of outputting the expected image composed of r^2 subimages, it mixes them in an unexpected manner, indicating that the permutation is not behaving as it should.

[Image: Cdemu_192x192]
[Image: output]

Supposing that the top image was tiled 4 times, the permutation followed by a reshape should result in a large version of the image. Instead, the permute acts improperly and causes an output that looks like the second picture. When the full network is run on alternative onnx backends, the network acts as expected.
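To make the "permutation followed by a reshape" concrete, here is a minimal sketch of the usual sub-pixel (PixelShuffle-style) rearrangement in plain Python. The exact shapes and `perm` used by the attached model are not stated in this thread, so the unbatched layout, the helper names `permute` and `pixel_shuffle`, and the toy values below are assumptions. With an unbatched input the intermediate tensor is 5-D; a batch axis would add one more dimension.

```python
from itertools import product

def permute(data, shape, perm):
    """Permute the axes of a packed row-major list; out_shape[i] = shape[perm[i]]."""
    rank = len(shape)
    # Row-major strides of the input.
    strides = [1] * rank
    for axis in range(rank - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    out_shape = [shape[p] for p in perm]
    # Output axis k walks the input along axis perm[k].
    out = [data[sum(i * strides[p] for i, p in zip(idx, perm))]
           for idx in product(*map(range, out_shape))]
    return out, out_shape

def pixel_shuffle(data, channels, r, height, width):
    """Rearrange a packed (channels*r*r, height, width) tensor
    into (channels, height*r, width*r)."""
    # Reshape is free on a packed buffer: reinterpret it as (C, r, r, H, W).
    five_d = [channels, r, r, height, width]
    # (C, r, r, H, W) -> (C, H, r, W, r): a permutation with more than one axis swap.
    shuffled, _ = permute(data, five_d, [0, 3, 1, 4, 2])
    # Reinterpret (C, H, r, W, r) as (C, H*r, W*r).
    return shuffled, [channels, height * r, width * r]

# Hypothetical toy input: r*r = 4 sub-images of size 1x2 for one output channel.
sub_images = [0, 1,  10, 11,  20, 21,  30, 31]
image, image_shape = pixel_shuffle(sub_images, channels=1, r=2, height=1, width=2)
```

Each r x r block of `image` takes one pixel from each of the four sub-images, so if the permutation is applied incorrectly the sub-images end up interleaved in the wrong order, which matches the kind of scrambling described above.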

When running the permute net on the new version of the WinMLRunner tool, I receive this similar result:

C:#####\Downloads\WinMLRunner\x64_release>WinMLRunner.exe -model C:#####\source\repos\permute_net.onnx
WinML Runner
Printing available GPUs with DXGI..
Index: 0, Description: NVIDIA TITAN V

Loading model (path = C:######\permute_net.onnx)...
=================================================================
Name: torch-jit-export
Author: pytorch
Version: 9223372036854775807
Domain:
Description:
Path: C:#####\permute_net.onnx
Support FP16: false

Input Feature Info:
Name: 0
Feature Kind: Float

Output Feature Info:
Name: 1
Feature Kind: Float

=================================================================

Creating Session with CPU device
Binding (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Evaluating (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]

Creating Session with GPU: NVIDIA TITAN V
Creating session [FAILED]
Exception during initialization: onecoreuap\windows\windowsai\winml\dll\mloperatorauthorimpl.cpp(1260)\Windows.AI.MachineLearning.dll!00007FF8FBE380A4: (caller: 00007FF8FBE44EBE) Exception(3) tid(2c64) 80070057 The parameter is incorrect.
Run failed for DeviceType: GPU

If there is anything else I can do to help clarify the problem, please don't hesitate to ask.

ryanlai2 commented 5 years ago

Hi @GreatA1exander , may I ask how you were able to obtain this image through WinML? Did you output tensors from WinMLRunner and parse that?

Also, tensor dimensions > 5 aren't supported in the WinML GPU path at this time. Are you able to tweak your model so that the dimension is <= 5?

GreatA1exander commented 5 years ago

I ran the network using the WinML C++ and C# interfaces, then parsed the output tensors into an image and saved the result, in a manner similar to the available samples. The PixelShuffle layer requires tensors of size 5 as an intermediate step, so it is not possible to tweak the model accordingly without heavily affecting processing time. If the tensor dimension of 4 is a fundamental WinML design choice, then I guess that is the problem at hand. It might be best to make that more obvious in the API/tutorials somehow, as I was unaware of that restriction. Are there any plans to update WinML to work with higher-dimensional tensors?

artths commented 4 years ago

https://github.com/onnx/models/tree/master/vision/super_resolution/sub_pixel_cnn_2016 This also doesn't work on GPU.

fdwr commented 3 years ago

@GreatA1exander: This should work in ORT 1.6. I tried your original micro-models from the first .zip file and could not reproduce the issue locally. In DirectML 1.4 (feature level >= 3.0) we added support for 1D up to 8D tensors to most operators (some remain), including elementwise identity with transposed strides, which is how transpose is implemented: https://docs.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_element_wise_identity_operator_desc#tensor-constraints

s:\WindowsAI\build\x64-win-redist-debug\install\bin>WinMLRunner.exe -gpu -model "D:\models\permute_net.onnx"

Created LearningModelDevice with GPU: NVIDIA Quadro P400
Loading model (path = D:\ai\models\broken\permute_net.onnx)...
=================================================================
Name: torch-jit-export
Author: pytorch
Version: 0
Domain:
Description:
Path: D:\ai\models\broken\permute_net.onnx
Support FP16: false

Input Feature Info:
Name: 0
Feature Kind: Float

Output Feature Info:
Name: 1
Feature Kind: Float

=================================================================

Binding (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Evaluating (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
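
As a rough CPU-side illustration of the "elementwise identity with transposed strides" idea mentioned above: the transposed tensor can be described as a view over the same buffer with its sizes and strides reordered, and an identity copy then materializes it. This is a plain-Python sketch under that reading, not DirectML code; the helper names (`row_major_strides`, `transposed_view`, `identity_copy`) and the 2x3x4 example are hypothetical.

```python
from itertools import product

def row_major_strides(shape):
    """Element strides of a packed row-major tensor."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides

def transposed_view(shape, perm):
    """Describe the transposed tensor as (sizes, strides) over the same buffer.

    No data moves here: only the size and stride lists are reordered by perm.
    """
    strides = row_major_strides(shape)
    return [shape[p] for p in perm], [strides[p] for p in perm]

def identity_copy(buffer, sizes, strides):
    """Read a strided view element by element into a packed row-major list."""
    return [buffer[sum(i * s for i, s in zip(idx, strides))]
            for idx in product(*map(range, sizes))]

buffer = list(range(24))                                 # packed 2x3x4 tensor
sizes, strides = transposed_view([2, 3, 4], [2, 0, 1])   # multi-swap permutation
packed = identity_copy(buffer, sizes, strides)
```

Because the permutation lives entirely in the stride list, the same copy routine works for any rank, which is consistent with the rank limit being a property of the operator implementation rather than of the transpose itself.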

fdwr commented 3 years ago

p.s. The superres opset 10 model works too:

D:\models\super_resolution>onnx_test_runner.exe -e dml .
Disabling mem pattern and forcing single-threaded execution since DML is used
result:
        Models: 1
        Total test cases: 1
                Succeeded: 1
                Not implemented: 0
                Failed: 0
        Stats by Operator type:
                Not implemented(0):
                Failed:
Failed Test Cases:
