microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Non-zero status code returned while running Resize node #13975

Open adepierre opened 1 year ago

adepierre commented 1 year ago

Describe the issue

Trying to run an ONNX model with a Resize layer with the DML execution provider results in this error message:

[E:onnxruntime:, sequential_executor.cc:369 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running Resize node. Name:'Resize_0' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1866)\onnxruntime.dll!00007FFFDD662A90: (caller: 00007FFFDD661F93) Exception(3) tid(3728) 80070057 ParamDone

To reproduce

Model exported with PyTorch:

import torch
from torch.nn import functional as F

class TestModel(torch.nn.Module):
    def forward(self, x):
        return F.interpolate(x, scale_factor=2, mode='nearest')

model = TestModel()
example = torch.rand(1, 3, 256, 256)
torch.onnx.export(model,
                  example,
                  'test.onnx',
                  input_names=['inputs'],
                  output_names=['outputs'],  # name the output so it can be fetched by name below
                  export_params=True,
                  # dynamic spatial dims: height/width are only known at inference time
                  dynamic_axes={'inputs': {2: 'height', 3: 'width'}})
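As a sanity check, the exported model can be run with the default CPU execution provider to confirm the failure is specific to DML (a minimal sketch, assuming the onnxruntime Python package is installed and test.onnx is the file exported above):

import numpy as np
import onnxruntime as ort

# baseline: run the exported model on the CPU execution provider
sess = ort.InferenceSession('test.onnx', providers=['CPUExecutionProvider'])
x = np.zeros((1, 3, 196, 196), dtype=np.float32)  # same 196x196 shape as the C++ repro below
outputs = sess.run(None, {'inputs': x})
print(outputs[0].shape)  # expected: (1, 3, 392, 392) after 2x nearest upsampling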

Running the model in C++ (tested with Microsoft.ML.OnnxRuntime.DirectML.1.13.1.zip):

#include <iostream>
#include <vector>
#include "onnxruntime_cxx_api.h"
#include "dml_provider_factory.h"

int main(int argc, char** argv)
{
#ifdef _WIN32
    std::string str = "test.onnx";
    std::wstring wide_string = std::wstring(str.begin(), str.end());
    std::basic_string<ORTCHAR_T> model_file = std::basic_string<ORTCHAR_T>(wide_string);
#else
    std::string model_file = "test.onnx";
#endif

    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "test");
    Ort::SessionOptions session_options;
    OrtSessionOptionsAppendExecutionProvider_DML(session_options, 0);
    Ort::Session session = Ort::Session(env, model_file.data(), session_options);

    // note: 196x196 differs from the 256x256 export example, exercising the dynamic axes
    std::vector<int64_t> input_shape = { 1, 3, 196, 196 };
    std::vector<float> input_tensor_values(3 * 196 * 196, 0.0f);

    std::vector<Ort::Value> input_tensors;
    Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    input_tensors.push_back(Ort::Value::CreateTensor<float>(memoryInfo, input_tensor_values.data(), input_tensor_values.size(), input_shape.data(), input_shape.size()));

    std::vector<const char*> input_names = { "inputs" };
    std::vector<const char*> output_names = { "outputs" };

    std::vector<Ort::Value> outputs = session.Run(Ort::RunOptions{}, input_names.data(), input_tensors.data(), input_tensors.size(), output_names.data(), output_names.size());

    std::cout << outputs.size() << std::endl;
    std::cout << outputs[0].GetTensorTypeAndShapeInfo().GetElementCount() << std::endl;
    return 0;
}
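For reference, the same failure should also be reproducible from Python with the onnxruntime-directml package (an untested sketch; assumes that build exposes the 'DmlExecutionProvider' name):

import numpy as np
import onnxruntime as ort

# same repro through the Python API of the DirectML package
sess = ort.InferenceSession('test.onnx', providers=['DmlExecutionProvider'])
x = np.zeros((1, 3, 196, 196), dtype=np.float32)
outputs = sess.run(None, {'inputs': x})  # expected to raise the Resize error above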

Urgency

Can't run inference on any model with Resize layers under DML, so it's quite blocking

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

No response

adepierre commented 1 year ago

Quick update: I tried to replace the Resize op with a quick "custom nearest interpolate function", but it leads to a similar issue, this time with a Concat node 🙁 I think the root cause might be the dynamic size of the input, but I don't really know how to investigate the issue further, let alone fix it (one static-shape check is sketched after the log below).

import torch

def resize(x: torch.Tensor, scale_factor: int):
    # duplicate each element along the last dim and fold back: nearest upsample of the width
    stacked_h = torch.stack([x] * scale_factor, dim=-1).view(*x.shape[:-1], scale_factor * x.shape[-1])
    # duplicate each row the same way: nearest upsample of the height
    stacked = torch.stack([stacked_h] * scale_factor, dim=-2).view(*x.shape[:-2], scale_factor * x.shape[-2], scale_factor * x.shape[-1])
    return stacked

class TestModel(torch.nn.Module):
    def forward(self, x):
        return resize(x, scale_factor=2)

model = TestModel()
example = torch.rand(1, 3, 256, 256)
torch.onnx.export(model,
                  example,
                  'test.onnx',
                  input_names=['inputs'],
                  output_names=['outputs'],
                  export_params=True,
                  dynamic_axes={'inputs': {2: 'height', 3: 'width'}})
Resulting error:

[E:onnxruntime:, sequential_executor.cc:369 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running Concat node. Name:'Concat_4' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1866)\onnxruntime.dll
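One quick way to test the dynamic-size hypothesis is to export the same model with fully static shapes and see whether DML still fails (a sketch under that assumption, reusing the TestModel class above; not a fix, since it gives up the dynamic height/width):

# re-export with static shapes: no dynamic_axes, input baked to the inference-time size
model = TestModel()
example = torch.rand(1, 3, 196, 196)
torch.onnx.export(model,
                  example,
                  'test_static.onnx',
                  input_names=['inputs'],
                  output_names=['outputs'],
                  export_params=True)  # all dims fixed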
VladMVLX commented 1 year ago

Unfortunately having the same problem now in April 2023 :(

rodrigovimieiro commented 1 week ago

Same problem now in November 2024
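Until this is resolved, one possible workaround when the input size is known before session creation is to pin the dynamic dims in the already-exported graph (a hedged sketch using the onnx package directly; dimension indices 2 and 3 match the dynamic_axes used in the exports above):

import onnx

# pin the dynamic height/width of 'inputs' to concrete values before loading with DML
model = onnx.load('test.onnx')
shape = model.graph.input[0].type.tensor_type.shape
for idx, value in ((2, 196), (3, 196)):
    dim = shape.dim[idx]
    dim.ClearField('dim_param')  # drop the symbolic name ('height' / 'width')
    dim.dim_value = value
onnx.save(model, 'test_fixed.onnx')

Recent onnxruntime releases also ship a helper for the same task (python -m onnxruntime.tools.make_dynamic_shape_fixed), if it is available in the installed version.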