microsoft / DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
MIT License

AccessViolation with DML provider in ConstantOfShape operator #154

Closed jrg1381 closed 2 years ago

jrg1381 commented 3 years ago

DirectMLExample.zip

I have an example with a simple model that fails with the DML provider but succeeds with the CPU provider (change the bool variable in the example code to switch between the two). When inference runs, there's an invalid read with this C# stack trace:

System.AccessViolationException: 'Attempted to read or write protected memory...'
This exception was originally thrown at this call stack:
    Microsoft.ML.OnnxRuntime.InferenceSession.RunImpl(Microsoft.ML.OnnxRuntime.RunOptions, System.IntPtr[], System.IntPtr[], System.IntPtr[], Microsoft.ML.OnnxRuntime.DisposableList<System.IDisposable>) in InferenceSession.cs
    Microsoft.ML.OnnxRuntime.InferenceSession.Run(System.Collections.Generic.IReadOnlyCollection<Microsoft.ML.OnnxRuntime.NamedOnnxValue>, System.Collections.Generic.IReadOnlyCollection<string>, Microsoft.ML.OnnxRuntime.RunOptions) in InferenceSession.cs
    Microsoft.ML.OnnxRuntime.InferenceSession.Run(System.Collections.Generic.IReadOnlyCollection<Microsoft.ML.OnnxRuntime.NamedOnnxValue>, System.Collections.Generic.IReadOnlyCollection<string>) in InferenceSession.cs
    Microsoft.ML.OnnxRuntime.InferenceSession.Run(System.Collections.Generic.IReadOnlyCollection<Microsoft.ML.OnnxRuntime.NamedOnnxValue>) in InferenceSession.cs
    DirectMLExample.Program.Main(string[]) in Program.cs

I dug deeper with a C++ version that used a similar model (the smallest repro model is included in the .zip file attached to this issue) and a debug build of onnxruntime.dll. It's failing here:

onnxruntime/core/providers/dml/DmlExecutionProvider/src/Operators/DmlOperatorConstantOfShape.cpp
    void Compute(const MLOperatorKernelContext& kernelContext) override
    {
        std::vector<IMLOperatorTensor*> outputTensors = GetOutputTensorsForExecute(kernelContext);
        THROW_IF_FAILED(m_executionProvider->FillTensorWithPattern(outputTensors.front(), valueBytes));

with this stack trace ("Exception thrown: read access violation. this was nullptr."):

onnxruntime.dll!Dml::AllocationInfo::GetOwner() Line 44
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\BucketizedBufferAllocator.h(44)
onnxruntime.dll!Dml::BucketizedBufferAllocator::DecodeDataHandle(const void * opaqueHandle) Line 212
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\BucketizedBufferAllocator.cpp(212)
onnxruntime.dll!Dml::ExecutionProviderImpl::FillTensorWithPattern(IMLOperatorTensor * dst, gsl::span<enum std::byte const> value) Line 447
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(447)
onnxruntime.dll!Dml::DmlOperatorConstantOfShape::Compute(const MLOperatorKernelContext & kernelContext) Line 51
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\Operators\DmlOperatorConstantOfShape.cpp(51)
onnxruntime.dll!MLOperatorKernel<Dml::DmlOperatorConstantOfShape>::Compute(IMLOperatorKernelContext * context) Line 741
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\providers\dml\OperatorAuthorHelper\MLOperatorAuthorHelper.h(741)
onnxruntime.dll!Windows::AI::MachineLearning::Adapter::AbiOpKernel::Compute(onnxruntime::OpKernelContext * context) Line 1638
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1638)
onnxruntime.dll!onnxruntime::SequentialExecutor::Execute(const onnxruntime::SessionState & session_state, const std::vector<int,std::allocator<int>> & feed_mlvalue_idxs, const std::vector<OrtValue,std::allocator<OrtValue>> & feeds, const std::vector<int,std::allocator<int>> & fetch_mlvalue_idxs, std::vector<OrtValue,std::allocator<OrtValue>> & fetches, const std::unordered_map<unsigned __int64,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>,std::hash<unsigned __int64>,std::equal_to<unsigned __int64>,std::allocator<std::pair<unsigned __int64 const ,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>>>> & fetch_allocators, const onnxruntime::logging::Logger & logger) Line 316
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\framework\sequential_executor.cc(316)
onnxruntime.dll!onnxruntime::utils::ExecuteGraphImpl(const onnxruntime::SessionState & session_state, const onnxruntime::FeedsFetchesManager & feeds_fetches_manager, const std::vector<OrtValue,std::allocator<OrtValue>> & feeds, std::vector<OrtValue,std::allocator<OrtValue>> & fetches, const std::unordered_map<unsigned __int64,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>,std::hash<unsigned __int64>,std::equal_to<unsigned __int64>,std::allocator<std::pair<unsigned __int64 const ,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>>>> & fetch_allocators, ExecutionMode execution_mode, const bool & terminate_flag, const onnxruntime::logging::Logger & logger, const bool only_execute_path_to_fetches) Line 560
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\framework\utils.cc(560)
onnxruntime.dll!onnxruntime::utils::ExecuteSubgraph(const onnxruntime::SessionState & session_state, const onnxruntime::FeedsFetchesManager & feeds_fetches_manager, const std::vector<OrtValue,std::allocator<OrtValue>> & feeds, std::vector<OrtValue,std::allocator<OrtValue>> & fetches, const std::unordered_map<unsigned __int64,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>,std::hash<unsigned __int64>,std::equal_to<unsigned __int64>,std::allocator<std::pair<unsigned __int64 const ,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>>>> & fetch_allocators, ExecutionMode execution_mode, const bool & terminate_flag, const onnxruntime::logging::Logger & logger) Line 658
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\framework\utils.cc(658)
onnxruntime.dll!onnxruntime::IfImpl::Execute(const onnxruntime::FeedsFetchesManager & ffm) Line 365
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\providers\cpu\controlflow\if.cc(365)
onnxruntime.dll!onnxruntime::If::Compute(onnxruntime::OpKernelContext * ctx) Line 239
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\providers\cpu\controlflow\if.cc(239)
onnxruntime.dll!onnxruntime::SequentialExecutor::Execute(const onnxruntime::SessionState & session_state, const std::vector<int,std::allocator<int>> & feed_mlvalue_idxs, const std::vector<OrtValue,std::allocator<OrtValue>> & feeds, const std::vector<int,std::allocator<int>> & fetch_mlvalue_idxs, std::vector<OrtValue,std::allocator<OrtValue>> & fetches, const std::unordered_map<unsigned __int64,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>,std::hash<unsigned __int64>,std::equal_to<unsigned __int64>,std::allocator<std::pair<unsigned __int64 const ,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>>>> & fetch_allocators, const onnxruntime::logging::Logger & logger) Line 316
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\framework\sequential_executor.cc(316)
onnxruntime.dll!onnxruntime::utils::ExecuteGraphImpl(const onnxruntime::SessionState & session_state, const onnxruntime::FeedsFetchesManager & feeds_fetches_manager, const std::vector<OrtValue,std::allocator<OrtValue>> & feeds, std::vector<OrtValue,std::allocator<OrtValue>> & fetches, const std::unordered_map<unsigned __int64,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>,std::hash<unsigned __int64>,std::equal_to<unsigned __int64>,std::allocator<std::pair<unsigned __int64 const ,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtMemoryInfo const &,OrtValue &,bool &)>>>> & fetch_allocators, ExecutionMode execution_mode, const bool & terminate_flag, const onnxruntime::logging::Logger & logger, const bool only_execute_path_to_fetches) Line 525
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\framework\utils.cc(525)
onnxruntime.dll!onnxruntime::utils::ExecuteGraph(const onnxruntime::SessionState & session_state, onnxruntime::FeedsFetchesManager & feeds_fetches_manager, const std::vector<OrtValue,std::allocator<OrtValue>> & feeds, std::vector<OrtValue,std::allocator<OrtValue>> & fetches, ExecutionMode execution_mode, const bool & terminate_flag, const onnxruntime::logging::Logger & logger, bool only_execute_path_to_fetches) Line 583
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\framework\utils.cc(583)
onnxruntime.dll!onnxruntime::InferenceSession::Run(const OrtRunOptions & run_options, const std::vector<std::string,std::allocator<std::string>> & feed_names, const std::vector<OrtValue,std::allocator<OrtValue>> & feeds, const std::vector<std::string,std::allocator<std::string>> & output_names, std::vector<OrtValue,std::allocator<OrtValue>> * p_fetches, const std::vector<OrtDevice,std::allocator<OrtDevice>> * p_fetches_device_info) Line 1697
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\session\inference_session.cc(1697)
onnxruntime.dll!OrtApis::Run(OrtSession * sess, const OrtRunOptions * run_options, const char * const * input_names, const OrtValue * const * input, unsigned __int64 input_len, const char * const * output_names1, unsigned __int64 output_names_len, OrtValue * * output) Line 590
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-prefix\src\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc(590)
reference.exe!Ort::Session::Run(const Ort::RunOptions & run_options, const char * const * input_names, const Ort::Value * input_values, unsigned __int64 input_count, const char * const * output_names, Ort::Value * output_values, unsigned __int64 output_count) Line 540
    at C:\Users\JamesGilmore\casserole\reference\out\build\x64-Debug\onnxruntime-install\include\onnxruntime\core\session\onnxruntime_cxx_inline.h(540)

I'm using version 1.9.20210921.7.4daa14b of onnxruntime.dll in the C# example; the C++ debug build the stack trace comes from was built from git tag cba4bc11c78a55fa3aeb7c1490e8f9b387dceeec of the https://github.com/microsoft/onnxruntime.git repository.

If the problem isn't in this repo, feel free to transfer this issue to microsoft/onnxruntime if that's more suitable. (I'm new to both repositories.)

fdwr commented 2 years ago

@jrg1381: If your model contains empty tensors (tensors with a zero dimension), this is very likely this issue in the execution provider: https://github.com/microsoft/onnxruntime/pull/9361
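For context, here is a minimal numpy sketch of what ConstantOfShape computes (this is reference semantics only, not the DML kernel): it fills a tensor of the requested shape with a constant. The function name and defaults below are illustrative, not from onnxruntime. The trigger condition described above is a shape containing a zero, which produces an empty tensor with no backing data.

```python
import numpy as np

def constant_of_shape(shape, value=0.0, dtype=np.float32):
    """Reference semantics of ONNX ConstantOfShape (numpy sketch,
    not the DML code path): fill a tensor of `shape` with `value`."""
    return np.full(shape, value, dtype=dtype)

# A normal shape yields a filled tensor with elements to write...
filled = constant_of_shape([2, 3], 1.0)
print(filled.size)   # 6

# ...but a shape containing 0 yields an empty tensor with zero bytes
# of data, the case the linked PR reportedly guards against in the
# DML allocator path.
empty = constant_of_shape([0, 3], 1.0)
print(empty.size)    # 0
print(empty.shape)   # (0, 3)
```

A CPU implementation can trivially no-op on an empty output, which would explain why the repro succeeds with the CPU provider but crashes in the DML provider's fill path.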

fdwr commented 2 years ago

Aah, it's the exact same issue from Speechmatics. FYI @sumitsays. Thanks for the great reduced repro, James.