microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[DML EP] out_of_range exception in Dml::GraphDescBuilder::BuildGraphDesc #17516

Open kazssym opened 1 year ago

kazssym commented 1 year ago

I got an out_of_range exception at the line below while trying to run a benchmark with the onnxruntime.transformers.models.stable_diffusion.benchmark module.

https://github.com/microsoft/onnxruntime/blob/db558ef9b47894179346472592d145448574197a/onnxruntime/core/providers/dml/DmlExecutionProvider/src/GraphDescBuilder.cpp#L376

It looks like graphNodeCreateInfo, defined at the line below, was not filled with valid information even though subGraphOutputArgNames has an element.

https://github.com/microsoft/onnxruntime/blob/db558ef9b47894179346472592d145448574197a/onnxruntime/core/providers/dml/DmlExecutionProvider/src/GraphDescBuilder.cpp#L247

Is it expected to be filled by the factory function?

sumitsays commented 1 year ago

@kazssym: DmlGraphNodeCreateInfo should not be null/empty and should also hold the operator graph (kernel information) for a given node. nameToNodeAndIndexMap contains the nodes in a graph that emit graph outputs, so it is strange that it throws an out_of_range exception. That is not expected. Could you share the complete call stack, and also a small test model so that we can investigate further?

kazssym commented 1 year ago

The code here seems to be expected to fill graphNodeCreateInfo, but it does not. nameToNodeAndIndexMap is not updated either.

https://github.com/microsoft/onnxruntime/blob/705f8a371886828dcd2a380d10ba3a6549a60b9b/onnxruntime/core/providers/dml/DmlExecutionProvider/src/AbiCustomRegistry.cpp#L484

kazssym commented 1 year ago

Here is a call stack.

KernelBase.dll!00007fffe415531c() (Unknown Source:0)
vcruntime140d.dll!00007fffd708b760() (Unknown Source:0)
msvcp140d.dll!00007fffa7c95459() (Unknown Source:0)
onnxruntime_pybind11_state.pyd!std::unordered_map<std::string,`Dml::GraphDescBuilder::BuildGraphDesc'::`2'::NodeAndIndex,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,`Dml::GraphDescBuilder::BuildGraphDesc'::`2'::NodeAndIndex>>>::at(const std::string & _Keyval) Line 448 (c:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\unordered_map:448)
onnxruntime_pybind11_state.pyd!Dml::GraphDescBuilder::BuildGraphDesc(const unsigned char * isConstGpuGraphInput, const unsigned __int64 isConstGpuGraphInputCount, const std::unordered_map<std::string,std::pair<onnx::TensorProto const *,bool>,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::pair<onnx::TensorProto const *,bool>>>> & isInitializerTransferable, const onnxruntime::Graph & graph, const onnxruntime::IndexedSubGraph & indexedSubGraph, const std::unordered_map<std::string,Dml::GraphNodeProperties,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,Dml::GraphNodeProperties>>> & graphNodePropertyMap, IDMLDevice * device, const void * executionHandle) Line 376 (e:\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\GraphDescBuilder.cpp:376)
onnxruntime_pybind11_state.pyd!Dml::DmlGraphFusionTransformer::ApplyImplHelper(onnxruntime::Graph & graph, bool & modified, int graph_level, const onnxruntime::logging::Logger & logger, const std::unordered_map<std::string,onnxruntime::NodeArg const *,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,onnxruntime::NodeArg const *>>> & implicitInputDefs) Line 195 (e:\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionTransformer.cpp:195)
onnxruntime_pybind11_state.pyd!Dml::DmlGraphFusionTransformer::ApplyImpl(onnxruntime::Graph & graph, bool & modified, int graph_level, const onnxruntime::logging::Logger & logger) Line 42 (e:\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionTransformer.cpp:42)
onnxruntime_pybind11_state.pyd!onnxruntime::GraphTransformer::Apply(onnxruntime::Graph & graph, bool & modified, const onnxruntime::logging::Logger & logger) Line 14 (e:\onnxruntime\onnxruntime\core\optimizer\graph_transformer.cc:14)
onnxruntime_pybind11_state.pyd!onnxruntime::GraphTransformerManager::ApplyTransformers(onnxruntime::Graph & graph, onnxruntime::TransformerLevel level, const onnxruntime::logging::Logger & logger) Line 36 (e:\onnxruntime\onnxruntime\core\optimizer\graph_transformer_mgr.cc:36)
onnxruntime_pybind11_state.pyd!onnxruntime::InferenceSession::TransformGraph(onnxruntime::Graph & graph, bool saving_model_in_ort_format) Line 1054 (e:\onnxruntime\onnxruntime\core\session\inference_session.cc:1054)
onnxruntime_pybind11_state.pyd!onnxruntime::InferenceSession::Initialize() Line 1564 (e:\onnxruntime\onnxruntime\core\session\inference_session.cc:1564)
onnxruntime_pybind11_state.pyd!onnxruntime::python::InitializeSession(onnxruntime::InferenceSession * sess, std::function<void __cdecl(onnxruntime::InferenceSession *,std::vector<std::string,std::allocator<std::string>> const &,std::unordered_map<std::string,std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>>> const &)> ep_registration_fn, const std::vector<std::string,std::allocator<std::string>> & provider_types, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> & provider_options, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> & disabled_optimizer_names) Line 1050 (e:\onnxruntime\onnxruntime\python\onnxruntime_pybind_state.cc:1050)
onnxruntime_pybind11_state.pyd!onnxruntime::python::addObjectMethods::__l2::<lambda>(onnxruntime::python::PyInferenceSession * sess, const std::vector<std::string,std::allocator<std::string>> & provider_types, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> & provider_options, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> & disabled_optimizer_names) Line 1717 (e:\onnxruntime\onnxruntime\python\onnxruntime_pybind_state.cc:1717)
onnxruntime_pybind11_state.pyd!pybind11::detail::argument_loader<onnxruntime::python::PyInferenceSession *,std::vector<std::string,std::allocator<std::string>> const &,std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> const &,std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> const &>::call_impl<void,void <lambda>(onnxruntime::python::PyInferenceSession *, const std::vector<std::string,std::allocator<std::string>> &, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> &, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> &) &,0,1,2,3,pybind11::detail::void_type>(onnxruntime::python::addObjectMethods::__l2::void <lambda>(onnxruntime::python::PyInferenceSession *, const std::vector<std::string,std::allocator<std::string>> &, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> &, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> &) & f, std::integer_sequence<unsigned __int64,0,1,2,3> __formal, pybind11::detail::void_type && __formal) Line 1440 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\cast.h:1440)
onnxruntime_pybind11_state.pyd!pybind11::detail::argument_loader<onnxruntime::python::PyInferenceSession *,std::vector<std::string,std::allocator<std::string>> const &,std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> const &,std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> const &>::call<void,pybind11::detail::void_type,void <lambda>(onnxruntime::python::PyInferenceSession *, const std::vector<std::string,std::allocator<std::string>> &, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> &, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> &) &>(onnxruntime::python::addObjectMethods::__l2::void <lambda>(onnxruntime::python::PyInferenceSession *, const std::vector<std::string,std::allocator<std::string>> &, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> &, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> &) & f) Line 1415 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\cast.h:1415)
onnxruntime_pybind11_state.pyd!pybind11::cpp_function::initialize::__l2::<lambda>(pybind11::detail::function_call & call) Line 249 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\pybind11.h:249)
onnxruntime_pybind11_state.pyd!pybind11::handle <lambda>(pybind11::detail::function_call &)::<lambda_invoker_cdecl>(pybind11::detail::function_call & call) Line 167 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\pybind11.h:167)
onnxruntime_pybind11_state.pyd!pybind11::cpp_function::dispatcher(_object * self, _object * args_in, _object * kwargs_in) Line 929 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\pybind11.h:929)
python310.dll!00007fff30229eea() (Unknown Source:0)
python310.dll!00007fff3026ffbb() (Unknown Source:0)

kazssym commented 1 year ago

Does DmlOperatorMemcpy::DmlOperatorMemcpy never call SetDmlOperatorDesc?

sumitsays commented 1 year ago

@kazssym Thank you for sharing the call stack. It does look like nameToNodeAndIndexMap might not have an entry for an operator. As you noted above, the operator might be Memcpy. Can you please share which exact version of the Stable Diffusion model you are benchmarking? Or could you share the script you are running, so that I can run it on my end to reproduce the issue?
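
The suspected failure mode can be illustrated with a toy model. This is only a sketch of the hypothesis discussed in this thread, not the actual GraphDescBuilder code: ToyNode, sets_operator_desc, build_output_map and resolve_outputs are hypothetical names, and a Python dict stands in for the std::unordered_map whose at() call appears in the call stack. If one node (here a Memcpy stand-in) never registers its outputs, looking up a subgraph output that it produces fails.

```python
from dataclasses import dataclass


@dataclass
class ToyNode:
    name: str
    outputs: list
    # Hypothetical flag: whether the node's kernel registered an operator desc.
    sets_operator_desc: bool = True


def build_output_map(nodes):
    """Map each output name to (node index, output index).

    Nodes that never registered an operator desc are skipped, which models
    the suspected gap for Memcpy.
    """
    name_to_node_and_index = {}
    for node_index, node in enumerate(nodes):
        if not node.sets_operator_desc:
            continue
        for output_index, output_name in enumerate(node.outputs):
            name_to_node_and_index[output_name] = (node_index, output_index)
    return name_to_node_and_index


def resolve_outputs(name_map, subgraph_output_names):
    """Look up every subgraph output; a missing name raises KeyError.

    KeyError plays the role that std::out_of_range plays for
    std::unordered_map::at in C++.
    """
    return [name_map[name] for name in subgraph_output_names]


if __name__ == "__main__":
    nodes = [
        ToyNode("Conv_0", ["conv_out"]),
        ToyNode("Memcpy_0", ["copied"], sets_operator_desc=False),
    ]
    name_map = build_output_map(nodes)
    try:
        resolve_outputs(name_map, ["conv_out", "copied"])
    except KeyError as exc:
        print("no map entry for subgraph output:", exc)
```

In this model the map is complete for every node except the one that skipped registration, which matches the observation that subGraphOutputArgNames has an element while the lookup still fails.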

kazssym commented 1 year ago

I am running the following command with https://huggingface.co/kazssym/stable-diffusion-2-1-optimized-fp16 on https://github.com/microsoft/onnxruntime/compare/main...kazssym:onnxruntime:dml-transformers-testing.

python -m onnxruntime.transformers.models.stable_diffusion.benchmark --provider dml --version 2.1 --pipeline stable-diffusion-2-1-optimized-fp16 --height 768 --width 768
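
To narrow down which model in the pipeline triggers the failure, something like the following sketch might help. It assumes the pipeline directory contains one or more .onnx files, that the failure surfaces as a Python exception rather than terminating the process, and that a DirectML-enabled onnxruntime build is installed. find_onnx_models and try_load are hypothetical helper names. Since the call stack shows the exception inside InferenceSession::Initialize, creating a session per model should be enough to hit it without running the whole benchmark.

```python
from pathlib import Path


def find_onnx_models(pipeline_dir):
    """Return all .onnx files below pipeline_dir, sorted for stable output."""
    root = Path(pipeline_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.onnx"))


def try_load(model_path, provider="DmlExecutionProvider"):
    """Create a session for one model.

    Returns None on success, or a short error string if session creation
    raised a Python exception.
    """
    # Imported lazily so that find_onnx_models works without onnxruntime.
    import onnxruntime as ort

    try:
        ort.InferenceSession(str(model_path), providers=[provider])
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # Directory name taken from the --pipeline argument above.
    for model in find_onnx_models("stable-diffusion-2-1-optimized-fp16"):
        print(model, "->", try_load(model) or "ok")
```

A model that fails here on its own would make a smaller test case than the full pipeline.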