microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Crash in ResizeHelper::Initialize executing a model on ARM64 #18628

Open markhfeldman opened 11 months ago

markhfeldman commented 11 months ago

I am porting an ORT C++ application from x64 to ARM64.

I am using the Hugging Face runwayml/stable-diffusion-v1-5 model (https://huggingface.co/runwayml/stable-diffusion-v1-5), optimized with Microsoft Olive: python stable_diffusion.py --optimize --model_id=runwayml/stable-diffusion-v1-5

The crash (using the latest version of ORT) occurs when running inference on the optimized model: unet\model.onnx

ORT built with: build.bat --parallel --cmake_generator "Visual Studio 17 2022" --config=Debug --skip_tests --use_dml --arm64 --build_shared_lib

The crash seems to involve the up_blocks.0/upsamplers.0/Resize node.

In ResizeHelper::Initialize the m_scales values are all zero (0,0,0,0), which causes the crash (note: on x64 they are (1,1,2,2)). The check that fails is: float scale = m_scales[i]; ML_CHECK_VALID_ARGUMENT(scale > FLT_EPSILON, "Scale values should be positive.");
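To make the failure mode concrete, here is a minimal standalone sketch of that check. ML_CHECK_VALID_ARGUMENT is approximated with a plain throwing helper (the real macro lives in the DML execution provider's OperatorHelper code); only the scale values and the FLT_EPSILON comparison are taken from the report above.

```cpp
// Standalone approximation of the scale validation in ResizeHelper::Initialize.
#include <cfloat>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Stand-in for ML_CHECK_VALID_ARGUMENT: throws if the condition is false.
static void CheckValidArgument(bool condition, const char* message) {
    if (!condition) throw std::invalid_argument(message);
}

int main() {
    // On x64 the scales read for this Resize node are (1,1,2,2);
    // on ARM64 they arrive as (0,0,0,0), so the very first check throws.
    std::vector<float> m_scales = {0.0f, 0.0f, 0.0f, 0.0f};

    try {
        for (size_t i = 0; i < m_scales.size(); ++i) {
            float scale = m_scales[i];
            CheckValidArgument(scale > FLT_EPSILON, "Scale values should be positive.");
            std::printf("scale[%zu] = %f\n", i, scale);
        }
    } catch (const std::invalid_argument& e) {
        std::printf("ML_CHECK_VALID_ARGUMENT failed: %s\n", e.what());
    }
    return 0;
}
```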

The full call stack is below:

```
onnxruntime.dll!OperatorHelper::ResizeHelper::Initialize(const OperatorHelper::IKernelInformationAdapter & kernelInformation, const OperatorHelper::IShapeInformationAdapter & shapeInformation, unsigned int opsetVersion) Line 2403 C++
onnxruntime.dll!OperatorHelper::ResizeHelper::ResizeHelper<MLShapeInferenceContext,MLShapeInferenceContext>(const MLShapeInferenceContext & info, const MLShapeInferenceContext & shape, unsigned int opsetVersion) Line 1291 C++
onnxruntime.dll!OperatorHelper::VersionedOpsetHelper<OperatorHelper::ResizeHelper,13>::VersionedOpsetHelper<OperatorHelper::ResizeHelper,13><MLShapeInferenceContext,MLShapeInferenceContext>(const MLShapeInferenceContext & info, const MLShapeInferenceContext & shape) Line 668 C++
onnxruntime.dll!OperatorHelper::ShapeInferenceFunction<OperatorHelper::VersionedOpsetHelper<OperatorHelper::ResizeHelper,13>>(IMLOperatorShapeInferenceContext* inference_context) Line 1198 C++
onnxruntime.dll!MLOperatorShapeInferrer::InferOutputShapes(IMLOperatorShapeInferenceContext* context) Line 992 C++
onnxruntime.dll!Windows::AI::MachineLearning::Adapter::InferAndVerifyOutputSizes(const onnxruntime::Node & node, const std::map<std::string,Windows::AI::MachineLearning::Adapter::AttributeValue,std::less,std::allocator<std::pair<std::string const ,Windows::AI::MachineLearning::Adapter::AttributeValue>>> defaultAttributes, IMLOperatorShapeInferrer* shapeInferrer, gsl::span<unsigned int const ,-1> requiredConstantCpuInputs, std::function<std::variant<Microsoft::WRL::ComPtr,std::vector<Microsoft::WRL::ComPtr,std::allocator<Microsoft::WRL::ComPtr>>> __cdecl(unsigned int)> & constantInputGetter, const Windows::AI::MachineLearning::Adapter::EdgeShapes * inputShapes, Windows::AI::MachineLearning::Adapter::EdgeShapes & outputShapes) Line 2759 C++
onnxruntime.dll!Windows::AI::MachineLearning::Adapter::AbiOpKernel::InferAndVerifyOutputSizes(gsl::span<unsigned int const ,-1> requiredConstantCpuInputs, std::function<std::variant<Microsoft::WRL::ComPtr,std::vector<Microsoft::WRL::ComPtr,std::allocator<Microsoft::WRL::ComPtr>>> __cdecl(unsigned int)> & constantInputGetter, const Windows::AI::MachineLearning::Adapter::EdgeShapes inputShapes, Windows::AI::MachineLearning::Adapter::EdgeShapes & outputShapes) Line 2732 C++
onnxruntime.dll!Windows::AI::MachineLearning::Adapter::AbiOpKernel::Compute::__l2::(const Windows::AI::MachineLearning::Adapter::EdgeShapes & inputShapes, Windows::AI::MachineLearning::Adapter::EdgeShapes & outputShapes) Line 2435 C++
onnxruntime.dll!Windows::AI::MachineLearning::Adapter::AbiOpKernel::Compute(onnxruntime::OpKernelContext* context) Line 2479 C++
onnxruntime.dll!onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext & ctx, unsigned __int64 idx, unsigned __int64 stream_idx, const bool & terminate_flag, onnxruntime::SessionScope & session_scope) Line 493 C++
onnxruntime.dll!onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext & ctx, unsigned __int64 stream_idx, onnxruntime::SessionScope & session_scope, const bool & terminate_flag, bool & continue_flag) Line 64 C++
onnxruntime.dll!onnxruntime::RunSince(unsigned __int64 stream_idx, onnxruntime::StreamExecutionContext & ctx, onnxruntime::SessionScope & session_scope, const bool & terminate_flag, unsigned __int64 since) Line 220 C++
onnxruntime.dll!onnxruntime::ScheduleDownstream::__l9::() Line 256 C++
[External Code]
onnxruntime.dll!onnxruntime::concurrency::ThreadPool::Schedule(onnxruntime::concurrency::ThreadPool* tp, std::function<void __cdecl(void)> fn) Line 233 C++
onnxruntime.dll!onnxruntime::ScheduleDownstream(onnxruntime::StreamExecutionContext & ctx, unsigned __int64 trigger, bool single_thread_mode, const bool & terminate_flag, onnxruntime::SessionScope & session_scope) Line 258 C++
onnxruntime.dll!onnxruntime::TriggerDownstreamStep::Execute(onnxruntime::StreamExecutionContext & ctx, unsigned __int64 formal, onnxruntime::SessionScope & session_scope, const bool & terminate_flag, bool & continue_flag) Line 105 C++
onnxruntime.dll!onnxruntime::RunSince(unsigned __int64 stream_idx, onnxruntime::StreamExecutionContext & ctx, onnxruntime::SessionScope & session_scope, const bool & terminate_flag, unsigned __int64 since) Line 220 C++
onnxruntime.dll!onnxruntime::ExecuteThePlan::__l19::() Line 587 C++
[External Code]
onnxruntime.dll!onnxruntime::concurrency::ThreadPool::Schedule(onnxruntime::concurrency::ThreadPool* tp, std::function<void __cdecl(void)> fn) Line 233 C++
onnxruntime.dll!onnxruntime::ExecuteThePlan(const onnxruntime::SessionState & session_state, gsl::span<int const ,-1> feed_mlvalue_idxs, gsl::span<OrtValue const ,-1> feeds, gsl::span<int const ,-1> fetch_mlvalue_idxs, std::vector<OrtValue,std::allocator> & fetches, const std::unordered_map<unsigned __int64,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtDevice const &,OrtValue &,bool &)>,std::hash<unsigned __int64>,std::equal_to,std::allocator<std::pair<unsigned __int64 const ,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtDevice const &,OrtValue &,bool &)>>>> & fetch_allocators, const onnxruntime::logging::Logger & logger, const onnxruntime::DeviceStreamCollection* device_streams, const bool & terminate_flag, const bool only_execute_path_to_fetches, bool single_thread_mode) Line 590 C++
onnxruntime.dll!onnxruntime::utils::ExecuteGraphImpl(const onnxruntime::SessionState & session_state, const onnxruntime::FeedsFetchesManager & feeds_fetches_manager, gsl::span<OrtValue const ,-1> feeds, std::vector<OrtValue,std::allocator> & fetches, const std::unordered_map<unsigned __int64,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtDevice const &,OrtValue &,bool &)>,std::hash,std::equal_to,std::allocator<std::pair<unsigned __int64 const ,std::function<onnxruntime::common::Status __cdecl(onnxruntime::TensorShape const &,OrtDevice const &,OrtValue &,bool &)>>>> & fetch_allocators, ExecutionMode execution_mode, const bool & terminate_flag, const onnxruntime::logging::Logger & logger, onnxruntime::DeviceStreamCollection* device_stream_collection, const bool only_execute_path_to_fetches, onnxruntime::Stream* parent_stream) Line 693 C++
onnxruntime.dll!onnxruntime::utils::ExecuteGraph(const onnxruntime::SessionState & session_state, onnxruntime::FeedsFetchesManager & feeds_fetches_manager, gsl::span<OrtValue const ,-1> feeds, std::vector<OrtValue,std::allocator> & fetches, ExecutionMode execution_mode, const bool & terminate_flag, const onnxruntime::logging::Logger & logger, onnxruntime::DeviceStreamCollectionHolder & device_stream_collection_holder, bool only_execute_path_to_fetches, onnxruntime::Stream* parent_stream) Line 747 C++
onnxruntime.dll!onnxruntime::utils::ExecuteGraph(const onnxruntime::SessionState & session_state, onnxruntime::FeedsFetchesManager & feeds_fetches_manager, gsl::span<OrtValue const ,-1> feeds, std::vector<OrtValue,std::allocator> & fetches, ExecutionMode execution_mode, const OrtRunOptions & run_options, onnxruntime::DeviceStreamCollectionHolder & device_stream_collection_holder, const onnxruntime::logging::Logger & logger) Line 769 C++
onnxruntime.dll!onnxruntime::InferenceSession::Run(const OrtRunOptions & run_options, gsl::span<std::string const ,-1> feed_names, gsl::span<OrtValue const ,-1> feeds, gsl::span<std::string const ,-1> output_names, std::vector<OrtValue,std::allocator>* p_fetches, const std::vector<OrtDevice,std::allocator>* p_fetches_device_info) Line 2358 C++
onnxruntime.dll!onnxruntime::InferenceSession::Run(const OrtRunOptions & run_options, onnxruntime::IOBinding & io_binding) Line 2632 C++
onnxruntime.dll!OrtApis::RunWithBinding(OrtSession* sess, const OrtRunOptions* run_options, const OrtIoBinding * binding_ptr) Line 889 C++
StableDiffusion.exe!Ort::detail::SessionImpl::Run(const Ort::RunOptions & run_options, const Ort::IoBinding & io_binding) Line 981 C++
```

linnealovespie commented 5 months ago

@markhfeldman The DirectML Resize operator was updated in #19071. Could you update to ORT 1.18 and see if this bug still repros? If it does, could you please include device info and, if possible, more isolated steps to reproduce the bug? Thanks!
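For anyone trying to isolate this, a minimal harness sketch along these lines might help (assumptions: an ORT build with the DML EP, the Olive-optimized unet\model.onnx, and the ORT C++ API with dml_provider_factory.h; the path and logging level are illustrative, not from this thread):

```cpp
// Minimal session-creation sketch for isolating the crash on ARM64 + DML.
#include <onnxruntime_cxx_api.h>
#include <dml_provider_factory.h>
#include <iostream>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE, "resize-repro");
    Ort::SessionOptions opts;

    // The DirectML EP requires memory pattern off and sequential execution.
    opts.DisableMemPattern();
    opts.SetExecutionMode(ORT_SEQUENTIAL);
    Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_DML(opts, /*device_id*/ 0));

    // Assumed model location; adjust to the Olive output folder.
    Ort::Session session(env, L"unet\\model.onnx", opts);

    // Print input names/shapes; feeding matching dummy tensors and calling
    // session.Run(...) is what drives shape inference into ResizeHelper::Initialize.
    Ort::AllocatorWithDefaultOptions alloc;
    for (size_t i = 0; i < session.GetInputCount(); ++i) {
        auto name = session.GetInputNameAllocated(i, alloc);
        auto shape = session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        std::cout << name.get() << ": [";
        for (size_t d = 0; d < shape.size(); ++d)
            std::cout << shape[d] << (d + 1 < shape.size() ? "," : "");
        std::cout << "]\n";
    }
    return 0;
}
```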