microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Why is dynamic shape not supported with the CoreML provider, while CoreML 2+ supports it? #14212

Closed divideconcept closed 1 year ago

divideconcept commented 1 year ago

Describe the issue

When I try to run an ONNX model with a dynamic shape (even on a single axis) on the CoreML backend of ORT 1.13.1, on a recent machine (MacBook Air M1 with macOS 12.5), I get the following warning: [W:onnxruntime:, helper.cc:61 IsInputSupported] Dynamic shape is not supported for now, for input:input

As a result, the model runs on the CPU instead of CoreML.

Why can't ORT run ONNX models with dynamic shapes on the CoreML backend? According to https://apple.github.io/coremltools/mlmodel/Format/Model.html, CoreML has supported dynamic shapes since version 2, which was released with macOS 10.14, more than 4 years ago.

To reproduce

Use ORT 1.13.1 macOS arm64 in a C++ project running in arm64 mode. Try to infer an ONNX model with at least one dynamic axis: the warning will show up and the model inference will fall back to the CPU.

Urgency

No response

Platform

Mac

OS Version

12.5

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

C++

Architecture

ARM64

Execution Provider

CoreML

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

skottmckay commented 1 year ago

Isn't that limited to layers with 'Dynamic' in the name, of which there aren't many? e.g. FillDynamicLayer, BroadcastToDynamicLayer

https://apple.github.io/coremltools/mlmodel/Format/NeuralNetwork.html

Do you have a production use case where you'd have completely dynamic sizes used throughout the model? e.g. for an image model you'd typically resize the input to a fixed size, so the vast majority of the model uses fixed sizes.

divideconcept commented 1 year ago

No, it's supposed to work with any network, at least networks using convolutions: https://coremltools.readme.io/docs/flexible-inputs

In my use case (audio processing) I adapt the input shape based both on the rendering type (real-time preview: smaller input shape for more responsive feedback; offline render: longer input shape to reduce border artefacts) and on the sample rate (to avoid tiling excessively where the spectrum contains nothing).

skottmckay commented 1 year ago

We'll take a look and see what's possible. Would you be looking to use a bounded range?

divideconcept commented 1 year ago

From what I understand, CoreML allows 3 approaches for dynamic shapes:
- a set of predetermined shapes
- bounded ranges
- unbounded ranges

Support for any of these modes would be a significant improvement over the single fixed shape we have currently. I don't have any personal preference as long as I can at least use a couple of different shapes.

I suppose the easiest would be to implement the unbounded range mode, which is a straightforward application of ONNX's unbounded dimensions.
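For reference, this is roughly how the three modes look when converting a model directly with coremltools (outside of ORT); a minimal sketch, where the shapes and the traced model are hypothetical placeholders:

```
import coremltools as ct

# "traced_model" is a hypothetical torch.jit-traced model, shapes are illustrative.
# 1) A set of predetermined shapes
enumerated = ct.EnumeratedShapes(shapes=[[1, 1, 16000], [1, 1, 48000]],
                                 default=[1, 1, 16000])
# 2) A bounded range on one axis
bounded = ct.Shape(shape=(1, 1, ct.RangeDim(lower_bound=1024, upper_bound=480000,
                                            default=16000)))
# 3) An unbounded range (upper_bound=-1 means "no upper bound")
unbounded = ct.Shape(shape=(1, 1, ct.RangeDim(lower_bound=1, upper_bound=-1)))

mlmodel = ct.convert(traced_model, inputs=[ct.TensorType(name="input", shape=unbounded)])
```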

ggordonhall commented 1 year ago

Seconding the above. We're trying to run a transformer sentence-embedding model using the CoreML Execution Provider. Without support for dynamic shapes we have to pad each sequence in a batch to the maximum model sequence length.
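For context, the dynamic dimension usually originates at export time via dynamic_axes; a small self-contained sketch with a toy stand-in model (not the actual embedding model discussed here):

```
import torch

class TinyEmbedder(torch.nn.Module):  # stand-in for a real sentence-embedding model
    def forward(self, input_ids, attention_mask):
        return input_ids.float().mean(dim=1, keepdim=True) * attention_mask.float().mean(dim=1, keepdim=True)

ids = torch.ones(2, 16, dtype=torch.long)
mask = torch.ones(2, 16, dtype=torch.long)
torch.onnx.export(
    TinyEmbedder(), (ids, mask), "embedder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["embeddings"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "embeddings": {0: "batch"},
    },
)
```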

laclouis5 commented 1 year ago

I also second the above. I'm trying to run a Segment Anything Model (SAM) and got the same issue. The SAM I'm using seems to use a bounded dynamic input shape which is capped to 1024 pixels on the longest side.

skottmckay commented 1 year ago

Are you able to share the model? We'll look at implementing something in the 1.16 release, and it would be good to have a specific model to test against.

laclouis5 commented 1 year ago

Yes, sure. It's the SAM models (B, L, and H variants) from the AnyLabeling repo. Here is a direct link to the image encoder ONNX file, which is probably the cause of the issue. There is also the decoder, which may use dynamic shapes.

divideconcept commented 1 year ago

@skottmckay has there been any progress on this? I saw https://github.com/microsoft/onnxruntime/pull/15993/ but it doesn't seem to remove this warning: https://github.com/microsoft/onnxruntime/blob/9206b7cdc61df811ace631c25428fdf7f1a1b687/onnxruntime/core/providers/coreml/builders/helper.cc#L66

skottmckay commented 1 year ago

That PR was to prevent bailing out of considering nodes for assignment to CoreML as soon as any node with a dynamic input shape was seen. If there are other nodes in the model with fixed shapes, CoreML can still be used for them. e.g. if the model supports a dynamic image size, after the Resize of the image to a fixed size, the rest of the model can use CoreML.

We're looking to add dynamic shape support in the next release - 1.16.

edgchen1 commented 1 year ago

Added initial dynamic shape support to ORT CoreML EP using unbounded ranges in #16915. Please try again and let us know if you run into issues.
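For anyone testing this from Python, a minimal sketch of how the CoreML EP can be requested (with CPU fallback for unsupported nodes); the model path and input name are placeholders:

```
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
# Feed inputs whose dynamic axes differ between calls to exercise the new support.
x = np.zeros((1, 1, 16000), dtype=np.float32)  # hypothetical input name/shape
out = sess.run(None, {"input": x})
```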

laclouis5 commented 1 year ago

Unfortunately, I wasn't able to compile ORT properly on my Mac M1. I need to build the Python wheel since my application uses the CoreML EP through the Python API, but the build results in an error (the normal build without the wheel seems to work fine, though).

I installed cmake and the dev dependencies (Python 3.10), then executed:

./build.sh --config RelWithDebInfo \
  --build_shared_lib \
  --parallel \
  --compile_no_warning_as_error \
  --skip_submodule_sync \
  --cmake_extra_defines \
  CMAKE_OSX_ARCHITECTURES=arm64 \
  --use_coreml \
  --minimal_build extended \
  --build_wheel

Is there something I'm missing?

Here is the interesting parts of the error message ``` /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:15:76: error: no member named 'OpSchema' in namespace 'onnx' "get_all_operator_schema", []() -> const std::vector { ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:16:32: error: no member named 'OpSchemaRegistry' in namespace 'onnx' return ONNX_NAMESPACE::OpSchemaRegistry::get_all_schemas_with_history(); ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:126:30: error: no member named 'OpSchema' in namespace 'onnx' py::class_ op_schema(schemadef, "OpSchema", py::module_local()); ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:127:60: error: no member named 'OpSchema' in namespace 'onnx' op_schema.def_property_readonly("file", &ONNX_NAMESPACE::OpSchema::file) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:128:55: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("line", &ONNX_NAMESPACE::OpSchema::line) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:129:64: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("support_level", &ONNX_NAMESPACE::OpSchema::support_level) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:131:35: error: no member named 'OpSchema' in namespace 'onnx' "doc", &ONNX_NAMESPACE::OpSchema::doc, py::return_value_policy::reference) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:132:64: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("since_version", &ONNX_NAMESPACE::OpSchema::since_version) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:133:61: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("deprecated", &ONNX_NAMESPACE::OpSchema::deprecated) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:134:57: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("domain", &ONNX_NAMESPACE::OpSchema::domain) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:135:55: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("name", &ONNX_NAMESPACE::OpSchema::Name) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:136:60: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("min_input", &ONNX_NAMESPACE::OpSchema::min_input) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:137:60: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("max_input", &ONNX_NAMESPACE::OpSchema::max_input) ~~~~~~~~~~~~~~~~^ [ 97%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/common/utf8_util_test.cc.o /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:138:61: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("min_output", &ONNX_NAMESPACE::OpSchema::min_output) ~~~~~~~~~~~~~~~~^ 
/Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:139:61: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("max_output", &ONNX_NAMESPACE::OpSchema::max_output) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:140:61: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("attributes", &ONNX_NAMESPACE::OpSchema::attributes) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:141:57: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("inputs", &ONNX_NAMESPACE::OpSchema::inputs) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:142:58: error: no member named 'OpSchema' in namespace 'onnx' .def_property_readonly("outputs", &ONNX_NAMESPACE::OpSchema::outputs) ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc:145:28: error: no member named 'OpSchema' in namespace 'onnx' &ONNX_NAMESPACE::OpSchema::has_type_and_shape_inference_function) ~~~~~~~~~~~~~~~~^ fatal error: too many errors emitted, stopping now [-ferror-limit=] 20 errors generated. make[2]: *** [CMakeFiles/onnxruntime_pybind11_state.dir/Users/louislac/Developer/onnxruntime/onnxruntime/python/onnxruntime_pybind_schema.cc.o] Error 1 make[2]: *** Waiting for unfinished jobs.... [ 97%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/optimizer/runtime_optimization/graph_runtime_optimization_test.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/framework/ort_model_only_test.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/platform/barrier_test.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/platform/env_test.cc.o [ 98%] Linking CXX executable onnxruntime_shared_lib_test [ 98%] Built target onnxruntime_shared_lib_test [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/platform/file_io_test.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/platform/path_lib_test.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/platform/scoped_resource_test.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/platform/threadpool_test.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/coreml/coreml_basic_test.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/coreml/dynamic_input_test.cc.o make[1]: *** [CMakeFiles/onnxruntime_pybind11_state.dir/all] Error 2 make[1]: *** Waiting for unfinished jobs.... 
[ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/framework/test_utils.cc.o [ 98%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/internal_testing/internal_testing_ep_static_kernels.cc.o [100%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/internal_testing/internal_testing_execution_provider.cc.o [100%] Building CXX object CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/internal_testing/internal_testing_partitioning_tests.cc.o In file included from /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/coreml/dynamic_input_test.cc:12: In file included from /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/model_tester.h:10: /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:549:35: error: no type named 'ResolveOptions' in 'onnxruntime::Graph' BaseTester& Config(const Graph::ResolveOptions& resolve_options); ~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:560:25: error: no type named 'ResolveOptions' in 'onnxruntime::Graph' const Graph::ResolveOptions& resolve_options = {}); ~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:570:25: error: no type named 'ResolveOptions' in 'onnxruntime::Graph' const Graph::ResolveOptions& resolve_options = {}, ~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:632:12: error: no type named 'ResolveOptions' in 'onnxruntime::Graph' Graph::ResolveOptions resolve_options{}; ~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:38:27: error: no member named 'OpSchemaRegistry' in namespace 'onnx' ONNX_NAMESPACE::OpSchemaRegistry::DomainToVersionRange().Map().at(ONNX_NAMESPACE::ONNX_DOMAIN).second; ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:38:93: error: no member named 'ONNX_DOMAIN' in namespace 'onnx' ONNX_NAMESPACE::OpSchemaRegistry::DomainToVersionRange().Map().at(ONNX_NAMESPACE::ONNX_DOMAIN).second; ~~~~~~~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:518:51: error: no member named 'GetOpschemaRegistry' in 'onnxruntime::CustomRegistry' custom_schema_registries_.push_back(registry->GetOpschemaRegistry()); ~~~~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:807:29: error: no member named 'UnionShapeInfo' in namespace 'onnx' ONNX_NAMESPACE::UnionShapeInfo(shape_proto, *output_tensor_type); ~~~~~~~~~~~~~~~~^ In file included from /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/coreml/dynamic_input_test.cc:12: /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/model_tester.h:40:29: error: no member named 'Load' in 'onnxruntime::Model' ASSERT_STATUS_OK(Model::Load(model_uri_, model_, nullptr, DefaultLoggingManager().DefaultLogger(), model_options)); ~~~~~~~^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/util/include/asserts.h:14:27: note: expanded from macro 'ASSERT_STATUS_OK' Status _tmp_status = (function); \ ^~~~~~~~ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/coreml/dynamic_input_test.cc:83:10: error: no matching member function for call to 'AddInput' tester.AddInput("A", {0, 2}, {}); ~~~~~~~^~~~~~~~~~~~~~~ 
/Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:59:8: note: candidate function template not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddInput(const char* name, std::initializer_list dims, std::initializer_list values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:66:8: note: candidate function template not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddInput(const char* name, std::initializer_list dims, const std::vector& values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:81:8: note: candidate function template not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddInput(const char* name, std::initializer_list dims, gsl::span values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:88:8: note: candidate function template not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddInput(const char* name, const DimsVariant& dims, std::initializer_list values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:94:8: note: candidate function template not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddInput(const char* name, const DimsVariant& dims, const std::vector& values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:210:8: note: candidate function template not viable: requires 2 arguments, but 3 were provided void AddInput(const char* name, const T& val) { ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:222:8: note: candidate function template not viable: requires 2 arguments, but 3 were provided void AddInput(const char* name, T&& val) { ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:296:8: note: candidate function template not viable: requires 2 arguments, but 3 were provided void AddInput(const char* name, const std::map& val) { ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:73:8: note: candidate function template not viable: requires at least 4 arguments, but 3 were provided void AddInput(const char* name, std::initializer_list dims, const T* p_values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:100:8: note: candidate function template not viable: requires at least 4 arguments, but 3 were provided void AddInput(const char* name, const DimsVariant& dims, const T* p_values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/coreml/dynamic_input_test.cc:84:10: error: no matching member function for call to 'AddOutput' tester.AddOutput("Y", {0, 4}, {}); ~~~~~~~^~~~~~~~~~~~~~~~ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:320:8: note: candidate function template not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddOutput(const char* name, std::initializer_list dims, std::initializer_list expected_values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:328:8: note: candidate function template not viable: no 
known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddOutput(const char* name, std::initializer_list dims, const std::vector& expected_values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:344:8: note: candidate function template not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddOutput(const char* name, const DimsVariant& dims, std::initializer_list expected_values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:352:8: note: candidate function template not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument void AddOutput(const char* name, const DimsVariant& dims, const std::vector& expected_values, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:479:8: note: candidate function template not viable: requires 2 arguments, but 3 were provided void AddOutput(const char* name, const T& val) { ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:491:8: note: candidate function template not viable: requires 2 arguments, but 3 were provided void AddOutput(const char* name, T&& val) { ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:504:8: note: candidate function template not viable: requires 2 arguments, but 3 were provided void AddOutput(const char* name, const std::vector>& val) { ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:336:8: note: candidate function template not viable: requires at least 4 arguments, but 3 were provided void AddOutput(const char* name, std::initializer_list dims, const T* p_values, const size_t size, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:359:8: note: candidate function template not viable: requires at least 4 arguments, but 3 were provided void AddOutput(const char* name, const DimsVariant& dims, const T* p_values, const size_t size, ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/coreml/dynamic_input_test.cc:90:8: error: no matching member function for call to 'Config' .Config(ModelTester::ExpectResult::kExpectFailure, ~^~~~~~ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:545:15: note: candidate function not viable: no known conversion from 'onnxruntime::test::ModelTester' to 'onnxruntime::test::BaseTester' for object argument BaseTester& Config(ExpectResult expect_result, const std::string& expected_failure_string); ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:544:15: note: candidate function not viable: requires single argument 'sess_options', but 2 arguments were provided BaseTester& Config(const SessionOptions& sess_options); ^ /Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/base_tester.h:547:15: note: candidate function not viable: requires single argument 'run_options', but 2 arguments were provided BaseTester& Config(const RunOptions* run_options); ^ 12 errors generated. make[2]: *** [CMakeFiles/onnxruntime_test_all.dir/Users/louislac/Developer/onnxruntime/onnxruntime/test/providers/coreml/dynamic_input_test.cc.o] Error 1 make[2]: *** Waiting for unfinished jobs.... 
make[1]: *** [CMakeFiles/onnxruntime_test_all.dir/all] Error 2 make: *** [all] Error 2 Traceback (most recent call last): File "/Users/louislac/Developer/onnxruntime/tools/ci_build/build.py", line 2627, in sys.exit(main()) File "/Users/louislac/Developer/onnxruntime/tools/ci_build/build.py", line 2520, in main build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target) File "/Users/louislac/Developer/onnxruntime/tools/ci_build/build.py", line 1438, in build_targets run_subprocess(cmd_args, env=env) File "/Users/louislac/Developer/onnxruntime/tools/ci_build/build.py", line 787, in run_subprocess return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env) File "/Users/louislac/Developer/onnxruntime/tools/python/util/run.py", line 49, in run completed_process = subprocess.run( File "/Users/louislac/miniconda3/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['/opt/homebrew/bin/cmake', '--build', '/Users/louislac/Developer/onnxruntime/build/MacOS/RelWithDebInfo', '--config', 'RelWithDebInfo', '--', '-j6']' returned non-zero exit status 2. ```

Update

I got a successful build by removing --minimal_build extended:

./build.sh --config RelWithDebInfo \
  --build_shared_lib \
  --parallel 6 \
  --compile_no_warning_as_error \
  --skip_submodule_sync \
  --cmake_extra_defines \
  CMAKE_OSX_ARCHITECTURES=arm64 \
  --use_coreml \
  --build_wheel

The tests are not passing, however.

laclouis5 commented 1 year ago

@edgchen1 I confirm the installation was successful and I was able to test ORT 1.16 with a SAM ViT-B. The encoder model fails to be created with the error:

2023-08-04 20:57:45.696448 [W:onnxruntime:, coreml_execution_provider.cc:81 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 114 number of nodes in the graph: 607 number of nodes supported by CoreML: 387
Error in loading model: [ONNXRuntimeError] : 1 : FAIL : Error Creating MLModel Error in declaring network.

Here is a link to the ONNX model that caused this error.

The model works fine with a CPUExecutionProvider.

edgchen1 commented 1 year ago

> Here is a link to the ONNX model that caused this error.

Looking into it. Is the model private or can I discuss details about it here?

laclouis5 commented 1 year ago

> Here is a link to the ONNX model that caused this error.
>
> Looking into it. Is the model private or can I discuss details about it here?

The model is public, it’s one from the AnyLabeling software. They simply repackaged the public SAM model as far as I know.

edgchen1 commented 1 year ago

> @edgchen1 I confirm the installation was successful and I was able to test ORT 1.16 with a SAM ViT-B. The encoder model fails to be created with the error:
>
> 2023-08-04 20:57:45.696448 [W:onnxruntime:, coreml_execution_provider.cc:81 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 114 number of nodes in the graph: 607 number of nodes supported by CoreML: 387
> Error in loading model: [ONNXRuntimeError] : 1 : FAIL : Error Creating MLModel Error in declaring network.

The cause of this error is that CoreML doesn't like a ReshapeStaticLayer with rank greater than 5 (in the input shape or the new shape).

Some additional log output was visible when running in lldb.

[V:onnxruntime:, base_op_builder.cc:52 AddToModelBuilder] Operator name: [/image_encoder/blocks.0/Reshape_2] type: [Reshape] was added
[V:onnxruntime:, base_op_builder.cc:52 AddToModelBuilder] Operator name: [/image_encoder/blocks.0/Transpose_1] type: [Transpose] was added
[V:onnxruntime:, base_op_builder.cc:52 AddToModelBuilder] Operator name: [/image_encoder/blocks.0/Reshape_3] type: [Reshape] was added
[espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Invalid blob shape": ANE does not support blob with rank: 6 status=-7
[coreml] Error in adding network -7.
[coreml] MLModelAsset: load failed with error Error Domain=com.apple.CoreML Code=0 "Error in declaring network." UserInfo={NSLocalizedDescription=Error in declaring network.}
[coreml] MLModelAsset: modelWithError: load failed with error Error Domain=com.apple.CoreML Code=0 "Error in declaring network." UserInfo={NSLocalizedDescription=Error in declaring network.}

I did manage to get around that error by adding some checks to the CoreML EP to ensure it doesn't take nodes where those shapes have rank greater than 5. But there was another runtime error later on which seems trickier to debug (an OS kernel panic when initializing a CoreML MLModel). That failure wasn't reproducible when creating a similar MLModel in isolation, so I think it's related to compiling multiple CoreML partitions (which each have an MLModel) at one time.

Errors aside, this particular model is not well supported by the CoreML EP at the moment. We can see that from the number of CoreML partitions that were created, which means that many intermediate nodes are not supported by the EP. With more partitions running on CoreML (possibly on the ANE), we incur more overhead copying data around, which usually has a significant performance penalty. That penalty may be large enough that running everything with the CPU EP is more performant.

> number of partitions supported by CoreML: 114, number of nodes in the graph: 607, number of nodes supported by CoreML: 387

We can look into adding support for ops that are not yet supported, or not fully supported, by the CoreML EP. In this model, the ops that are unsupported for some reason are:

[Concat]
[Einsum]
[Erf]
[Gather]
[LayerNormalization]
[MatMul]
[Pad]
[Reshape]
[Shape]
[Slice]
[Softmax]
[Split]
[Sub]
[Transpose]
[Unsqueeze]

More details here: op_support_log_unsupported_only.txt
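If you want to reproduce this kind of per-node support log yourself, one way (a sketch, not necessarily the exact command used to generate the attached log) is to raise ORT's log verbosity when creating the session; the model path is a placeholder:

```
import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0  # VERBOSE: logs which nodes each EP claims and why others are rejected
sess = ort.InferenceSession(
    "sam_encoder.onnx",  # placeholder path
    sess_options=so,
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
```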

But for situations where we run into a limitation of CoreML, like the rank 5 limit for reshape, the options may be limited.

Do you have a production use case for this model?

divideconcept commented 1 year ago

Except for Einsum and Erf, which I've never encountered before, all the other ops you listed here seem pretty common to me; I see them in a lot of models. So it might be worth adding CoreML support for these?

laclouis5 commented 1 year ago

> Do you have a production use case for this model?

SAM models and their variants (FastSAM) are now widely used for semi-automated segmentation tasks. For instance, the AnyLabeling tool uses such models to assist semantic segmentation annotation, and this tool is quite popular (though the notion of a "production use case" is quite subjective). SAM models are also used in concrete applications such as medical image segmentation with SAMed.

Apart from transformer models, dynamic shapes are widely used in standard CNNs to allow different speed-accuracy trade-offs. For instance, one would use a CNN (let's say for object detection) with input shapes of 256x256 or 512x512 depending on the need for real-time inference.

I also often use dynamic shapes to allow multiple batch sizes. This allows one to adapt to different loads dynamically, for instance on servers.
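To make the speed/accuracy and batch-size points concrete, here is a rough sketch of one session serving several shapes through the same dynamic axes; the model path and input name are placeholders:

```
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("detector.onnx", providers=["CPUExecutionProvider"])  # placeholder
for batch, size in [(1, 256), (1, 512), (8, 512)]:
    x = np.random.rand(batch, 3, size, size).astype(np.float32)
    outputs = sess.run(None, {"images": x})  # "images" is a hypothetical input name
```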

ONNX, TensorRT, CoreML and others support dynamic shapes because there is a demand for this capability, I guess. Performance is not the priority, at least for me, so I think basic support for dynamic shapes with the CoreMLExecutionProvider is enough, for instance using unbounded ranges only. This avoids crashes at runtime and improves interoperability.

edgchen1 commented 1 year ago

I added some checks to limit the shapes to at most rank 5 in the CoreML EP. That limit seems to be inherent to CoreML at the moment. ORT should no longer fail to load the model because of CoreML compilation errors due to that, but it limits the nodes that are assigned to the CoreML EP.

https://github.com/microsoft/onnxruntime/pull/17086

This SAM model has some shapes with rank greater than 5, so those nodes won't be supported by the CoreML EP. Maybe the model can be updated to work around this limitation.
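If someone wants to check whether their model hits the rank limit before updating it, a rough sketch using ONNX shape inference (may not cover every case, e.g. very large models or shapes that can't be inferred); the path is a placeholder:

```
import onnx
from onnx import shape_inference

model = shape_inference.infer_shapes(onnx.load("sam_encoder.onnx"))  # placeholder path
tensors = list(model.graph.value_info) + list(model.graph.input) + list(model.graph.output)
for vi in tensors:
    dims = vi.type.tensor_type.shape.dim
    if len(dims) > 5:
        print(vi.name, [d.dim_value if d.dim_value else d.dim_param for d in dims])
```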

> Performance is not the priority, at least for me, so I think basic support for dynamic shapes with the CoreMLExecutionProvider is enough, for instance using unbounded ranges only.

Basic dynamic shape support is enabled in the CoreML EP now. Note that the CPU EP is also an option if performance is not a priority.

> Except for Einsum and Erf, which I've never encountered before, all the other ops you listed here seem pretty common to me; I see them in a lot of models. So it might be worth adding CoreML support for these?

We'll look into adding more op support. PRs are welcome too.

pbanavara commented 7 months ago

I am trying to run a YOLO ONNX pose detection model on iOS 17. It works fine with CPU inference, but it is super slow: about 10x slower than the CoreML-converted model. So I figured it was running only on the CPU and added the following:

```
let ortCoreMlSessionOptions = ORTCoreMLExecutionProviderOptions()
try ortSessionOptions.appendCoreMLExecutionProvider(with: ortCoreMlSessionOptions)
```

However, I get this error when the model runs:

[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running CoreML_12572311578788037559_16 node. Name:'CoreMLExecutionProvider_CoreML_12572311578788037559_16_16' Status Message: coreml_execution_provider.cc:168 operator() Input (_ppp9_nmsbox) has a dynamic shape ({-1,3}) but the runtime shape ({0,3}) has zero elements. This is not supported by the CoreML EP.

I am super new to ONNX and any help in resolving this will be greatly appreciated.