SIGSEGV on CoreMLExecutionProvider when using dynamic batch

ankandrew commented 2 months ago

Describe the issue

I'm trying to run a YOLOv9 model exported to ONNX with CoreMLExecutionProvider. The segmentation fault only occurs when a dynamic batch dimension is combined with CoreMLExecutionProvider. With CPUExecutionProvider, I don't see the segmentation violation.

To reproduce

I'm running on a Mac with an M1 processor. The package versions are:

onnx==1.16.1
onnxruntime==1.18.1
torch==2.3.1
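The version details above, and the platform fields in the template below, can be collected in one step. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`. The helper name `environment_report` is hypothetical, and the package list is taken from the requirements above. Packages that are not installed are reported as "not installed" rather than raising an error.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("onnx", "onnxruntime", "torch")):
    """Collect OS, architecture, Python and package versions for a bug report."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),  # e.g. "arm64" for native Python on Apple silicon
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```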

To reproduce the issue, run the following code:

import onnxruntime as ort

# No problem using CPUExecutionProvider
cpu_sess_static = ort.InferenceSession(
    "yolov9-t-e2e-static-batch.onnx",
    providers=["CPUExecutionProvider"],
)
cpu_sess_dynamic = ort.InferenceSession(
    "yolov9-t-e2e-dynamic-batch.onnx",
    providers=["CPUExecutionProvider"],
)

# With CoreMLExecutionProvider, only the dynamic-batch model crashes (signal 11: SIGSEGV)
core_ml_sess = ort.InferenceSession(
    "yolov9-t-e2e-static-batch.onnx",
    providers=["CoreMLExecutionProvider"],
)
core_ml_sess2 = ort.InferenceSession(  # SIGSEGV happens while creating this session
    "yolov9-t-e2e-dynamic-batch.onnx",
    providers=["CoreMLExecutionProvider"],
)

Running the code above produces the following output:

2024-07-01 21:58:57.405702 [W:onnxruntime:, coreml_execution_provider.cc:104 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 8 number of nodes in the graph: 693 number of nodes supported by CoreML: 673
2024-07-01 21:59:00.013086 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-01 21:59:00.013094 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-07-01 21:59:00.030764 [W:onnxruntime:, coreml_execution_provider.cc:104 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 16 number of nodes in the graph: 703 number of nodes supported by CoreML: 647

Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)
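The warning from session_state.cc:1168 says that rerunning with verbose output on a non-minimal build will show node assignments. That output could help narrow down which of the 16 CoreML partitions in the dynamic-batch model is involved. The following is a minimal sketch, assuming that setting `SessionOptions.log_severity_level` to 0 (verbose) is enough to surface those assignments for the released pip package. The helper name `verbose_session_options` is hypothetical, and it returns `None` when onnxruntime is not installed.

```python
import importlib.util


def verbose_session_options():
    """Return SessionOptions with verbose logging, or None if onnxruntime is missing."""
    if importlib.util.find_spec("onnxruntime") is None:
        return None
    import onnxruntime as ort

    opts = ort.SessionOptions()
    # Severity levels: 0 = VERBOSE, 1 = INFO, 2 = WARNING, 3 = ERROR, 4 = FATAL.
    opts.log_severity_level = 0
    return opts
```

To use it, pass the result as `sess_options=` to `ort.InferenceSession` together with `providers=["CoreMLExecutionProvider"]`. Verbose messages emitted during session creation should be printed before the crash.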

Both models can be found at:

models.zip
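Exit code 139 follows the shell convention of reporting "killed by signal N" as 128 + N, so 139 corresponds to signal 11 (SIGSEGV). Because the crash takes down the whole interpreter, the four model/provider combinations from the reproduction cannot all be checked in a single run. The sketch below creates each session in a fresh child interpreter instead, assuming the two files from models.zip are in the working directory. `run_isolated` and `SESSION_SNIPPET` are hypothetical names. On POSIX systems, `subprocess.run` reports a child killed by signal N as return code -N, so a crash shows up as `killed by SIGSEGV`.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical template: create a single session for one model/provider pair.
SESSION_SNIPPET = textwrap.dedent(
    """
    import onnxruntime as ort
    ort.InferenceSession({model!r}, providers=[{provider!r}])
    """
)


def run_isolated(code: str, timeout: float = 300.0) -> str:
    """Run `code` in a fresh Python interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # POSIX only: a negative return code -N means the child was killed by signal N.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    if proc.returncode != 0:
        return f"failed with exit code {proc.returncode}"
    return "ok"


if __name__ == "__main__":
    for model in ("yolov9-t-e2e-static-batch.onnx", "yolov9-t-e2e-dynamic-batch.onnx"):
        for provider in ("CPUExecutionProvider", "CoreMLExecutionProvider"):
            code = SESSION_SNIPPET.format(model=model, provider=provider)
            print(f"{model} / {provider}: {run_isolated(code)}")
```

Isolating each combination in its own process means a crash only ends that child. The parent can then report every combination in one pass instead of stopping at the first segfault.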

Urgency

No response

Platform

Mac

OS Version

Sonoma 14.5

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU, CoreML

Execution Provider Library Version

No response

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

edgchen1 commented 3 weeks ago

Sorry for the delayed response, looking into it.

microsoft / onnxruntime