CoreML delegate crashes on model load on iOS 15.5

GregoryComer commented 1 day ago

🐛 Describe the bug

The CoreML delegate does not appear to work on iOS 15.5. We first hit the failure when loading a Meta-internal model, but I can also reproduce it with a simple model. The same model runs successfully on iOS 17 but fails to load on iOS 15.5. The delegate appears to work on iOS 16 and later, though I have not tested every version.

The app crashes with the following trace when loading the model method:

2024-11-20 02:25:06.189794-0800 Benchmark[75906:957741] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '+[MLModel compileModelAtURL:completionHandler:]: unrecognized selector sent to class 0x1daa8e928'
*** First throw call stack:
(
    0   CoreFoundation                      0x00000001803f3d70 __exceptionPreprocess + 236
    1   libobjc.A.dylib                     0x000000018019814c objc_exception_throw + 56
    2   CoreFoundation                      0x00000001804031ec __CFExceptionProem + 0
    3   CoreFoundation                      0x00000001803f7fe0 ___forwarding___ + 1440
    4   CoreFoundation                      0x00000001803fa07c _CF_forwarding_prep_0 + 92
    5   Tests                               0x000000010484305c +[ETCoreMLModelCompiler compileModelAtURL:maxWaitTimeInSeconds:error:] + 224
    6   Tests                               0x0000000104844780 -[ETCoreMLModelManager compiledModelURLWithIdentifier:inMemoryFS:assetManager:error:] + 2052
    7   Tests                               0x0000000104844c00 -[ETCoreMLModelManager modelExecutorWithMetadata:inMemoryFS:configuration:error:] + 352
    8   Tests                               0x0000000104845c80 __70-[ETCoreMLModelManager _modelExecutorWithAOTData:configuration:error:]_block_invoke + 36
    9   libdispatch.dylib                   0x000000018010ea98 _dispatch_client_callout + 16
    10  libdispatch.dylib                   0x000000018011d1d4 _dispatch_lane_barrier_sync_invoke_and_complete + 92
    11  Tests                               0x00000001048457a0 -[ETCoreMLModelManager _modelExecutorWithAOTData:configuration:error:] + 2572
    12  Tests                               0x0000000104846060 -[ETCoreMLModelManager loadModelFromAOTData:configuration:error:] + 28
    13  Tests                               0x000000010484c04c -[ETCoreMLModelManagerDelegate loadModelFromAOTData:configuration:error:] + 108
    14  Tests                               0x000000010484cafc _ZNK16executorchcoreml19BackendDelegateImpl4initENS_6BufferERKNSt3__113unordered_mapINS2_12basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEES1_NS2_4hashIS9_EENS2_8equal_toIS9_EENS7_INS2_4pairIKS9_S1_EEEEEE + 812
    15  Tests                               0x000000010484d520 _ZNK10executorch8backends6coreml21CoreMLBackendDelegate4initERNS_7runtime18BackendInitContextEPNS3_14FreeableBufferENS3_8ArrayRefINS3_11CompileSpecEEE + 160
    16  Tests                               0x000000010482dbb4 _ZN10executorch7runtime15BackendDelegate4InitERKN21executorch_flatbuffer15BackendDelegateEPKNS0_7ProgramERNS0_18BackendInitContextEPS1_ + 520
    17  Tests                               0x000000010482d450 _ZN10executorch7runtime6Method4initEPN21executorch_flatbuffer13ExecutionPlanE + 292
    18  Tests                               0x000000010482d280 _ZN10executorch7runtime6Method4loadEPN21executorch_flatbuffer13ExecutionPlanEPKNS0_7ProgramEPNS0_13MemoryManagerEPNS0_11EventTracerE + 160
    19  Tests                               0x0000000104833198 _ZN10executorch9extension6Module11load_methodERKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEPNS_7runtime11EventTracerE + 872
    20  Tests                               0x0000000104833430 _ZN10executorch9extension6Module11method_metaERKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEE + 36
    21  Tests                               0x0000000105180078 __41+[GenericTests dynamicTestsForResources:]_block_invoke.29 + 348

Script to generate a repro PTE (multiple models, possibly all, appear to have this issue):

from executorch.backends.apple.coreml.partition import (
    CoreMLPartitioner,
)
import torch
import torch.nn as nn
import torch.nn.functional as F
from executorch import exir

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
net = NeuralNetwork()

example_inputs = torch.randn(1, 28, 28)
out = net(example_inputs)
print(out)

# Make sure to use the example_inputs from the previous step
exported_program = torch.export.export(net, (example_inputs,))
edge_program = exir.program.to_edge(
    exported_program,
    compile_config=exir.EdgeCompileConfig(_check_ir_validity=False)
)
executorch_program = edge_program.to_executorch()

partitioned_edge_program = exir.to_edge(
    exported_program,
).to_backend(CoreMLPartitioner())
executorch_program = partitioned_edge_program.to_executorch()

with open("net_coreml.pte", "wb") as f:
    executorch_program.write_to_file(f)

Versions

Collecting environment information...
PyTorch version: 2.6.0.dev20241112
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.29.0
Libc version: N/A

Python version: 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+1de96f8
[pip3] executorchcoreml==0.0.1
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==23.9.16
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-executable==2.1.3
[pip3] flake8-logging-format==0.9.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] flake8-simplify==0.19.3
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] optree==0.13.0
[pip3] pytorch-sphinx-theme==0.0.24
[pip3] torch==2.6.0.dev20241112
[pip3] torchao==0.1
[pip3] torchaudio==2.5.0.dev20241112
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20241112
[conda] executorch 0.1.0 pypi_0 pypi
[conda] executorchcoreml 0.0.1 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] optree 0.13.0 pypi_0 pypi
[conda] pytorch-sphinx-theme 0.0.24 dev_0
[conda] torch 2.6.0.dev20241112 pypi_0 pypi
[conda] torchao 0.1 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20241112 pypi_0 pypi
[conda] torchfix 0.5.0 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20241112 pypi_0 pypi

kimishpatel commented 1 day ago

@cymbalrush can you take a look?

metascroy commented 1 day ago

From a brief look, the CoreML delegate calls +[MLModel compileModelAtURL:completionHandler:] inside the runtime, which matches the unrecognized selector in the trace: https://github.com/pytorch/executorch/blob/dcacde01d355b5f5082301edcdc9774dd3392f36/backends/apple/coreml/runtime/delegate/ETCoreMLModelCompiler.mm#L31

But this method was only introduced in iOS 16, so the selector does not exist on iOS 15.5: https://developer.apple.com/documentation/coreml/mlmodel/compilemodel(at:)-3nea?language=objc
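
One possible direction, sketched here only as an illustration and not as the actual ExecuTorch fix: guard the asynchronous call with an availability check and fall back to the synchronous +[MLModel compileModelAtURL:error:], which has been available since iOS 11. The helper name ETExampleCompileModel and the error domain below are hypothetical; the maxWaitTimeInSeconds parameter is borrowed from the ETCoreMLModelCompiler signature visible in the stack trace, and the timeout handling is an assumption about how one might keep that behavior.

```objc
#import <CoreML/CoreML.h>
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

// Hypothetical helper (not part of ExecuTorch): compiles a Core ML model,
// choosing the API based on the OS version at runtime.
static NSURL * _Nullable ETExampleCompileModel(NSURL *modelURL,
                                               NSTimeInterval maxWaitTimeInSeconds,
                                               NSError * __autoreleasing *error) {
    if (@available(iOS 16.0, macOS 13.0, tvOS 16.0, watchOS 9.0, *)) {
        // Newer OS versions: use the asynchronous API and wait for it,
        // bounded by maxWaitTimeInSeconds.
        __block NSURL *compiledURL = nil;
        __block NSError *compileError = nil;
        dispatch_semaphore_t done = dispatch_semaphore_create(0);

        [MLModel compileModelAtURL:modelURL
                 completionHandler:^(NSURL * _Nullable url, NSError * _Nullable err) {
            compiledURL = url;
            compileError = err;
            dispatch_semaphore_signal(done);
        }];

        dispatch_time_t deadline =
            dispatch_time(DISPATCH_TIME_NOW, (int64_t)(maxWaitTimeInSeconds * NSEC_PER_SEC));
        if (dispatch_semaphore_wait(done, deadline) != 0) {
            // Timed out. The error domain and code are placeholders for this sketch.
            if (error) {
                *error = [NSError errorWithDomain:@"ExampleCompileErrorDomain"
                                             code:1
                                         userInfo:nil];
            }
            return nil;
        }
        if (error && compileError) {
            *error = compileError;
        }
        return compiledURL;
    } else {
        // Older OS versions (such as iOS 15.5): the completion-handler selector
        // does not exist, so use the synchronous API instead.
        return [MLModel compileModelAtURL:modelURL error:error];
    }
}

NS_ASSUME_NONNULL_END
```

With a guard like this, the iOS 16+ selector is never sent on iOS 15.5, which is the condition that raises the NSInvalidArgumentException in the trace above. The synchronous fallback blocks the calling thread and has no timeout, so the maxWaitTimeInSeconds bound would only apply on the newer path.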

pytorch / executorch

CoreML delegate crashes on model load on iOS 15.5 #6984
