k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

[Help wanted] Support CoreML for iOS #152

Open · csukuangfj opened this issue 1 year ago

csukuangfj commented 1 year ago

#151 adds CoreML support for macOS. We also need to support iOS.

The pre-built libs can be downloaded from https://onnxruntimepackages.z14.web.core.windows.net/pod-archive-onnxruntime-c-1.14.0.zip

After unzipping, you will find the following files:

.
├── Headers
│   ├── coreml_provider_factory.h
│   ├── cpu_provider_factory.h
│   ├── onnxruntime_c_api.h
│   ├── onnxruntime_cxx_api.h
│   └── onnxruntime_cxx_inline.h
├── LICENSE
├── a.txt
└── onnxruntime.xcframework
    ├── Info.plist
    ├── ios-arm64
    │   └── onnxruntime.framework
    │       ├── Headers
    │       │   ├── coreml_provider_factory.h
    │       │   ├── cpu_provider_factory.h
    │       │   ├── onnxruntime_c_api.h
    │       │   ├── onnxruntime_cxx_api.h
    │       │   └── onnxruntime_cxx_inline.h
    │       ├── Info.plist
    │       └── onnxruntime
    └── ios-arm64_x86_64-simulator
        └── onnxruntime.framework
            ├── Headers
            │   ├── coreml_provider_factory.h
            │   ├── cpu_provider_factory.h
            │   ├── onnxruntime_c_api.h
            │   ├── onnxruntime_cxx_api.h
            │   └── onnxruntime_cxx_inline.h
            ├── Info.plist
            └── onnxruntime

8 directories, 22 files

TODOs

jingzhaoou commented 1 year ago

Even though https://github.com/k2-fsa/sherpa-onnx/pull/151 provides support for CoreML, we found that none of the operators can be mapped to the Neural Engine on my Intel and M2 MacBooks. I see the following warnings:

2023-06-23 17:30:17.033 sherpa-onnx[2320:31342757] 2023-06-23 17:30:17.033042 [W:onnxruntime:, helper.cc:61 IsInputSupported] Dynamic shape is not supported for now, for input:x
2023-06-23 17:30:17.325 sherpa-onnx[2320:31342757] 2023-06-23 17:30:17.324992 [W:onnxruntime:, helper.cc:61 IsInputSupported] Dynamic shape is not supported for now, for input:y
2023-06-23 17:30:17.329 sherpa-onnx[2320:31342757] 2023-06-23 17:30:17.329294 [W:onnxruntime:, helper.cc:61 IsInputSupported] Dynamic shape is not supported for now, for input:encoder_out
...
2023-06-26 22:41:24.330 sherpa-onnx[83435:2226254] 2023-06-26 22:41:24.330654 [I:onnxruntime:, coreml_execution_provider.cc:93 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 0 number of nodes in the graph: 3286 number of nodes supported by CoreML: 0

Notice "number of partitions supported by CoreML: 0" and "number of nodes supported by CoreML: 0" at the end. The parameter N is one root cause of the dynamic types. There may be other issues.

I suggest we investigate the ONNX mapping issue further before adding more CoreML support. Otherwise, using CoreML may not provide any benefit. Thanks.
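
For reference, here is a minimal sketch (not sherpa-onnx code) of how the CoreML execution provider is requested through onnxruntime's Python API; the model path is a placeholder. Because the CPU EP is listed as a fallback, inference keeps working even when CoreML accepts zero nodes, which is why the warnings above are easy to miss.

import onnxruntime as ort

# Minimal sketch: request the CoreML EP with a CPU fallback.
# "encoder.onnx" is a placeholder for any of the exported models.
sess = ort.InferenceSession(
    "encoder.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)

# If CoreML cannot take any nodes ("number of nodes supported by CoreML: 0"),
# everything silently runs on the CPU EP.
print(sess.get_providers())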

wujingcheng7 commented 2 months ago

Does iOS currently still only support the CPU provider? When I try to initialize a model with the "coreml" provider, it crashes. @csukuangfj

[screenshots of the crash]
csukuangfj commented 2 months ago

Could you re-export the model with batch size == 1 so that there are no dynamic shapes in the streaming model, and retry?

wujingcheng7 commented 2 months ago

Could you re-export the model with batch size == 1 so that there are no dynamic shapes in the streaming model, and retry?

How would I do that? The model I'm using was downloaded from this repository's releases, not exported by myself. @csukuangfj From https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models I picked https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-coqui-en-ljspeech-neon.tar.bz2

csukuangfj commented 2 months ago

https://colab.research.google.com/drive/1cI9VzlimS51uAw4uCR-OBeSXRPBc4KoK?usp=sharing

You can take a look at this export script.

How to remove the dynamic axes is basic knowledge of exporting ONNX from torch; you can learn it on your own.
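
For illustration only, a minimal sketch of what removing dynamic_axes looks like with torch.onnx.export; the toy model, input names, and file names are placeholders, not the actual icefall export script.

import torch

# Toy stand-in for the real encoder/decoder/joiner.
model = torch.nn.Linear(80, 512).eval()
x = torch.rand(1, 80)  # batch size pinned to 1

# WITH dynamic_axes: the batch dimension stays symbolic ("N"),
# which triggers "Dynamic shape is not supported" in the CoreML EP.
torch.onnx.export(
    model, (x,), "with-dynamic-axes.onnx",
    input_names=["x"], output_names=["y"],
    dynamic_axes={"x": {0: "N"}, "y": {0: "N"}},
)

# WITHOUT dynamic_axes: every dimension is taken from the dummy input,
# so the exported graph contains only static shapes.
torch.onnx.export(
    model, (x,), "static-shapes.onnx",
    input_names=["x"], output_names=["y"],
)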

w11wo commented 2 months ago

Hi. I thought I'd jump in because I'm also interested in getting the ASR models to run on CoreML.

I re-exported this model with batch size = 1. I'm assuming that all N's in the ONNX export script correspond to the batch size.

I've made some changes as shown in this fork, where I removed all dynamic_axes arguments and made sure every possible N is set to 1. I saw that the encoder batch size already defaulted to 1. I'm not sure if the changes I made to the decoder and joiner initial random values have the correct shapes, though.

Regardless, I've exported the ONNX model with the above changes with the following command:

./zipformer/export-onnx-streaming.py \
  --exp-dir tmp/icefall-asr-librispeech-streaming-zipformer-2023-05-17/exp \
  --causal 1 \
  --chunk-size 16 \
  --left-context-frames 128 \
  --use-transducer True --use-ctc True \
  --tokens tmp/icefall-asr-librispeech-streaming-zipformer-2023-05-17/data/lang_bpe_500/tokens.txt \
  --use-averaged-model 0 \
  --epoch 99 \
  --avg 1

And could even run it like so:

./zipformer/onnx_pretrained-streaming.py \
  --encoder-model-filename tmp/icefall-asr-librispeech-streaming-zipformer-2023-05-17/exp/encoder-epoch-99-avg-1-chunk-16-left-128.onnx \
  --decoder-model-filename tmp/icefall-asr-librispeech-streaming-zipformer-2023-05-17/exp/decoder-epoch-99-avg-1-chunk-16-left-128.onnx \
  --joiner-model-filename tmp/icefall-asr-librispeech-streaming-zipformer-2023-05-17/exp/joiner-epoch-99-avg-1-chunk-16-left-128.onnx \
  --tokens tmp/icefall-asr-librispeech-streaming-zipformer-2023-05-17/data/lang_bpe_500/tokens.txt \
  ./tmp/icefall-asr-librispeech-streaming-zipformer-2023-05-17/test_wavs/1089-134686-0001.wav
# AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

Then I tested on macOS via sherpa-onnx's Python API, running the sample script

python ./python-api-examples/speech-recognition-from-microphone-with-endpoint-detection.py \
  --tokens=./sherpa-onnx-zipformer-streaming-librispeech/tokens.txt \
  --encoder=./sherpa-onnx-zipformer-streaming-librispeech/encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx \
  --decoder=./sherpa-onnx-zipformer-streaming-librispeech/decoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx \
  --joiner=./sherpa-onnx-zipformer-streaming-librispeech/joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx \
  --provider coreml

which crashed midway, with the following logs:

2024-06-12 22:38:49.520 python[5154:240647] 2024-06-12 22:38:49.520492 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample/Concat_output_0' source:{17,1,256} target:{16,1,256}. Falling back to lenient merge.
2024-06-12 22:38:49.523 python[5154:240647] 2024-06-12 22:38:49.523516 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_1/Concat_output_0' source:{17,1,384} target:{16,1,384}. Falling back to lenient merge.
2024-06-12 22:38:49.527 python[5154:240647] 2024-06-12 22:38:49.527662 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_2/Concat_output_0' source:{17,1,512} target:{16,1,512}. Falling back to lenient merge.
2024-06-12 22:38:49.533 python[5154:240647] 2024-06-12 22:38:49.533308 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_3/Concat_output_0' source:{17,1,384} target:{16,1,384}. Falling back to lenient merge.
2024-06-12 22:38:49.537 python[5154:240647] 2024-06-12 22:38:49.537585 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_4/Concat_output_0' source:{17,1,256} target:{16,1,256}. Falling back to lenient merge.
2024-06-12 22:38:49.540 python[5154:240647] 2024-06-12 22:38:49.540476 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_output/Concat_output_0' source:{17,1,512} target:{16,1,512}. Falling back to lenient merge.
2024-06-12 22:38:51.454 python[5154:240647] 2024-06-12 22:38:51.454932 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample/Concat_output_0' source:{17,1,256} target:{16,1,256}. Falling back to lenient merge.
2024-06-12 22:38:51.457 python[5154:240647] 2024-06-12 22:38:51.457789 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_1/Concat_output_0' source:{17,1,384} target:{16,1,384}. Falling back to lenient merge.
2024-06-12 22:38:51.461 python[5154:240647] 2024-06-12 22:38:51.461920 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_2/Concat_output_0' source:{17,1,512} target:{16,1,512}. Falling back to lenient merge.
2024-06-12 22:38:51.467 python[5154:240647] 2024-06-12 22:38:51.467399 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_3/Concat_output_0' source:{17,1,384} target:{16,1,384}. Falling back to lenient merge.
2024-06-12 22:38:51.471 python[5154:240647] 2024-06-12 22:38:51.471600 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_4/Concat_output_0' source:{17,1,256} target:{16,1,256}. Falling back to lenient merge.
2024-06-12 22:38:51.474 python[5154:240647] 2024-06-12 22:38:51.474356 [W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. '/downsample_output/Concat_output_0' source:{17,1,512} target:{16,1,512}. Falling back to lenient merge.
2024-06-12 22:38:52.130 python[5154:240647] 2024-06-12 22:38:52.130313 [W:onnxruntime:, coreml_execution_provider.cc:81 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 571 number of nodes in the graph: 4067 number of nodes supported by CoreML: 2603
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
2024-06-12 22:38:54.079 python[5154:240647] 2024-06-12 22:38:54.079740 [E:onnxruntime:, inference_session.cc:1798 operator()] Exception during initialization: /Users/runner/work/1/s/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:45 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape &, onnxruntime::TensorShapeVector &, bool) input_shape_size == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{17,1,256}, requested shape:{8,2,1,256}

I observed two things:

Unlike the initial report, where

2023-06-26 22:41:24.330 sherpa-onnx[83435:2226254] 2023-06-26 22:41:24.330654 [I:onnxruntime:, coreml_execution_provider.cc:93 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 0 number of nodes in the graph: 3286 number of nodes supported by CoreML: 0

at least some of the nodes are now compatible with CoreML:

2024-06-12 22:38:52.130 python[5154:240647] 2024-06-12 22:38:52.130313 [W:onnxruntime:, coreml_execution_provider.cc:81 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 571 number of nodes in the graph: 4067 number of nodes supported by CoreML: 2603

You can find my exported model here.

I would love to get this working and contribute.

w11wo commented 2 months ago

I tried re-running the inference on macOS + CoreML, this time with the non-quantized model. Things look a bit different now: I can get it to run, unlike last time, but not all nodes are compatible with CoreML.

2024-06-13 12:07:02.915 python[2858:101813] 2024-06-13 12:07:02.914917 [W:onnxruntime:, coreml_execution_provider.cc:81 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 229 number of nodes in the graph: 3261 number of nodes supported by CoreML: 543
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
2024-06-13 12:07:05.629 python[2858:101813] 2024-06-13 12:07:05.629930 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-06-13 12:07:05.630 python[2858:101813] 2024-06-13 12:07:05.629992 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-06-13 12:07:06.156 python[2858:101813] 2024-06-13 12:07:06.156834 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-06-13 12:07:06.156 python[2858:101813] 2024-06-13 12:07:06.156892 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Started! Please speak
/Users/mac/Documents/sherpa-onnx/sherpa-onnx/csrc/features.cc:AcceptWaveformImpl:102 Creating a resampler:
   in_sample_rate: 48000
   output_sample_rate: 16000

0:A LOWER
1:A BIGGER WORLDLESS WAITING
csukuangfj commented 2 months ago

It's great to know that you can run it.


https://onnxruntime.ai/docs/execution-providers/CoreML-ExecutionProvider.html#supported-operators

This page lists supported operators.
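
As a rough way to cross-check a model against that page, something like the sketch below can help; note that the SUPPORTED set here is hand-copied, almost certainly incomplete, and may not match your onnxruntime version.

import onnx

# Hand-copied (and likely incomplete) subset of the CoreML EP operator list;
# consult the page above for the onnxruntime version you actually use.
SUPPORTED = {
    "Add", "Concat", "Conv", "Gemm", "MatMul", "Mul", "Relu", "Reshape",
    "Sigmoid", "Slice", "Softmax", "Squeeze", "Sub", "Tanh", "Transpose",
}

model = onnx.load("encoder-epoch-99-avg-1-chunk-16-left-128.onnx")
used = {node.op_type for node in model.graph.node}
print("Ops not in the (partial) CoreML list:", sorted(used - SUPPORTED))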

w11wo commented 2 months ago

@csukuangfj, I think not all ops used are supported by CoreML yet 😅 Or maybe there might still be some dynamic shapes going on; not sure. I need to debug further. The current sherpa-onnx Python API doesn't seem to be verbose enough to tell which ops are not supported.
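
One possible workaround (a sketch that bypasses sherpa-onnx and loads the model with onnxruntime directly) is to turn up ORT's own logging, which prints per-node EP assignments during session initialization, as hinted by the "Rerunning with verbose output" message in the logs above.

import onnxruntime as ort

# Sketch: load the model directly with onnxruntime and enable verbose logging
# so the session reports which execution provider each node is assigned to.
so = ort.SessionOptions()
so.log_severity_level = 0   # 0 = verbose
so.log_verbosity_level = 1

ort.InferenceSession(
    "encoder-epoch-99-avg-1-chunk-16-left-128.onnx",
    sess_options=so,
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)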

w11wo commented 2 months ago

Checked with the Model Usability Checker.

python -m onnxruntime.tools.check_onnx_model_mobile_usability {encoder,decoder,joiner}-epoch-99-avg-1-chunk-16-left-128.onnx --log_level debug

These are the results:

Encoder

INFO:  Checking encoder-epoch-99-avg-1-chunk-16-left-128.onnx for usability with ORT Mobile.
INFO:  Checking NNAPI
INFO:  202 partitions with a total of 2771/3261 nodes can be handled by the NNAPI EP.
INFO:  Partition sizes: [96, 37, 28, 13, 12, 22, 9, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 16, 3, 23, 20, 28, 13, 12, 22, 9, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 15, 7, 3, 23, 20, 28, 13, 12, 22, 9, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 15, 7, 3, 23, 20, 28, 13, 12, 22, 9, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 15, 8, 3, 23, 20, 28, 13, 12, 22, 9, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 15, 8, 3, 23, 20, 28, 13, 12, 22, 9, 13, 12, 37, 9, 14, 13, 5, 8, 13, 12, 13, 5, 8, 13, 12, 15, 7, 3, 3]
INFO:  Unsupported nodes due to operator=334
INFO:  Unsupported nodes due to input having a dynamic shape=156
INFO:  Unsupported ops: ai.onnx:Equal,ai.onnx:Expand,ai.onnx:GatherElements,ai.onnx:LessOrEqual,ai.onnx:ReduceSum,ai.onnx:Shape,ai.onnx:Where
DEBUG:  Caveats that have not been checked and may result in a node not being supported:  
     ai.onnx:Conv:Only 2D Conv is supported. Weights and bias should be constant.
     ai.onnx:Gather:Input indices should be constant if not int32 type.
     ai.onnx:Unsqueeze:Input axes should be constant.
INFO:  NNAPI is not recommended with this model as there are 202 partitions covering 85.0% of the nodes in the model. This will most likely result in worse performance than just using the CPU EP.
INFO:  Model should perform well with NNAPI as is: NO
INFO:  Checking CoreML
INFO:  311 partitions with a total of 2589/3261 nodes can be handled by the CoreML EP.
INFO:  Partition sizes: [14, 1, 7, 1, 7, 1, 11, 1, 11, 25, 3, 12, 32, 10, 1, 9, 1, 33, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 8, 3, 20, 3, 12, 32, 10, 1, 9, 1, 33, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 6, 7, 3, 20, 3, 12, 32, 10, 1, 9, 1, 33, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 6, 7, 3, 20, 3, 12, 32, 10, 1, 9, 1, 33, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 6, 8, 3, 20, 3, 12, 32, 10, 1, 9, 1, 33, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 6, 8, 3, 20, 3, 12, 32, 10, 1, 9, 1, 33, 10, 1, 9, 1, 7, 26, 3, 9, 8, 3, 13, 6, 9, 10, 1, 9, 1, 13, 6, 9, 10, 1, 9, 1, 7, 6, 7, 3, 3]
INFO:  Unsupported nodes due to operator=516
INFO:  Unsupported nodes due to input having a dynamic shape=156
INFO:  Unsupported ops: ai.onnx:Abs,ai.onnx:Equal,ai.onnx:Exp,ai.onnx:Expand,ai.onnx:GatherElements,ai.onnx:LessOrEqual,ai.onnx:Log,ai.onnx:Max,ai.onnx:Neg,ai.onnx:ReduceMean,ai.onnx:ReduceSum,ai.onnx:Softmax,ai.onnx:Unsqueeze,ai.onnx:Where
DEBUG:  Caveats that have not been checked and may result in a node not being supported:  
     ai.onnx:Conv:Only 1D/2D Conv is supported. Weights and bias should be constant.
     ai.onnx:Gather:Input `indices` with scalar value is not supported.
     ai.onnx:MatMul:Input B should be constant.
     ai.onnx:Pow:Only supports cases when both inputs are fp32.
     ai.onnx:Shape:Attribute `start` with non-default value is not supported. Attribute `end` is not supported.
     ai.onnx:Slice:Inputs `starts`, `ends`, `axes`, and `steps` should be constant. Empty slice is not supported.
INFO:  CoreML is not recommended with this model as there are 311 partitions covering 79.4% of the nodes in the model. This will most likely result in worse performance than just using the CPU EP.
INFO:  Model should perform well with CoreML as is: NO
INFO:  ---------------
INFO:  Checking if pre-built ORT Mobile package can be used with encoder-epoch-99-avg-1-chunk-16-left-128.onnx once model is converted from ONNX to ORT format using onnxruntime.tools.convert_onnx_models_to_ort...
DEBUG:  Checking if the data types and operators used in the model are supported in the pre-built ORT package...
INFO:  Unsupported operators:
INFO:    ai.onnx:13:GatherElements
INFO:  
Model is not supported by the pre-built package due to unsupported types and/or operators.
INFO:  Please see https://onnxruntime.ai/docs/install/#install-on-web-and-mobile for information on what is supported in the pre-built package.
INFO:  A custom build of ONNX Runtime will be required to run the model. Please see https://onnxruntime.ai/docs/build/custom.html for details on performing that.
INFO:  ---------------

INFO:  Run `python -m onnxruntime.tools.convert_onnx_models_to_ort ...` to convert the ONNX model to ORT format. By default, the conversion tool will create an ORT format model with saved optimizations which can potentially be applied at runtime (with a .with_runtime_opt.ort file extension) for use with NNAPI or CoreML, and a fully optimized ORT format model (with a .ort file extension) for use with the CPU EP.
INFO:  For optimal performance the <model>.ort model should be used with the CPU EP.

Decoder

INFO:  Checking decoder-epoch-99-avg-1-chunk-16-left-128.onnx for usability with ORT Mobile.
INFO:  Checking NNAPI
INFO:  1 partitions with a total of 10/12 nodes can be handled by the NNAPI EP.
INFO:  Partition sizes: [10]
INFO:  Unsupported nodes due to operator=1
INFO:  Unsupported nodes due to input having a dynamic shape=1
INFO:  Unsupported ops: ai.onnx:GreaterOrEqual
DEBUG:  Caveats that have not been checked and may result in a node not being supported:  
     ai.onnx:Conv:Only 2D Conv is supported. Weights and bias should be constant.
     ai.onnx:Gather:Input indices should be constant if not int32 type.
     ai.onnx:Gemm:If input B is not constant, transB should be 1.
     ai.onnx:Squeeze:Input axes should be constant.
     ai.onnx:Unsqueeze:Input axes should be constant.
INFO:  NNAPI should work well for this model as there is one partition covering 83.3% of the nodes in the model.
INFO:  Model should perform well with NNAPI as is: YES
INFO:  Checking CoreML
INFO:  1 partitions with a total of 9/12 nodes can be handled by the CoreML EP.
INFO:  Partition sizes: [9]
INFO:  Unsupported nodes due to operator=2
INFO:  Unsupported nodes due to input having a dynamic shape=1
INFO:  Unsupported ops: ai.onnx:GreaterOrEqual,ai.onnx:Unsqueeze
DEBUG:  Caveats that have not been checked and may result in a node not being supported:  
     ai.onnx:Conv:Only 1D/2D Conv is supported. Weights and bias should be constant.
     ai.onnx:Gather:Input `indices` with scalar value is not supported.
     ai.onnx:Gemm:Input B should be constant.
INFO:  CoreML may work well for this model, however only 75.0% of nodes will use it. Performance testing is required to validate.
INFO:  Model should perform well with CoreML as is: MAYBE
INFO:  ---------------
INFO:  Checking if pre-built ORT Mobile package can be used with decoder-epoch-99-avg-1-chunk-16-left-128.onnx once model is converted from ONNX to ORT format using onnxruntime.tools.convert_onnx_models_to_ort...
DEBUG:  Checking if the data types and operators used in the model are supported in the pre-built ORT package...
INFO:  Model should work with the pre-built package.
INFO:  ---------------

INFO:  Run `python -m onnxruntime.tools.convert_onnx_models_to_ort ...` to convert the ONNX model to ORT format. By default, the conversion tool will create an ORT format model with saved optimizations which can potentially be applied at runtime (with a .with_runtime_opt.ort file extension) for use with NNAPI or CoreML, and a fully optimized ORT format model (with a .ort file extension) for use with the CPU EP.
INFO:  As NNAPI or CoreML may provide benefits with this model it is recommended to compare the performance of the <model>.with_runtime_opt.ort model using the NNAPI EP on Android, and the CoreML EP on iOS, against the performance of the <model>.ort model using the CPU EP.

Joiner

INFO:  Checking joiner-epoch-99-avg-1-chunk-16-left-128.onnx for usability with ORT Mobile.
INFO:  Checking NNAPI
INFO:  1 partitions with a total of 3/3 nodes can be handled by the NNAPI EP.
INFO:  Partition sizes: [3]
INFO:  Unsupported nodes due to operator=0
DEBUG:  Caveats that have not been checked and may result in a node not being supported:  
     ai.onnx:Gemm:If input B is not constant, transB should be 1.
INFO:  NNAPI should work well for this model as there is one partition covering 100.0% of the nodes in the model.
INFO:  Model should perform well with NNAPI as is: YES
INFO:  Checking CoreML
INFO:  1 partitions with a total of 3/3 nodes can be handled by the CoreML EP.
INFO:  Partition sizes: [3]
INFO:  Unsupported nodes due to operator=0
DEBUG:  Caveats that have not been checked and may result in a node not being supported:  
     ai.onnx:Gemm:Input B should be constant.
INFO:  CoreML should work well for this model as there is one partition covering 100.0% of the nodes in the model.
INFO:  Model should perform well with CoreML as is: YES
INFO:  ---------------
INFO:  Checking if pre-built ORT Mobile package can be used with joiner-epoch-99-avg-1-chunk-16-left-128.onnx once model is converted from ONNX to ORT format using onnxruntime.tools.convert_onnx_models_to_ort...
DEBUG:  Checking if the data types and operators used in the model are supported in the pre-built ORT package...
INFO:  Model should work with the pre-built package.
INFO:  ---------------

INFO:  Run `python -m onnxruntime.tools.convert_onnx_models_to_ort ...` to convert the ONNX model to ORT format. By default, the conversion tool will create an ORT format model with saved optimizations which can potentially be applied at runtime (with a .with_runtime_opt.ort file extension) for use with NNAPI or CoreML, and a fully optimized ORT format model (with a .ort file extension) for use with the CPU EP.
INFO:  As NNAPI or CoreML may provide benefits with this model it is recommended to compare the performance of the <model>.with_runtime_opt.ort model using the NNAPI EP on Android, and the CoreML EP on iOS, against the performance of the <model>.ort model using the CPU EP.

Looks like only the joiner is fully supported by CoreML; for the encoder and decoder, some of the ops are still unsupported.
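
Since the checker says performance testing is required to validate CoreML for the decoder, a rough benchmarking sketch along these lines might help; it pins any symbolic dimensions to 1 and feeds zeros, so the numbers are only a crude indication compared to running real audio and decoder states through sherpa-onnx.

import time
import numpy as np
import onnxruntime as ort

# Crude sketch: compare CPU-only vs CoreML+CPU latency for one exported model.
MODEL = "joiner-epoch-99-avg-1-chunk-16-left-128.onnx"

def make_feeds(sess):
    feeds = {}
    for inp in sess.get_inputs():
        # Pin unknown/symbolic dimensions to 1 and feed zeros.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        dtype = np.int64 if "int64" in inp.type else np.float32
        feeds[inp.name] = np.zeros(shape, dtype=dtype)
    return feeds

def bench(providers, runs=50):
    sess = ort.InferenceSession(MODEL, providers=providers)
    feeds = make_feeds(sess)
    sess.run(None, feeds)  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feeds)
    return (time.perf_counter() - start) / runs

print("CPU only     :", bench(["CPUExecutionProvider"]))
print("CoreML + CPU :", bench(["CoreMLExecutionProvider", "CPUExecutionProvider"]))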