k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

paraformer model cannot use the coreml provider #902

Open XUJiahua opened 6 months ago

XUJiahua commented 6 months ago

Steps to reproduce:

  wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
  tar xvf sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2

  ./bin/sherpa-onnx \
    --provider=coreml \
    --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
    --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx \
    --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx \
    ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav

OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx", decoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx"), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), nemo_ctc=OnlineNeMoCtcModelConfig(model=""), tokens="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt", num_threads=1, warm_up=0, debug=False, provider="coreml", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), ctc_fst_decoder_config=OnlineCtcFstDecoderConfig(graph="", max_active=3000), enable_endpoint=True, max_active_paths=4, hotwords_score=1.5, hotwords_file="", decoding_method="greedy_search", blank_penalty=0, temperature_scale=2)
2024-05-22 16:03:07.735 sherpa-onnx[12523:1926677] 2024-05-22 16:03:07.735596 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running CoreML_10901377769534092317_3 node. Name:'CoreMLExecutionProvider_CoreML_10901377769534092317_3_3' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
libc++abi: terminating due to uncaught exception of type Ort::Exception
[1]    12523 abort      ./bin/sherpa-onnx --provider=coreml

csukuangfj commented 6 months ago

Do other models work?

Have you tried int8.onnx?

Are you on macOS?

XUJiahua commented 6 months ago

Yes, I'm on macOS. The sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2 model works. The paraformer int8 models fail with the same problem:

./bin/sherpa-onnx \
    --provider=coreml \
    --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
    --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx \
    --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx \
    ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav

OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx", decoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx"), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), nemo_ctc=OnlineNeMoCtcModelConfig(model=""), tokens="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt", num_threads=1, warm_up=0, debug=False, provider="coreml", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), ctc_fst_decoder_config=OnlineCtcFstDecoderConfig(graph="", max_active=3000), enable_endpoint=True, max_active_paths=4, hotwords_score=1.5, hotwords_file="", decoding_method="greedy_search", blank_penalty=0, temperature_scale=2)
2024-05-22 16:12:14.407 sherpa-onnx[12946:1936497] 2024-05-22 16:12:14.407639 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running CoreML_18336092227849758783_3 node. Name:'CoreMLExecutionProvider_CoreML_18336092227849758783_3_3' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
libc++abi: terminating due to uncaught exception of type Ort::Exception
[1]    12946 abort      ./bin/sherpa-onnx --provider=coreml

csukuangfj commented 6 months ago

Sorry, I can't solve this problem.

XUJiahua commented 6 months ago

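Thanks for the quick response!

The problem is probably in onnxruntime's CoreML provider itself. I found another project that runs paraformer inference with onnxruntime, and it hits the same problem: https://github.com/RapidAI/RapidASR/blob/main/cpp_onnx/readme.md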
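
Until the provider issue is resolved, one possible workaround is to try `coreml` first and fall back to another provider when the process aborts. The following is only a minimal sketch, not part of sherpa-onnx: the helper names `build_command` and `run_with_fallback` are hypothetical, the paths are the ones from the report above, and it assumes `--provider=cpu` is an acceptable fallback on your machine.

```python
import subprocess
from typing import Callable, List, Sequence

# Model directory from the report above; adjust to your local path.
MODEL_DIR = "./sherpa-onnx-streaming-paraformer-bilingual-zh-en"


def build_command(provider: str, model_dir: str = MODEL_DIR) -> List[str]:
    """Build the same sherpa-onnx command line as in the report, for one provider."""
    return [
        "./bin/sherpa-onnx",
        f"--provider={provider}",
        f"--tokens={model_dir}/tokens.txt",
        f"--paraformer-encoder={model_dir}/encoder.onnx",
        f"--paraformer-decoder={model_dir}/decoder.onnx",
        f"{model_dir}/test_wavs/0.wav",
    ]


def run_with_fallback(
    providers: Sequence[str] = ("coreml", "cpu"),
    run: Callable = subprocess.run,
) -> str:
    """Try each provider in order and return the first whose process exits with 0.

    An uncaught Ort::Exception aborts the process, which subprocess reports
    as a non-zero (negative) return code rather than a Python exception.
    """
    for provider in providers:
        result = run(build_command(provider))
        if result.returncode == 0:
            return provider
    raise RuntimeError(f"all providers failed: {list(providers)}")
```

Calling `run_with_fallback()` from the directory that contains `./bin/sherpa-onnx` runs the command once per provider and returns the name of the first provider that succeeded. The `run` parameter exists only so the process launcher can be swapped out in tests.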

csukuangfj commented 6 months ago

Could you try exporting the model yourself?

XUJiahua commented 6 months ago

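I'll give it a try. Is there an export script I can refer to? It looks like the original PyTorch model has to be split into an encoder and a decoder, which are then exported separately.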

csukuangfj commented 6 months ago

https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-paraformer/paraformer-models.html#csukuangfj-sherpa-onnx-streaming-paraformer-bilingual-zh-en-chinese-english

The page linked above describes it.

zljkevin commented 3 months ago

Has this been resolved?