k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Error when running tts model #1111

Open thewh1teagle opened 2 months ago

thewh1teagle commented 2 months ago

Update

The problem was the tokens file: it was invalid. Maybe we can improve the error message?


I tried to run a TTS model on macOS (M1) with examples/tts.rs and got this error:

➜  sherpa-rs git:(main) ✗ cargo run --example tts --features="tts" -- --text 'שלום, מה שלומך היום?' --output audio.wav --tokens 'tokens.txt' --model 'model_sherpa.onnx' --provider cpu
   Compiling sherpa-rs v0.1.5-beta.5 (/Users/user/Documents/sherpa-rs)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.37s
     Running `target/debug/examples/tts --text 'שלום, מה שלומך היום?' --output audio.wav --tokens tokens.txt --model model_sherpa.onnx --provider cpu`
/Users/user/Documents/sherpa-rs/target/debug/build/sherpa-rs-sys-2249181d8eb0cc1d/out/sherpa-onnx/sherpa-onnx/csrc/offline-tts-character-frontend.cc:ReadTokens:68 Duplicated token '. Line ' 176. Existing ID: 174
➜  sherpa-rs git:(main) ✗ cd sys/sherpa-onnx                                                                                                    
➜  sherpa-onnx git:(master) git rev-parse HEAD

c0eaf86dbd4b7c842852215d5418e065a64e6190

In addition when using coreml provider I got many other warnings:

log
```
cargo run --example tts --features="tts" -- --text 'שלום, מה שלומך היום?' --output audio.wav --tokens 'tokens.txt' --model 'model_sherpa.onnx' --provider coreml
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/examples/tts --text 'שלום, מה שלומך היום?' --output audio.wav --tokens tokens.txt --model model_sherpa.onnx --provider coreml`
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.957684 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958148 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958171 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958185 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958206 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958222 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958237 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958305 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958348 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958371 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958389 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958403 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958424 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958440 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958455 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958468 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958491 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958506 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958521 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958533 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958552 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958567 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958582 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_output_0, shape: {0}
2024-07-11 20:25:17.958 tts[6889:483163] 2024-07-11 20:25:17.958593 [W:onnxruntime:, helper.cc:93 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/text_encoder/encoder/layers.0/attention/ConstantOfShape_2_output_0, shape: {0}
2024-07-11 20:25:17.959 tts[6889:483163] 2024-07-11 20:25:17.959690 [W:onnxruntime:, coreml_execution_provider.cc:104 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 43 number of nodes in the graph: 2885 number of nodes supported by CoreML: 80
2024-07-11 20:25:18.470 tts[6889:483163] 2024-07-11 20:25:18.470639 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-11 20:25:18.470 tts[6889:483163] 2024-07-11 20:25:18.470693 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/Users/user/Documents/sherpa-rs/target/debug/build/sherpa-rs-sys-2249181d8eb0cc1d/out/sherpa-onnx/sherpa-onnx/csrc/offline-tts-character-frontend.cc:ReadTokens:68 Duplicated token '. Line ' 176. Existing ID: 174
```

Another model worked:

cargo run --example tts --features="tts" -- --text 'liliana, the most beautiful and lovely assistant of our team!' --output audio.wav --tokens 'tokens.txt' --model 'vits-ljs.onnx' --lexicon lexicon.txt

Note that the failing model is for Hebrew. It worked for me on Windows a few days ago.

csukuangfj commented 2 months ago

/Users/user/Documents/sherpa-rs/target/debug/build/sherpa-rs-sys-2249181d8eb0cc1d/out/sherpa-onnx/sherpa-onnx/csrc/offline-tts-character-frontend.cc:ReadTokens:68 Duplicated token '. Line ' 176. Existing ID: 174

I have looked at the tokens.txt

[Screenshot of tokens.txt, 2024-07-12 at 14:49:05]

There are only 32 lines but the log says it has issues at line 176.

Could you please re-check whether you have used the correct tokens.txt? It would be great if you could post the tokens.txt you are using.
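
For anyone hitting the same "Duplicated token" message, here is a minimal sketch for checking a tokens file before handing it to the TTS example. It is not part of sherpa-onnx; `find_duplicate_tokens` and `check_tokens_file` are hypothetical helper names. It assumes each non-empty line has the form `<token> <id>`, and that a line holding only an id stands for the space token. Verify that layout against your own file.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_tokens(lines):
    """Return {token: [(line_no, id_text), ...]} for tokens listed more than once.

    Assumed layout (not confirmed by the thread): "<token> <id>" per line;
    a line with a single field is treated as the space token plus its id.
    """
    seen = defaultdict(list)
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        if len(parts) == 1:
            token, id_text = " ", parts[0]
        else:
            token, id_text = parts[0], parts[-1]
        seen[token].append((line_no, id_text))
    # Keep only the tokens that occur on more than one line.
    return {tok: hits for tok, hits in seen.items() if len(hits) > 1}


def check_tokens_file(path):
    """Read a tokens file as UTF-8 and report duplicated tokens."""
    text = Path(path).read_text(encoding="utf-8")
    return find_duplicate_tokens(text.splitlines())
```

Calling `check_tokens_file("tokens.txt")` returns an empty dict when every token is unique. Otherwise it maps each repeated token to the 1-based line numbers and ids where it appears, which also makes it easy to tell whether the file on disk is the one you meant to use.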

csukuangfj commented 2 months ago

In addition when using coreml provider I got many other warnings:

I think the warnings are expected. Not all onnxruntime operators are supported by CoreML.
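
To see how much of the graph CoreML actually takes, the `GetCapability` line in the log above can be parsed. The sketch below is only an illustration: `coreml_coverage` is a hypothetical helper, and the regular expression assumes the wording of the warning exactly as it appears in this thread, which may differ in other onnxruntime versions.

```python
import re

# Matches the three counters in the GetCapability warning as quoted in this thread.
CAPABILITY_RE = re.compile(
    r"number of partitions supported by CoreML: (\d+)\s+"
    r"number of nodes in the graph: (\d+)\s+"
    r"number of nodes supported by CoreML: (\d+)"
)


def coreml_coverage(log_text):
    """Return the partition/node counts and the supported fraction, or None."""
    match = CAPABILITY_RE.search(log_text)
    if match is None:
        return None
    partitions, total, supported = map(int, match.groups())
    return {
        "partitions": partitions,
        "total_nodes": total,
        "supported_nodes": supported,
        "fraction": supported / total if total else 0.0,
    }
```

For the log posted here (43 partitions, 2885 nodes, 80 supported) the fraction is 80/2885, roughly 2.8%, so nearly all nodes fall back to the CPU provider and the warnings do not indicate a failure by themselves.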