k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Audio to phonemes for C# using ONNX? #716

Open GeorgeS2019 opened 5 months ago

GeorgeS2019 commented 5 months ago

Is there already an example of this using ONNX?

csukuangfj commented 5 months ago

As answered in another issue, there are no audio-to-phoneme models for sherpa-onnx. We don't use phonemes at all during model training.

If you train a model yourself in icefall that uses phonemes as the modelling unit, then you can use that model in sherpa-onnx.
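To illustrate what "phonemes as the modelling unit" means on the sherpa-onnx side: the recognizer reads a `tokens.txt` file in which each line holds a symbol and its integer ID, so a phoneme-based model simply lists phoneme symbols there instead of BPE pieces. The sketch below builds such a table from a phoneme inventory. Reserving ID 0 for a `<blk>` blank symbol is an assumption modelled on icefall transducer recipes; the helper names `build_token_table` and `to_tokens_txt` and the inventory are hypothetical, and a real icefall recipe generates this file itself.

```python
from typing import Dict, List


def build_token_table(phonemes: List[str], blank: str = "<blk>") -> Dict[str, int]:
    """Assign integer IDs to a phoneme inventory, reserving ID 0 for the blank.

    Hypothetical helper: the blank-at-0 layout is an assumption based on
    icefall transducer recipes, not taken from the sherpa-onnx sources.
    """
    table: Dict[str, int] = {blank: 0}
    for p in phonemes:
        if p not in table:  # skip duplicates so every symbol gets exactly one ID
            table[p] = len(table)
    return table


def to_tokens_txt(table: Dict[str, int]) -> str:
    """Serialize the table as 'symbol id' lines, ordered by ID."""
    ordered = sorted(table.items(), key=lambda kv: kv[1])
    return "\n".join(f"{sym} {idx}" for sym, idx in ordered)


if __name__ == "__main__":
    # Illustrative inventory only, not a complete phoneme set for any language.
    inventory = ["p", "ə", "r", "b", "u", "a", "t", "n", "ɲ"]
    print(to_tokens_txt(build_token_table(inventory)))
```

The model itself must of course be trained with the same IDs; the file only lets the runtime turn predicted IDs back into symbols.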

GeorgeS2019 commented 5 months ago

@csukuangfj https://huggingface.co/bookbot/sherpa-onnx-pruned-transducer-stateless7-streaming-id

Instead of being trained to predict sequences of words, this model was trained to predict sequences of phonemes, e.g. ['p', 'ə', 'r', 'b', 'u', 'a', 't', 'a', 'n', 'ɲ', 'a']. Therefore, the model's vocabulary contains the different IPA phonemes found in g2p ID.
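To make the vocabulary point concrete: a transducer emits token IDs, and looking them up in the model's `tokens.txt` yields phoneme symbols rather than words. A minimal sketch, assuming the usual `symbol id` line format; the `TOKENS` contents below are made up for illustration and are not the actual vocabulary of the bookbot model, and `load_tokens` / `ids_to_phonemes` are hypothetical helpers.

```python
from typing import Dict, List

# Made-up token table in 'symbol id' format (NOT the real model's tokens.txt).
TOKENS = "<blk> 0\np 1\nə 2\nr 3\nb 4\nu 5\na 6\nt 7\nn 8\nɲ 9"


def load_tokens(text: str) -> Dict[int, str]:
    """Parse 'symbol id' lines into an ID -> symbol map, ignoring blank lines."""
    id2sym: Dict[int, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sym, idx = line.rsplit(maxsplit=1)  # the ID is the last field on the line
        id2sym[int(idx)] = sym
    return id2sym


def ids_to_phonemes(ids: List[int], id2sym: Dict[int, str], blank: str = "<blk>") -> List[str]:
    """Map decoded token IDs to phoneme symbols, dropping any blank tokens."""
    return [id2sym[i] for i in ids if id2sym[i] != blank]


if __name__ == "__main__":
    decoded = [1, 2, 3, 4, 5, 6, 7, 6, 8, 9, 6]  # hypothetical decoder output
    print(" ".join(ids_to_phonemes(decoded, load_tokens(TOKENS))))
```

With this toy table the decoded IDs come back as the phoneme sequence quoted above, so no word-level lexicon is involved at any point.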

This model was converted from the TorchScript version of Pruned Stateless Zipformer RNN-T Streaming ID to ONNX format.

csukuangfj commented 5 months ago

In that case, you can use it directly with sherpa-onnx. Are there any issues you have when running it with sherpa-onnx?