k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Arabic TTS text preprocessing #821

Open mush42 opened 7 months ago

mush42 commented 7 months ago

Hi

Thanks for your awesome work!

I tried the Arabic TTS voice (Kareem), and I noticed that an important text preprocessing step is missing.

Arabic text is usually written unvocalized, i.e., without diacritics (tashkeel). To be intelligible, the text must be vocalized before phonemization. Vocalization is usually done with a lightweight neural network, and this preprocessing step is currently missing from sherpa-onnx.
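To make the distinction concrete: vocalization is written with Unicode combining marks (harakat, U+064B to U+0652) that follow the base letters, so unvocalized text is simply letters with no such marks. The sketch below is a hypothetical heuristic, not part of sherpa-onnx or libtashkeel. It measures the fraction of Arabic letters that carry a mark and calls a caller-supplied vocalizer only when that fraction is below a threshold, with vocalization running before phonemization. The function names and the 0.5 threshold are illustrative assumptions. Even fully vocalized text does not mark every letter (e.g., long-vowel letters), which is why a ratio is used rather than an all-or-nothing check.

```rust
/// True for the Arabic harakat U+064B..=U+0652:
/// fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
fn is_haraka(c: char) -> bool {
    ('\u{064B}'..='\u{0652}').contains(&c)
}

/// True for base letters in the main Arabic block (U+0621..=U+064A),
/// excluding tatweel (kashida) U+0640, which is not a letter.
fn is_arabic_letter(c: char) -> bool {
    ('\u{0621}'..='\u{064A}').contains(&c) && c != '\u{0640}'
}

/// Fraction of Arabic letters immediately followed by a haraka (0.0 if there are no letters).
fn vocalization_ratio(text: &str) -> f64 {
    let chars: Vec<char> = text.chars().collect();
    let mut letters = 0usize;
    let mut marked = 0usize;
    for (i, &c) in chars.iter().enumerate() {
        if is_arabic_letter(c) {
            letters += 1;
            if chars.get(i + 1).map_or(false, |&n| is_haraka(n)) {
                marked += 1;
            }
        }
    }
    if letters == 0 {
        0.0
    } else {
        marked as f64 / letters as f64
    }
}

/// Hypothetical preprocessing hook, run before phonemization:
/// call the vocalizer only when the input looks mostly unvocalized.
fn preprocess<F: Fn(&str) -> String>(text: &str, vocalize: F, threshold: f64) -> String {
    if vocalization_ratio(text) < threshold {
        vocalize(text)
    } else {
        text.to_string()
    }
}

fn main() {
    let plain = "\u{0643}\u{062A}\u{0628}"; // k-t-b with no harakat
    let vocalized = "\u{0643}\u{064E}\u{062A}\u{064E}\u{0628}\u{064E}"; // kataba, fatha after each letter
    // prints "0.00 1.00"
    println!("{:.2} {:.2}", vocalization_ratio(plain), vocalization_ratio(vocalized));
    // Placeholder closure standing in for a real vocalization model.
    let out = preprocess(plain, |s| format!("<vocalize:{}>", s), 0.5);
    println!("{}", out);
}
```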

Piper's Arabic voice was trained on vocalized text; I know this because I prepared and audited the data used to train that voice.

Fortunately, I'm developing a package for Arabic text vocalization named Libtashkeel.

It is written in Rust, has a C API, is designed to be cross-platform, and embeds the model in the library itself. Here's the library running in the browser via WASM.

The library has a single entry point function that takes a string and outputs a string.
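The issue does not give libtashkeel's actual C signatures, so the following is only a sketch of the common Rust pattern for exposing a string-in/string-out function over a C ABI, which is the shape a C++ caller such as sherpa-onnx would consume. The names `tashkeel_vocalize` and `tashkeel_free`, and the pass-through `vocalize` placeholder, are hypothetical and not libtashkeel's real API.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Placeholder for the actual vocalization model: returns the input unchanged.
fn vocalize(text: &str) -> String {
    text.to_owned()
}

/// Hypothetical C entry point: NUL-terminated UTF-8 string in, newly allocated
/// NUL-terminated string out. Returns NULL for a null pointer or invalid UTF-8.
///
/// # Safety
/// `input` must be null or point to a valid NUL-terminated C string.
/// A non-null result must be released with `tashkeel_free`, not C's `free()`.
#[no_mangle]
pub unsafe extern "C" fn tashkeel_vocalize(input: *const c_char) -> *mut c_char {
    if input.is_null() {
        return std::ptr::null_mut();
    }
    let text = match unsafe { CStr::from_ptr(input) }.to_str() {
        Ok(s) => s,
        Err(_) => return std::ptr::null_mut(),
    };
    match CString::new(vocalize(text)) {
        Ok(out) => out.into_raw(),
        Err(_) => std::ptr::null_mut(), // result contained an interior NUL byte
    }
}

/// Releases a string returned by `tashkeel_vocalize`.
///
/// # Safety
/// `s` must be null or a pointer obtained from `tashkeel_vocalize`, freed at most once.
#[no_mangle]
pub unsafe extern "C" fn tashkeel_free(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}

fn main() {
    // Simulate what a C/C++ caller would do: call, copy the result, then free it.
    let input = CString::new("\u{0643}\u{062A}\u{0628}").unwrap();
    let out = unsafe { tashkeel_vocalize(input.as_ptr()) };
    assert!(!out.is_null());
    let result = unsafe { CStr::from_ptr(out) }.to_string_lossy().into_owned();
    unsafe { tashkeel_free(out) };
    println!("{}", result);
}
```

On the C/C++ side this would correspond to declarations like `char *tashkeel_vocalize(const char *input);` and `void tashkeel_free(char *s);`. The paired free function matters because the string is allocated by Rust's allocator, so it must be returned to Rust rather than released with C's `free()`.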

I can't contribute a PR since I'm not familiar with C++, but I can help integrate libtashkeel from the Rust side in any way needed.

Best,
Musharraf

mush42 commented 7 months ago

I'd like to add that the tashkeel model shipped with piper-phonemize performs poorly (although I helped implement it). The library I'm developing works better because it was trained on a large amount of modern Arabic text.