Arabic TTS text preprocessing

Thanks for your awesome work!

I tried the Arabic TTS voice (Kareem), and I noticed that an important text preprocessing step is missing.

Arabic text is usually written unvocalized (that is, without diacritics). For the speech to be intelligible, the text must be vocalized (diacritized) before phonemization. A lightweight neural network is typically used for vocalization. sherpa-onnx currently has no such step.

Piper's Arabic voice was trained on vocalized text. I know this because I prepared and audited the data used to train that voice.

Fortunately, I'm developing a package for Arabic-text vocalization named Libtashkeel.

It is written in Rust, has a C API, is designed to be cross-platform, and the model is embedded in the library itself. There is also a demo of the library running in the browser via WASM.

The library has a single entry point function that takes a string and outputs a string.
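To make the integration surface concrete, here is a minimal sketch of what a string-in, string-out entry point can look like across a C ABI. This is not libtashkeel's actual API: the names `demo_vocalize` and `demo_free_string` are hypothetical, and the `vocalize` function is a placeholder that returns its input unchanged where the real model would add diacritics.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Placeholder for the real vocalizer (assumption): returns the text unchanged.
/// A real implementation would run the diacritization model here.
fn vocalize(text: &str) -> String {
    text.to_owned()
}

/// Hypothetical C-callable entry point: takes a NUL-terminated UTF-8 string
/// and returns a newly allocated NUL-terminated string, or null on error.
/// A real export would also need the `no_mangle` attribute so that C++ code
/// can link against the symbol by name.
pub unsafe extern "C" fn demo_vocalize(input: *const c_char) -> *mut c_char {
    if input.is_null() {
        return std::ptr::null_mut();
    }
    // Reject input that is not valid UTF-8.
    let text = match unsafe { CStr::from_ptr(input) }.to_str() {
        Ok(s) => s,
        Err(_) => return std::ptr::null_mut(),
    };
    // Ownership of the returned buffer passes to the caller.
    match CString::new(vocalize(text)) {
        Ok(out) => out.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Hypothetical companion function: releases a string returned by
/// `demo_vocalize`. Passing null is a no-op.
pub unsafe extern "C" fn demo_free_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

With a shape like this, the C++ side would only need to call the function on the input text before phonemization and free the result afterwards.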

I can't contribute a PR since I'm not familiar with C++, but I can help integrate libtashkeel from the Rust side in whatever way is needed.

Best,
Musharraf

k2-fsa / sherpa-onnx

Arabic TTS text preprocessing #821