WIP: V1.4.x

This branch adds the new AI voice Steinn to Símarómur.

Steinn is a VITS voice trained with Piper TTS on Talrómur voice H (i.e. Steinn), resampled to a 16 kHz sample rate, and it uses IPA phonemization. The model inputs are prepared with the corresponding phoneme conversions, e.g. padding every symbol with the ID 0 and adding BOS and EOS symbols.
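
A minimal sketch of that input preparation, assuming the piper-phonemize convention as we understand it: `_` is the pad symbol with ID 0, `^` marks BOS, `$` marks EOS, and a pad ID follows BOS and every phoneme. The class `PhonemeIdEncoder` and the flat `Map<String, Integer>` are hypothetical simplifications; the real mapping comes from the model's configuration file. IDs are returned as `long[]` on the assumption that the ONNX model expects int64 input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns phoneme symbols into model input IDs (Piper-style, assumed). */
final class PhonemeIdEncoder {
    static final String PAD = "_"; // assumed pad symbol, mapped to ID 0
    static final String BOS = "^"; // assumed beginning-of-sentence symbol
    static final String EOS = "$"; // assumed end-of-sentence symbol

    private final Map<String, Integer> idMap;

    PhonemeIdEncoder(Map<String, Integer> idMap) {
        this.idMap = idMap;
    }

    /** Looks up a required symbol; fails loudly if the ID map lacks it. */
    private long id(String symbol) {
        Integer id = idMap.get(symbol);
        if (id == null) {
            throw new IllegalArgumentException("Symbol not in ID map: " + symbol);
        }
        return id;
    }

    /** Emits BOS, pad, then each known phoneme followed by a pad ID, then EOS. */
    long[] encode(List<String> phonemes) {
        List<Long> ids = new ArrayList<>();
        ids.add(id(BOS));
        ids.add(id(PAD));
        for (String phoneme : phonemes) {
            if (!idMap.containsKey(phoneme)) {
                continue; // skip symbols the model has no ID for
            }
            ids.add(id(phoneme));
            ids.add(id(PAD));
        }
        ids.add(id(EOS));
        return ids.stream().mapToLong(Long::longValue).toArray();
    }
}
```

With a toy map `{_: 0, ^: 1, $: 2, a: 14, b: 15}`, encoding `[a, b]` yields `[1, 0, 14, 0, 15, 0, 2]`.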

In contrast to the earlier voices, this model was also trained on punctuation (commas, question marks, exclamation marks and sentence-final periods) and on the silences tagged in the training set. These silences can be controlled at inference time by inserting # at the desired position in an utterance.
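
One way to keep these symbols available to the model is to tokenize an utterance so that punctuation and the silence marker `#` survive as separate tokens instead of being stripped, letting each be mapped to its own phoneme ID. The sketch below assumes exactly this token set; the class `UtteranceTokenizer` is hypothetical and not the tokenizer used in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tokenizer: words plus the punctuation and silence symbols the voice was trained on. */
final class UtteranceTokenizer {
    // A token is a run of letters (incl. combining marks) or digits, or one of , ? ! . #
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{M}\\p{N}]+|[,?!.#]");

    static List<String> tokenize(String utterance) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(utterance);
        while (m.find()) {
            tokens.add(m.group()); // whitespace and other characters are dropped
        }
        return tokens;
    }
}
```

For `Halló, # hvað segir þú?` this yields `[Halló, ",", "#", hvað, segir, þú, "?"]`, so the requested pause after the comma reaches the model as its own token.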

The resulting voice quality is quite good, and its runtime performance is very good.

Add ONNX Runtime for voice model inference and a new TTSEngineOnnx class that handles all ONNX model loading and inference
Add pronunciation for VITS via the new class PronunciationVits
Add a pronunciation dictionary with 250K entries that map words to IPA symbols
Add a new POJO class VitsConfig that interprets the VITS model configuration file, so that the phonetic alphabet to phoneme ID mapping can be read
Remove support for Flite and TorchScript, as well as for all other models
Remove support for network voices as well
Update Android minSdkVersion and dependencies, and update Gradle to 8.2.2
Optimize normalization runtime based on benchmarking
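
A minimal sketch of a word-to-IPA dictionary lookup, assuming a hypothetical tab-separated format with one entry per line (`word<TAB>space-separated IPA symbols`); the actual file format and class names in this PR may differ. Keys are lower-cased, and unknown words return `null` so the caller can fall back to another strategy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical pronunciation dictionary: word -> list of IPA symbols. */
final class PronunciationDictionary {
    private final Map<String, List<String>> entries = new HashMap<>();

    /** Reads "word<TAB>sym1 sym2 ..." lines; lines without a word or pronunciation are skipped. */
    static PronunciationDictionary load(Reader source) throws IOException {
        PronunciationDictionary dict = new PronunciationDictionary();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab <= 0) {
                    continue; // no tab, or empty word
                }
                String pron = line.substring(tab + 1).trim();
                if (pron.isEmpty()) {
                    continue; // no pronunciation given
                }
                String word = line.substring(0, tab).toLowerCase(Locale.ROOT);
                dict.entries.put(word, Arrays.asList(pron.split("\\s+")));
            }
        }
        return dict;
    }

    /** Returns the IPA symbols for a word, or null if the word is unknown. */
    List<String> lookup(String word) {
        return entries.get(word.toLowerCase(Locale.ROOT));
    }

    int size() {
        return entries.size();
    }
}
```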

The voice repository will be updated accordingly.

grammatek / simaromur

WIP: V1.4.x #151