This branch adds the new AI voice Steinn to Símarómur.
Steinn is a VITS voice trained with Piper TTS on Talrómur voice H (i.e. Steinn), resampled to a 16 kHz sample rate, and it uses IPA phonemization. The model inputs are prepared with the appropriate phoneme conversions, such as padding every symbol with 0, adding BOS and EOS, etc.
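The input preparation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: the class name `PhonemeIdEncoder`, the marker symbols `^`, `$` and `_`, and the exact interleaving order are assumptions modelled on the usual Piper convention, and the id values in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns a phoneme sequence into model input ids.
final class PhonemeIdEncoder {
    // Assumed marker symbols (Piper-style); the real ones come from the model config.
    static final String PAD = "_";
    static final String BOS = "^";
    static final String EOS = "$";

    static List<Integer> encode(List<String> phonemes, Map<String, Integer> idMap) {
        int padId = idMap.get(PAD);
        List<Integer> ids = new ArrayList<>();
        // Start with BOS, followed by a pad.
        ids.add(idMap.get(BOS));
        ids.add(padId);
        for (String phoneme : phonemes) {
            Integer id = idMap.get(phoneme);
            if (id == null) {
                continue; // symbols unknown to the model are skipped in this sketch
            }
            // Every known symbol is followed by a pad id.
            ids.add(id);
            ids.add(padId);
        }
        // Close the sequence with EOS.
        ids.add(idMap.get(EOS));
        return ids;
    }
}
```

With an illustrative mapping `{_: 0, ^: 1, $: 2, a: 3}`, encoding the single phoneme `a` gives `[1, 0, 3, 0, 2]`.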
In contrast to the earlier voices, this model is trained on commas, question marks, exclamation marks, sentence-final periods, and the tagged silences of the training set. These silences can be controlled at inference time by adding # at the appropriate position in an utterance.
The resulting voice quality is quite good, and runtime performance is very good.
Add OnnxRuntime for voice model inference and add a new TTSEngineOnnx class, which handles all ONNX model loading and inference.
Add pronunciation for VITS via the class PronunciationVits.
Add a pronunciation dictionary with 250K entries that maps each word to its IPA symbols.
Add a new POJO class VitsConfig that parses the VITS model configuration file in order to read the phonetic alphabet -> phoneme id mapping.
Remove support for FLite and TorchScript, and drop support for other models.
Remove support for network voices as well.
Update the Android minSdkVersion and dependencies, and update Gradle to 8.2.2.
Optimize the runtime of normalization after benchmarking.
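As a rough illustration of the word -> IPA dictionary lookup mentioned in the list above, here is a minimal sketch. It is not the PronunciationVits implementation: the class name `IpaDictionary`, the tab-separated `word<TAB>ipa` line format, and the lowercase normalization are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory pronunciation dictionary.
final class IpaDictionary {
    private final Map<String, String> entries = new HashMap<>();

    // Reads lines of the assumed form "word<TAB>ipa symbols"; malformed lines are ignored.
    static IpaDictionary load(Reader source) {
        IpaDictionary dict = new IpaDictionary();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab <= 0) {
                    continue; // no separator or empty word
                }
                String word = line.substring(0, tab).toLowerCase(Locale.ROOT);
                dict.entries.put(word, line.substring(tab + 1).trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dict;
    }

    // Returns the IPA transcription, or empty if the word is not in the dictionary.
    Optional<String> lookup(String word) {
        return Optional.ofNullable(entries.get(word.toLowerCase(Locale.ROOT)));
    }

    int size() {
        return entries.size();
    }
}
```

A caller would typically fall back to some other strategy (for example rule-based grapheme-to-phoneme conversion) when `lookup` returns an empty result; whether and how this branch does that is not shown here.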
The corresponding voice repository will be updated accordingly.