add onnxruntime for the voice model and add a new TTSEngineOnnx class
add pronunciation for Vits via the class PronunciationVits, and add matching classes for the other pronunciation formats in use: PronunciationFP2 and PronunciationFlite
add a pronunciation dictionary mapping Word -> IPA symbols. The symbols are stored in a compressed format without any padding or spaces in between. Therefore, each IPA pronunciation has to be retokenized to split the dictionary entry into single symbols before these can be converted to the input ids for the Vits model
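A minimal sketch of how such a retokenization could look, assuming a greedy longest-match against the set of symbols known to the model. The class name IpaRetokenizer and the method split are hypothetical and not taken from the code base; the real implementation may use a different strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the code base: splits a compressed
// pronunciation string into single symbols via greedy longest-match.
final class IpaRetokenizer {
    static List<String> split(String compressed, Set<String> symbols) {
        // The longest known symbol bounds how far we need to look ahead.
        int maxLen = 0;
        for (String s : symbols) {
            maxLen = Math.max(maxLen, s.length());
        }
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < compressed.length()) {
            int end = Math.min(compressed.length(), pos + maxLen);
            // Shrink the candidate until it is a known symbol.
            while (end > pos && !symbols.contains(compressed.substring(pos, end))) {
                end--;
            }
            if (end == pos) {
                throw new IllegalArgumentException(
                        "unknown symbol at index " + pos + " in: " + compressed);
            }
            out.add(compressed.substring(pos, end));
            pos = end;
        }
        return out;
    }
}
```

With the symbols tʰ, t, iː, i and a, the entry tʰiːa would be split into tʰ, iː, a. Greedy matching can fail on ambiguous symbol inventories, so this is only an illustration of the idea.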
add new POJO class VitsConfig that parses the Vits model configuration file, so that the phonetic alphabet -> phoneme id mapping can be read
add new phoneme dictionary with space-separated phonemes and adapt the handling of those phonemes
add translation of syllable stresses into sampa_ipa_single_flite.tsv
move all normalization into the appropriate normalization classes, away from places such as the class AppRepository
add correct handling of VITS voice for punctuation/non-word characters:
only phonemize the space between words; all punctuation/non-word characters need to be placed directly after/before the adjacent character
normalization:
swap ." => ". and ," => ",
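The swap can be sketched with a single precompiled pattern. This is an illustration only: the class name QuoteSwap is hypothetical, and the sketch assumes straight double quotes, which may not match the quote characters the normalizer actually handles.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the code base.
final class QuoteSwap {
    // Compiled once: a period or comma directly followed by a double quote.
    private static final Pattern PUNCT_BEFORE_QUOTE = Pattern.compile("([.,])\"");

    // Moves the period or comma behind the closing quote.
    static String apply(String text) {
        return PUNCT_BEFORE_QUOTE.matcher(text).replaceAll("\"$1");
    }
}
```

Text without such a sequence is returned unchanged.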
fix some unit tests by increasing the Java heap space and reducing the number of concurrent test runners
fix some digit normalization tests and add new ones, some of which are failing
fix large digits bug
fix digit norm
add the voice assets for Steinn xs
TTSService: remove cache item only for RTF > 50. We observe that it still makes sense to cache audio even for RTF == 25.
remove support for Flite & Torch based voices
remove GPLv3 license; everything is licensed under Apache 2.0 again
upgrade Android Build to Gradle 8.2.2 and minimum Android targetSdkVersion
don't do any connectivity check and only accept ONNX voices in assets. Start searching for voices in assets immediately at the start of the App, but don't check the connection to the Grammatek TTS API and don't fetch the release info from GitHub.
fix leaking file descriptor: close the input/output-stream right after usage
enable StrictMode VmPolicy detectLeakedClosableObjects() by default
set new default strings for on-device voices Info screen
TTSEngineOnnx: add sentence-wise reading of Strings. In case the given text is made up of multiple sentences, split it at the characters .!?; Otherwise, the current voice simply reads over these symbols without a pause when they are not found at the end of the text.
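A minimal sketch of such a split, assuming each sentence keeps its terminating character. The class name SentenceSplitter is hypothetical and the actual logic in TTSEngineOnnx may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the code base.
final class SentenceSplitter {
    // A run of non-terminators, followed by any terminators that close it.
    private static final Pattern SENTENCE = Pattern.compile("[^.!?;]+[.!?;]*");

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            String s = m.group().trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }
}
```

A plain character split like this also breaks at decimal points and abbreviations, so it only makes sense on text that has already been normalized.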
optimization: use compiled regular expressions in normalization. Do most regex matches via compiled regular expressions to stay reliably under 100ms for the normalization phase (on a Pixel 6 phone).
remove original tokenizer
adapt speechrate to 0.5x - 3x. The possible values retrievable from the TTS settings are 10-600 (i.e. 0.1x - 6x). This is too wide a range for most voices. Distribute the received speechrate proportionally to a range of 50-300 (i.e. 0.5x - 3x).
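One way to read "proportionally" is a linear mapping of the input interval onto the output interval. The sketch below assumes exactly that. The class name SpeechRateMapper is hypothetical and the formula is an assumption, not necessarily the one used in the code.

```java
// Hypothetical helper, not part of the code base.
final class SpeechRateMapper {
    static final int IN_MIN = 10;    // lowest rate delivered by the TTS settings
    static final int IN_MAX = 600;   // highest rate delivered by the TTS settings
    static final int OUT_MIN = 50;   // 0.5x
    static final int OUT_MAX = 300;  // 3x

    // Linearly maps a rate from [IN_MIN, IN_MAX] onto [OUT_MIN, OUT_MAX].
    static int map(int rate) {
        // Clamp first so that out-of-range input cannot leave the target range.
        int clamped = Math.max(IN_MIN, Math.min(IN_MAX, rate));
        double t = (clamped - IN_MIN) / (double) (IN_MAX - IN_MIN);
        return (int) Math.round(OUT_MIN + t * (OUT_MAX - OUT_MIN));
    }
}
```

Under this assumption a received rate of 10 maps to 50, 305 maps to 175 and 600 maps to 300. A purely linear mapping does not keep the default rate of 100 at 1x, so an implementation that wants to preserve the default would need a piecewise mapping instead.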