Add device voices

Add on-device voices from https://github.com/grammatek/simaromur_voices.

Add FastSpeech2 acoustic model and MelGAN vocoder for the Álfur and Díljá voices in TorchScript for mobile format via a git submodule
The audio conversion of on-device voices from float to 16-bit PCM is louder and clearer than the audio received from the network voices, because we use the full 16-bit range and apply dithering. Clipping detection was added, but no clipping occurs with our approach.
Activate on-device G2P and add appropriate wrappers for the specific input symbols of on-device voices
Rename network voices by adding " net" to their names
Adapt the VoiceInfo dialog, add visual feedback (circular spinner) when playing back voices, and make voice playback stoppable
Rearrange position of some files and group them more logically into packages
Fix voice list for TTS service in case it hasn't been loaded via the app before. Also fix the case when Icelandic is not the default locale
Fix voice list handling: Android TTS doesn't officially support multiple voices for the same locale, nor voices whose locale is not among those returned by Locale.getAvailableLocales(). Nevertheless, we return multiple voices for is_IS by using the voice name as the variant of the assigned locale. To stick to a voice given to us in onLoadVoice(), we need to persist it as our default voice for is_IS. The method onLoadVoice() seems to be called only when the user chooses a voice via the TTS settings, not when the service starts. By persisting the loaded voice name, we can use it immediately without onLoadVoice() being called.
Add heuristics for onLoadVoice() that pick a default voice based on expected speed when the user has not selected a specific voice. Network voices are preferred over on-device voices because they are currently faster. Among on-device voices, the individual RTF (real-time factor) is taken into account: the higher, the better.
Add the DeviceVoice attribute RTF (real-time factor) to better describe how fast an on-device voice really is. The factor is measured as inference speed relative to real-time playback of the generated WAV audio on a Google Pixel 6 phone.
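
The float to 16-bit PCM conversion with dithering and clipping detection described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the class and method names are hypothetical, the input is assumed to lie in [-1.0, 1.0], and triangular (TPDF) dither is an assumption, since the description only says "dithering".

```java
import java.util.Random;

/** Hypothetical helper; names are illustrative and not taken from the app's code base. */
final class PcmConverter {

    /** Converted samples plus the number of samples that had to be clamped. */
    static final class Result {
        final short[] pcm;
        final int clippedSamples;

        Result(short[] pcm, int clippedSamples) {
            this.pcm = pcm;
            this.clippedSamples = clippedSamples;
        }
    }

    /**
     * Converts float samples (assumed to be in [-1.0, 1.0]) to 16-bit PCM.
     * Adds TPDF dither of about +/- 1 LSB before rounding (an assumption)
     * and counts every sample that falls outside the 16-bit range.
     */
    static Result floatToPcm16(float[] samples, Random rng) {
        short[] pcm = new short[samples.length];
        int clipped = 0;
        for (int i = 0; i < samples.length; i++) {
            // Scale to the full 16-bit range.
            double scaled = samples[i] * 32767.0;
            // Difference of two uniform values gives a triangular distribution in (-1, 1).
            double dither = rng.nextDouble() - rng.nextDouble();
            long rounded = Math.round(scaled + dither);
            // Clipping detection: clamp and count out-of-range samples.
            if (rounded > Short.MAX_VALUE) {
                rounded = Short.MAX_VALUE;
                clipped++;
            } else if (rounded < Short.MIN_VALUE) {
                rounded = Short.MIN_VALUE;
                clipped++;
            }
            pcm[i] = (short) rounded;
        }
        return new Result(pcm, clipped);
    }
}
```

A caller could log `clippedSamples` after each utterance. Note that in this sketch a sample at exactly full scale can be counted as clipped once dither is added.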
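
The default voice heuristic described above (network voices first, otherwise the on-device voice with the highest RTF) can be sketched as follows. All class, field, and method names here are hypothetical. The RTF formula is also an assumption, read from the description as audio duration divided by inference time, so that a higher value means faster synthesis.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the default voice choice; not the PR's actual implementation. */
final class DefaultVoiceChooser {

    /** Minimal voice description; field names are illustrative. */
    static final class VoiceCandidate {
        final String name;
        final boolean isNetwork;
        final double rtf; // only meaningful for on-device voices

        VoiceCandidate(String name, boolean isNetwork, double rtf) {
            this.name = name;
            this.isNetwork = isNetwork;
            this.rtf = rtf;
        }
    }

    /** Assumed RTF definition: seconds of generated audio per second of inference. */
    static double rtf(double audioSeconds, double inferenceSeconds) {
        if (inferenceSeconds <= 0.0) {
            throw new IllegalArgumentException("inferenceSeconds must be positive");
        }
        return audioSeconds / inferenceSeconds;
    }

    /** Prefers any network voice; otherwise picks the on-device voice with the highest RTF. */
    static Optional<VoiceCandidate> chooseDefault(List<VoiceCandidate> voices) {
        Optional<VoiceCandidate> network =
                voices.stream().filter(v -> v.isNetwork).findFirst();
        if (network.isPresent()) {
            return network;
        }
        return voices.stream()
                .filter(v -> !v.isNetwork)
                .max(Comparator.comparingDouble((VoiceCandidate v) -> v.rtf));
    }
}
```

With this definition, a voice that needs 4 seconds to synthesize 10 seconds of audio has an RTF of 2.5.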
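
The voice list workaround described above (voice name as the locale variant, plus a persisted default voice for is_IS) might be illustrated like this. It is a sketch under stated assumptions: the class and method names are hypothetical, and an in-memory map stands in for whatever persistent storage the app actually uses. Only `java.util.Locale` is a real API here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; the in-memory map stands in for persistent storage. */
final class VoiceLocaleSketch {

    private static final String IS_IS_KEY = "is_IS";

    // Stand-in for a persistent key-value store (an assumption, not the app's storage).
    private final Map<String, String> store = new HashMap<>();

    /** Builds an is_IS locale that carries the voice name as its variant. */
    static Locale localeForVoice(String voiceName) {
        // The three-argument constructor accepts arbitrary variant strings.
        return new Locale("is", "IS", voiceName);
    }

    /** Would be called when a voice is loaded, so the choice survives a service restart. */
    void rememberLoadedVoice(String voiceName) {
        store.put(IS_IS_KEY, voiceName);
    }

    /** Returns the remembered voice for is_IS, or the fallback if none was stored yet. */
    String defaultVoiceFor(String fallback) {
        return store.getOrDefault(IS_IS_KEY, fallback);
    }
}
```

The variant lets several voices share the same language and country while still being told apart, and the stored name is available even if no voice has been loaded in the current session.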

grammatek / simaromur

Add device voices #76