k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0
2.58k stars, 292 forks

Installing multiple voices #569

Open mablue opened 5 months ago

mablue commented 5 months ago

Hi. I want to mix different sherpa-onnx voices (for example haniyeh, gyro, and amir) using @jing332's tts-server app, e.g. haniyeh for narration and amir for dialogue. I also want to read other languages with other sherpa-onnx TTS models, for example English words with an English sherpa-onnx model. But I can't: I first have to remove the installed next-gen Kaldi TTS app and only then install the new one. Please find some way to let users have multiple voices on their phone. Many blind users are also annoyed by this problem. They use gestures to switch TTS engines in different situations: espeak for exploring (because it is fast), sherpa-onnx haniyeh for long texts, and gyro for short social media posts.

csukuangfj commented 5 months ago

That is definitely possible.

The issue is that if we prepackage multiple models in an APK to support multiple languages, the APK would be very large in file size since each model is at least 60 MB.

One solution is to download models for specific languages from within the APK.

The problem is that we don't have much experience in Android development and don't know how to implement that function.

Help from the community is appreciated.

mablue commented 5 months ago

Big APK files are not a big problem. A blind person will do anything to perceive the world and to read text quickly and easily with good-quality, different voices. Also, on this device we still have very high latency: SM-A047F/DS

5~10 sec on WhatsApp contact names, even with gyro!

صدا ۰۰۲ (1) (1).zip

Please compress the models as much as possible so that we can test on weak devices.

To fix multiple voices, you could just change the package name for each one, or append something to the end of each:

com.k2fsa.sherpa.onnx.tts.engine.fa.haniye
com.k2fsa.sherpa.onnx.tts.engine.fa.amir
com.k2fsa.sherpa.onnx.tts.engine.fa.gyro
com.k2fsa.sherpa.onnx.tts.engine.en.xxxx
com.k2fsa.sherpa.onnx.tts.engine.cn.yyyyy
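The naming scheme above could be generated by a build script rather than edited by hand. The following is only a sketch: the function name `application_id` is made up for illustration, and the validation reflects the general Android rule that each application ID segment starts with a letter and contains only letters, digits, and underscores. It does not describe how the sherpa-onnx build actually works.

```python
import re

# Base package name as quoted in the thread.
BASE_ID = "com.k2fsa.sherpa.onnx.tts.engine"

# Each segment of an Android application ID must start with a letter and
# may contain only letters, digits, and underscores.
_SEGMENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def application_id(lang: str, voice: str, base: str = BASE_ID) -> str:
    """Build a per-voice application ID such as '<base>.fa.haniye'.

    Hypothetical helper: the name and signature are illustrative only.
    """
    parts = [lang.lower(), voice.lower()]
    for part in parts:
        if not _SEGMENT.match(part):
            raise ValueError(f"invalid application ID segment: {part!r}")
    return ".".join([base, *parts])


if __name__ == "__main__":
    for lang, voice in [("fa", "haniye"), ("fa", "amir"), ("fa", "gyro")]:
        print(application_id(lang, voice))
```

Because every voice gets a distinct ID, Android treats each APK as a separate app, which is what allows them to be installed side by side.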

I can easily do it with a modded ApkEditor, but that's not the right way! Also, ApkEditor has a trojan dropper:

https://github.com/PatrickAlex2019/ApkEditor/releases

I checked it with VirusTotal and opened an issue, but after my VirusTotal report all the issues in that repo were removed, I don't know how! ☠️

gyroing commented 5 months ago

A generic Android app with a voice folder selection option would be a good solution. sherpa-onnx voices folder => sherpa-voice/lang/lang_country/name/quality/(files)
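As a rough illustration of how an app could discover voices in such a folder, here is a minimal sketch. It assumes exactly the four directory levels named above (lang/lang_country/name/quality); the `Voice` class and `scan_voices` function are hypothetical names, not part of sherpa-onnx.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Voice:
    lang: str      # e.g. "fa"
    locale: str    # e.g. "fa_IR"
    name: str      # e.g. "haniyeh"
    quality: str   # e.g. "medium"
    path: Path     # directory holding the model files


def scan_voices(root: Path) -> list[Voice]:
    """Return one Voice per <root>/<lang>/<lang_country>/<name>/<quality>/ directory."""
    voices = []
    # The glob pattern matches entries exactly four levels below root.
    for quality_dir in sorted(root.glob("*/*/*/*")):
        if not quality_dir.is_dir():
            continue
        lang, locale, name, quality = quality_dir.relative_to(root).parts
        voices.append(Voice(lang, locale, name, quality, quality_dir))
    return voices
```

With this layout the language, speaker name, and quality come from the path itself, so no separate index file is needed for a basic voice picker.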

paolo-caroni commented 5 months ago

A generic Android app with a voice folder selection option would be a good solution. sherpa-onnx voices folder => sherpa-voice/lang/lang_country/name/quality/(files)

Sure, but downloading the dataset directly from the "generic app" would require the internet permission. This will make privacy-conscious users suspicious, but they can revoke the permission AFTER downloading the dataset/voice. This is exactly what Google's counterpart does, and what Dicio (an open-source assistant that also has STT capability) does. It would also make inclusion in F-Droid simpler. The voice should also carry information such as language, sex of the speaker, quality, and latency.

I'm not a good programmer, and I don't know Android very well since I'm only an advanced user of it, but maybe a solution could be found by asking other open-source developers or studying their code. I'm sure the voice assistant Dicio does it, and the TTS engine RHVoice does too. I hope this hint is useful. Sorry for my incompetence.

gyroing commented 5 months ago

To avoid the internet permission, it is possible to transfer or download the voice folder separately from the main app. In that case, file access permissions will be needed.

csukuangfj commented 5 months ago

Thank you for the pointers. I will give them a look.

paolo-caroni commented 5 months ago

To avoid the internet permission, it is possible to transfer or download the voice folder separately from the main app. In that case, file access permissions will be needed.

Sure, but it will be trickier. The user has to download the dataset and manually save it in the correct folder, which will lead to support requests from less technical users. The folder also has to be readable and writable by different apps (on Android, each app runs like a separate Linux user). In my opinion doing it directly is better; I only think the store page or the app should state that the internet permission is needed only to download the voices.

gyroing commented 5 months ago

I kindly request you to visit: https://poretsky.github.io/android/smartvoice/

This is the "Nuance Vocalizer TTS engine"; it lets you easily select a voice folder and .....

This app also has language detection to switch between voice models in real time during speech. SmartVoice contains an embedded main voice at low resolution, plus an option to select an external voice folder.

mablue commented 5 months ago

To avoid the internet permission, it is possible to transfer or download the voice folder separately from the main app. In that case, file access permissions will be needed.

I think file access is not needed with the proposed package names. It would work like a plugin inside the main app. Check the XDA MiXplorer plugins.

It has six extension APK files that work together, and no file or internet permissions are needed.

Completely independent TTS APK files would be installed; each one could be used with a screen reader and each would have a different name, for example:

sherpa.fa.haniyeh
Sherpa.en.xxx
Sherpa.cn.yyy

The biggest problem is how they will know which character range they are responsible for. I think we will need a generic TTS mixer in which the user sets the character range as a regex. For example, the default for English TTS APK files would be: /[A-z]/ or /^[A-Za-z0-9 ]+,[A-Za-z0-9 ]+$/

For the Persian language: /^[\u0600-\u06FF\s]+$/
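To make the character-range idea concrete, here is a minimal sketch of routing text to engines by script. The engine labels ("fa", "en", "other") and the function names are placeholders, and the sketch works word by word rather than matching whole strings as the anchored regexes above do. One detail worth noting: a range written as `[A-z]` also matches the punctuation characters that sit between `Z` and `a` in ASCII (`[`, `\`, `]`, `^`, `_`, and the backtick), so the sketch uses the explicit `[A-Za-z]` form.

```python
import re

# Engine labels are placeholders for whatever voices are installed.
RULES = [
    ("fa", re.compile(r"[\u0600-\u06FF]")),  # Arabic block, as in the Persian regex
    ("en", re.compile(r"[A-Za-z]")),         # explicit ranges instead of [A-z]
]
DEFAULT = "other"  # e.g. a fast fallback engine


def classify(word: str) -> str:
    """Return the label of the first rule whose pattern occurs in the word."""
    for engine, pattern in RULES:
        if pattern.search(word):
            return engine
    return DEFAULT


def split_by_engine(text: str) -> list[tuple[str, str]]:
    """Group consecutive words that map to the same engine."""
    segments: list[tuple[str, str]] = []
    for word in text.split():
        engine = classify(word)
        if engine == DEFAULT and segments:
            # Digits and punctuation stay with the preceding run.
            engine = segments[-1][0]
        if segments and segments[-1][0] == engine:
            segments[-1] = (engine, segments[-1][1] + " " + word)
        else:
            segments.append((engine, word))
    return segments
```

Each resulting segment could then be handed to the engine named by its label, which is roughly what a mixer app with user-defined rules would do.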

There is a good introduction to the JS rules (for splitting different languages) in the README of the @jing332 tts-server repo: https://github.com/jing332/tts-server-android?tab=readme-ov-file#%E6%9C%97%E8%AF%BB%E8%A7%84%E5%88%99

This is my own config + JS rule to use espeak for keyboard characters (sherpa is not good or fast for exploring keyboard characters with a screen reader) and next-gen for other words: ttsrv-backup.zip

The other big problem is that blind users can't understand these things. They just want to set: English = next-gen English APK; Farsi = next-gen fa haniyeh APK; other = espeakNG APK, etc.

They also live with screen readers and TTS engines, and they can never leave the espeak engine because it is fast, even if it sounds terrible. (We don't have Persian in Google's Speech Recognition & Synthesis. We use jing's tts-server to read Persian, but it's online, and in Iran the internet is permanently down 👎)

paolo-caroni commented 5 months ago

It has six extension APK files that work together, and no file or internet permissions are needed.

So you have to create a store page for each APK/language, which is complex to keep updated (on any store, not only F-Droid but also the Play Store, for example).

The other big problem is that blind users can't understand these things.

Please consider that TTS is important to everyone, not only blind people: it is used by navigation apps, smartphone assistants, home assistants, and any app that speaks to you. Obviously blind users need it more than others and their perspective should be considered, but I think most people would prefer a standardized method. Most users are "educated" by Google's proprietary TTS, so I suppose a "clone" would be simpler to use than anything else (and also well integrated into the system).

They can never leave the espeak engine because it is fast, even if it sounds terrible.

I'm not blind, but I installed the current version of sherpa-onnx 4 days ago and I prefer it; it is not much slower than espeak.

mablue commented 5 months ago

Yes, maybe we need a big store for it 😜. It's not the right way... I think the method blind users love most is Vocalizer voices. I still love permission-less APKs, but it's not good to repeat the GUI in each one; it makes improvements hard. If we just had a *.pt model database on the web and a single GUI APK (or maybe an EXE or IPA file), that could be better. It's not good on old devices: it sometimes has 5~20 sec latency. Also, a blind user can't tell what their device's CPU architecture is... apps like CPU-Z are not accessible to blind users.

Clone?! You mean I should edit Google TTS for Persian?! It doesn't originally support the Persian language. We use Urdu, but it's completely different in numbers and some other things...

The easier way for now is just to change the package names so that multiple TTS APKs can be installed on one device, and then try mixing them with the TTS server app.

I think TTS server can be a good GUI for us.

paolo-caroni commented 5 months ago

Yes Maybe we need a big store for it 😜.

As pointed out in the F-Droid issue (see the multiple languages of ASK), it is a headache to update all the store pages, particularly on F-Droid (where it is done manually and is always out of date, especially for less-used languages).

I think the better method that blinds love is Vocalizer voices

From the screenshots I don't see much difference from RHVoice or Google's TTS.

Clone?! You mean I will edit google tts for persian?!

No, I mean that sherpa-onnx should offer a user experience similar to Google's TTS, which is also similar to Vocalizer voices as far as I can see from the screenshots. If the APK is well integrated with the TTS API, maybe one day sherpa-onnx would not even need a launcher, since everything can be done by the system and by the apps that use the TTS engine (Google's TTS does not have a launcher). Also please consider that with the current implementation an app cannot change the language of the TTS, since each sherpa-onnx engine supports only one language, and an app cannot change engine (it can use a specific engine if installed, but that would require an app specifically designed for the sherpa-onnx engine, which is crazy or at least improbable).

mablue commented 5 months ago

an app cannot change engine (can use a specific engine if installed, but this would lead to an app specifically designed for sherpa-onnx engine, that is crazy or at least improbable).

Try this tts mixer gui: https://github.com/jing332/tts-server-android

With these rules: https://github.com/jing332/tts-server-android?tab=readme-ov-file#%E6%9C%97%E8%AF%BB%E8%A7%84%E5%88%99

We can mix multiple TTS engines there (online, offline, etc.) in tts-server, as you can see. It's possible with JS rules. It switches automatically by any rule or with this method.

paolo-caroni commented 5 months ago

Thank you for the pointers. I will give them a look.

@csukuangfj About RHVoice: it seems they have chosen to store the voices as media. See line 6 of their manifest. For the download, I suppose it is interesting to look at Repository and DataSynkWorker. Obviously you can't copy and paste the code (LGPLv2.1 vs Apache, and Java vs Kotlin), but you can understand how it works and write your own implementation. Also, as always, this is one way to implement the download and storage of voice assets, not necessarily the best possible implementation. However, RHVoice has good coverage of Android's internal TTS APIs, so I'm happy to point you to it.

Also please consider that if you choose not to merge the ASR/STT and TTS engines (#580) into one package, and the assets are basically the same *.onnx files, you cannot treat them as app-specific files, since it would be better for two different apps to be able to access them (to avoid duplicate files in storage for each language). If you choose to merge all sherpa-onnx functionality into one engine app, you are freer to make different choices about where the assets are stored.

jing332 commented 4 months ago

Now supports multiple models: https://github.com/jing332/SherpaOnnxTtsEngineAndroid

jing332 commented 4 months ago

/storage/emulated/0/Android/data/com.k2fsa.sherpa.onnx.tts.engine/files/model/

paolo-caroni commented 4 months ago

Now supports multiple models: https://github.com/jing332/SherpaOnnxTtsEngineAndroid

@jing332 Why did you import the files into a new repo instead of simply forking the original repo? With this commit you changed the name Kaldi to Cardi... Also, without forking the repo, how can you make a PR?

It seems a good starting point, but it surely needs improvements.

jing332 commented 4 months ago

The sherpa-onnx project is too complex; forking it requires modifying various build scripts, and I don't have the energy to spend on that.

When a new release of sherpa-onnx is published, only the SO and JNI bridge files (if the API changes) need to be updated.

csukuangfj commented 4 months ago

Never mind. Once it is finished, I can help merge it into sherpa-onnx.

mablue commented 4 months ago

@paolo-caroni ttsherper

paolo-caroni commented 4 months ago

@mablue posting a proposed logo here is a little off-topic. Also, icefall is the program used to train the models and sherpa is the program that uses them; this repo is about sherpa (specifically sherpa-onnx), not icefall.

mablue commented 4 months ago

Sorry. This is just a reaction of uncontrollable happiness. 🎂🎉🎂🎊 We have been waiting for such an update for a long time.

paolo-caroni commented 4 months ago

@csukuangfj do you think the "espeak-ng-data" folder can be externalized from the TTS models? It's nonsense to have this folder repeated for each model. If you think it isn't possible to externalize this folder, do you think it can be reduced? Currently it contains the dictionaries for all languages, not only the language of the model itself.

csukuangfj commented 4 months ago

It's a nonsense to have this folder repeated for each model.

I agree. If there are multiple models using this folder, then this folder can be shared.

csukuangfj commented 4 months ago

It is repeated in each model so that each model is self-contained.
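To get a feel for how much storage sharing the folder would save, a small script can add up the duplicated copies. This is a sketch under two assumptions that the thread does not confirm: that every model directory sits somewhere under one common root, and that all copies of `espeak-ng-data` are identical. The function names are hypothetical.

```python
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def duplicated_bytes(model_root: Path, shared_name: str = "espeak-ng-data") -> int:
    """Bytes saved if only one copy of shared_name were kept.

    Assumes the copies are identical; the largest copy is treated as the
    one that would be kept and shared.
    """
    sizes = [dir_size(d) for d in model_root.rglob(shared_name) if d.is_dir()]
    return sum(sizes) - max(sizes, default=0)
```

Running `duplicated_bytes(Path("/path/to/models"))` on a folder with several models gives the potential saving in bytes, which helps decide between a self-contained layout and a shared one.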

mablue commented 4 months ago

1) Keeping the TTS engine completely offline is very important. But then we need to install two or more APKs on one device, which is not currently possible.
2) The better choice for a blind user, however, is a TTS engine that has a built-in model downloader with internet access.
3) Solution: give the next-gen Kaldi APKs different package names, so that the user can install more than one APK.

It can provide both confidence and functionality. Also, if somebody only needs one or two TTS models, they can just install multiple next-gen Kaldi APKs on their phone without the server app.

We need to edit the next-gen GitHub workflow script so that it automatically builds the next-gen APKs with different package names and identifiers for Android devices, for example:

Sherpa.onnx.tts.engine.<lang>.<voice>

Then we can install two or more sherpa TTS engines on one device, and mix them with tts-server.

paolo-caroni commented 4 months ago

3) Solution: give the next-gen Kaldi APKs different package names, so that the user can install more than one APK.

@mablue Why? This would be like going back to how it was before. There is a PoC by jing332, with bugs but improvable. Separate APKs (multiple sherpa-onnx TTS engines) will take up more space on the smartphone and also use more internet data (and, as you wrote, the connection is not very reliable in your country). Putting all languages on the various Android markets would also be a huge amount of work.

I don't understand your change of mind; previously you seemed pretty excited about the new all-language TTS APK.

paolo-caroni commented 4 months ago

Can you find me an intro?

@mablue next-gen Kaldi is very well documented; read the Android instructions carefully. If you find a problem, create a new, separate issue about the compilation problem.

mablue commented 4 months ago

Yesterday, I finished compiling the C++ section. However, there are still bugs preventing the APK compilation. I believe referring to the introduction documents (introdocs) might be helpful. I'll give it a shot. Thanks!

mablue commented 3 months ago

At last I fixed some errors, and a release APK is available here to download and test. Thanks to @jing332 for the beautiful work: https://github.com/mablue/SherpaOnnxTtsEngineAndroid/actions

kulmegil commented 1 month ago

@mablue Are there plans to continue development of the multi-TTS engine? It looks promising.

The biggest issue: it only works with apps that specifically allow setting a voice name. If no voice name is provided, only a language, it fails ("TTS Config not set").
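One possible way around this would be a fallback that picks a voice from the requested language when no voice name is given. The sketch below only illustrates that logic. The `INSTALLED` and `DEFAULTS` tables and the `resolve_voice` function are hypothetical and do not reflect how the multi-TTS engine is actually implemented.

```python
from typing import Optional

# Hypothetical registry: voice name -> language code.
INSTALLED = {"haniyeh": "fa", "amir": "fa", "xxxx": "en"}
# Hypothetical user preference: language code -> default voice name.
DEFAULTS = {"fa": "haniyeh"}


def resolve_voice(lang: str, voice: Optional[str] = None) -> Optional[str]:
    """Pick a voice for a request that may carry only a language."""
    # 1. An explicitly requested, installed voice wins.
    if voice in INSTALLED:
        return voice
    # 2. Otherwise use the user's default for that language, if installed.
    if lang in DEFAULTS and DEFAULTS[lang] in INSTALLED:
        return DEFAULTS[lang]
    # 3. Otherwise fall back to the first installed voice for the language.
    for name, voice_lang in INSTALLED.items():
        if voice_lang == lang:
            return name
    # 4. Nothing suitable is installed.
    return None
```

With such a fallback, a request that specifies only "fa" would resolve to the user's default Persian voice instead of failing.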

mablue commented 3 weeks ago

@kulmegil Maybe we will continue. Please open a separate issue in the multi-TTS repo.