k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0
2.44k stars 280 forks

ASR TTS Merge into one apk #580

Open Pantyhose-X opened 4 months ago

Pantyhose-X commented 4 months ago

I can't even download TTS. https://github.com/k2-fsa/sherpa-onnx/releases has only ASR! Where's TTS?

csukuangfj commented 4 months ago

I can't even download TTS. https://github.com/k2-fsa/sherpa-onnx/releases has only ASR! Where's TTS?

Please read our README.

csukuangfj commented 4 months ago

Please see https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html

paolo-caroni commented 4 months ago

The title doesn't match the text of this issue. As others pointed out, TTS is in a different apk, and at the moment there is also a separate apk for each language and each voice. Google has chosen to ship TTS and STT/ASR in one apk, since they (probably) use similar or the same voice database assets. This may be interesting in the future, but it depends on the developers' view: if merging TTS and STT/ASR leads to more complex code to gain only about 70/100 MiB of free space, the developers should probably choose what is best from their perspective. Honestly, I would love to have everything in one app, but compared to other issues I don't think it is a priority. Consider that STT/ASR has not implemented a Recognition Service, as I cannot see it in the voice input options, and doesn't support ACTION_RECOGNIZE_SPEECH, since I cannot use it from my keyboard.

paolo-caroni commented 4 months ago

@csukuangfj what do you think about this topic? Would unifying ASR/STT and TTS in one apk be useful? It seems to me that the C++ code is the same for ASR/STT, voice identification, and TTS (28 MiB for all CPU architectures), but the ONNX models seem different. If you confirm that the models are different and incompatible, this issue should be closed, I think.

csukuangfj commented 4 months ago

The code is shared but the models are different.

Also, you can install asr apk, tts apk, and speaker identification apk simultaneously on your phone.

paolo-caroni commented 4 months ago

Also, you can install asr apk, tts apk, and speaker identification apk simultaneously on your phone.

Sure, but in that case there are at least 3 different apps that will have to be updated on the stores, see #520 (F-Droid, Google Play, Xiaomi, Huawei, Samsung, Amazon, etc.). A single app would be simpler in that respect, but it would complicate code maintenance and development. Also, as proposed by @mablue in #569, the icons for TTS, STT, and identification could each be different: similar to the sherpa logo, but still distinct. I can do that, but I need some confirmations: what is the licence of the original logo (Apache? Creative Commons?), and would you (and the other developers) like that idea?

Also, if there are 3 different apps, it might be a good idea to separate the repositories (but still under the official k2-fsa), since in the future there will be more issues about app-specific bugs rather than about the sherpa-onnx code as a whole.

@csukuangfj What do you (and others) think about that?

mablue commented 4 months ago

The Sherpa TTS download page is not accessible to blind users. They also need a sherpa-onnx Telegram group to connect directly with the developers. Please create one, @csukuangfj; they asked me to pass this on to you. I think the first problem is fixed by @jing332's client for sherpa, but I can't use it. It still doesn't work. I don't know why, but it has no voice on any phone!! But its UI and its management of voices and languages are very good! It's advanced, but still not like tts server https://github.com/jing332/SherpaOnnxTtsEngineAndroid

Designing icons: all of these are important for people. And we will have one client. Just one client with multiple voices as TTS. And ASR functionality for Persian is still not available.

paolo-caroni commented 3 months ago

I think the first problem is fixed by @jing332's client for sherpa, but I can't use it. It still doesn't work. I don't know why, but it has no voice on any phone!! But its UI and its management of voices and languages are very good! It's advanced, but still not like tts server https://github.com/jing332/SherpaOnnxTtsEngineAndroid

This is free software, written by a community of people around the world (and maybe Xiaomi, if I'm not wrong). You cannot demand anything, especially not a fully functional app in zero days.

And ASR functionality for Persian is still not available.

You opened #559; have you trained a Persian model?

mablue commented 3 months ago

I will learn icefall. I haven't started learning icefall yet, but I am interested. The Ganjoor site is a good source for Farsi texts and audio.

For example, this page: https://ganjoor.net/saadi/golestan/gbab1/sh10 Many voices, many poems, with a time-synced UI while playing. I'll try to train with this source.

paolo-caroni commented 3 months ago

I will learn icefall. I haven't started learning icefall yet, but I am interested. The Ganjoor site is a good source for Farsi texts and audio.

@mablue this is totally off-topic, but I think it is simpler for you to use an already-used dataset, such as Common Voice, which includes Persian and is supported by icefall.

paolo-caroni commented 3 months ago

@csukuangfj we are going off-topic, but can you respond to the logo question? Since merging everything into one apk has no reason (and so this issue can be closed), what do you think about making different logos for TTS, ASR/STT, and speaker identification?

csukuangfj commented 3 months ago

what do you think about making different logos for TTS, ASR/STT, and speaker identification

Yes, that sounds good to me. Would you like to contribute?

paolo-caroni commented 3 months ago

Also, as proposed by @mablue in #569, the icons for TTS, STT, and identification could each be different: similar to the sherpa logo, but still distinct. I can do that, but I need some confirmations: what is the licence of the original logo (Apache? Creative Commons?), and would you (and the other developers) like that idea?

Also, if there are 3 different apps, it might be a good idea to separate the repositories (but still under the official k2-fsa), since in the future there will be more issues about app-specific bugs rather than about the sherpa-onnx code as a whole.

@csukuangfj What do you (and others) think about that?

@csukuangfj you probably missed my message from 4 days ago. Yes, I would like to contribute, but please confirm the licence of the original logo (since I have to modify it by mixing it with other images).

csukuangfj commented 3 months ago

please confirm the licence of the original logo

The logo is from us and we are publishing all of our work with Apache 2.0 license.

mablue commented 3 months ago

Since merging everything into one apk has no reason

Why? We are generating 20 gigabits of repeated binaries and other data in the sherpa TTS, ASR, and identification apk files. The reason that nobody can do anything for this project is that there is no small all-in-one Java/C++ client, and everything is scattered. Many developers and I are waiting to have just one client and many models, so we can generate what we want with one client. If we can't make an all-in-one apk for all TTS models, it will be a failed project, because no one understands what to update or where to start. I think we should continue with @jing332's project. I would love to do something, but the internet in Iran is trash 🗑️. Many other people and I still can't get a voice from his/her work (multi-language sherpa). I'm on Android 14 and the others are on 12.

paolo-caroni commented 3 months ago

@mablue merging ASR/STT, TTS, and speaker identification is TOTALLY different from having all the languages in one TTS apk. Please read this issue carefully.

mablue commented 3 months ago

@mablue merging ASR/STT, TTS, and speaker identification is TOTALLY different from having all the languages in one TTS apk. Please read this issue carefully.

If we can't merge everything, just merge TTS. TTS can save lives. Blind people use it all around the world. Some people in Iran don't have Google TTS in Persian because of sanctions. And we need the merge in order to have English, and maybe other languages, alongside Persian.

houyafei commented 1 month ago

Following the documentation, merging ASR and TTS into one apk is not difficult. The single app will keep two or more model files in the assets directory.
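As a rough illustration of one app keeping several model files side by side, here is a minimal Python sketch that indexes bundled models by task. The directory layout (one sub-directory per task, such as `asr` and `tts`) and the function name `index_models` are assumptions made for this example; they are not taken from sherpa-onnx or its documentation.

```python
from pathlib import Path


def index_models(assets_dir):
    """Map each task sub-directory (e.g. 'asr', 'tts') to the .onnx files in it.

    The layout is hypothetical: assets_dir/<task>/<model>.onnx
    """
    index = {}
    for task_dir in sorted(Path(assets_dir).iterdir()):
        if task_dir.is_dir():
            # Keep only model files; tokens, lexicons, etc. are ignored here.
            index[task_dir.name] = sorted(p.name for p in task_dir.glob("*.onnx"))
    return index
```

An app built this way could pick the loader for each task from the index at startup, so adding a model would mean adding a file rather than shipping another apk.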

mablue commented 1 month ago

Following the documentation, merging ASR and TTS into one apk is not difficult. The single app will keep two or more model files in the assets directory.

It's not a hard job, yes. Something like this: https://github.com/jing332/SherpaOnnxTtsEngineAndroid

It's just a merge of all TTS models into one ONNX model loader! But the problem is switching between models while reading mixed (Persian-English) text. I use this to work around it: https://github.com/jing332/tts-server-android

With this rule: https://t.me/ttsfarsi/379

But it's a very hard process for a blind person... They are asking me for a model that has two languages in it, or, if someone can do it, a Colab file where I just set a variable like 'en-fa' to get an English-Farsi (Persian) mixed model or TTS engine apk. Pashto is another problem too: they need a Pashto TTS engine, but I can't find any data anywhere to build one, except the Microsoft Read Aloud online service or something like the T2S Android app, which could generate 24 hours of Pashto speech from Pashto texts to train a Pashto model. But I will first learn icefall.
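The model-switching problem for mixed Persian-English text can be sketched as a script-based splitter: cut the text into runs of Arabic-script and Latin-script letters, then hand each run to the engine for that language. This is only an illustration under stated assumptions, not how sherpa-onnx or tts-server-android does it. The names `script_of`, `split_by_script`, and `speak_mixed` are hypothetical, and the `engines` mapping stands in for whatever per-language TTS call an app actually has. Note that the check looks at the script, so Arabic, Urdu, or Pashto text would also be labelled `fa`.

```python
import unicodedata


def script_of(ch):
    """Return 'fa' for Arabic-script letters, 'en' for Latin letters, else None."""
    if not ch.isalpha():
        return None  # digits, spaces, punctuation are treated as neutral
    name = unicodedata.name(ch, "")
    if name.startswith("ARABIC"):
        return "fa"
    if name.startswith("LATIN"):
        return "en"
    return None


def split_by_script(text, default="fa"):
    """Split text into (lang, chunk) runs; neutral characters stay with the current run."""
    runs, buf, current = [], [], None

    def flush():
        chunk = "".join(buf).strip()
        if chunk:
            runs.append((current or default, chunk))
        buf.clear()

    for ch in text:
        lang = script_of(ch)
        if lang is not None and current is not None and lang != current:
            flush()  # the script changed, so close the previous run
        if lang is not None:
            current = lang
        buf.append(ch)
    flush()
    return runs


def speak_mixed(text, engines, default="fa"):
    """Send each run to the (hypothetical) engine registered for its language."""
    return [engines[lang](chunk) for lang, chunk in split_by_script(text, default)]
```

With a splitter like this, a single client could keep one Persian and one English model loaded and alternate between them per run, instead of requiring the user to maintain replacement rules by hand.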