microg / GmsCore

Free implementation of Play Services
https://microg.org
Apache License 2.0

Vosk Speech-to-text Integration with MicroG #1164

Open nshmyrev opened 3 years ago

nshmyrev commented 3 years ago

Hi! I have a good open source speech recognition library called Vosk, which works fully offline on mobile devices. It supports dictation in 11 languages and many other nice things. You can try the APK here:

https://github.com/alphacep/vosk-android-demo/releases/download/2020-05/vosk-android-demo-0.3.7-english.apk

What is the best way to integrate it into microG? Should I create a patch for this repo, or a new package like play-services-speech? Please advise on the overall approach.
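
For readers unfamiliar with Vosk, here is a rough sketch of what offline recognition looks like. It uses Vosk's Python bindings rather than the Android library discussed in this thread. It assumes the `vosk` package is installed, that an unpacked model directory is available, and that the input is a 16-bit mono PCM WAV file. The helper names `extract_text` and `transcribe_wav` are hypothetical and not part of Vosk.

```python
import json
import wave


def extract_text(result_json: str) -> str:
    """Return the recognized text from a Vosk result string.

    Final results are JSON objects such as {"text": "hello world"};
    anything without a "text" field yields an empty string.
    """
    return json.loads(result_json).get("text", "")


def transcribe_wav(wav_path: str, model_dir: str) -> str:
    """Transcribe a WAV file offline (hypothetical helper, assumes 16-bit mono PCM)."""
    # Imported lazily so extract_text() stays usable without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_dir)
    pieces = []
    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(model, wav.getframerate())
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform() returns True once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(extract_text(recognizer.Result()))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(piece for piece in pieces if piece)
```

Nothing in this loop touches the network, which is the property that makes an engine like this interesting as a replacement for a cloud dictation service.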

ljl-covid commented 3 years ago

Are the language models free software as well? I have found them here but the first one I downloaded (the small Chinese model, just because it was, well, small) had no licensing information in it. It just said

This model was compiled from various available sources. Thanks to Xingyu Na for multi-cn recipe in Kaldi.

Note that Kaldi can already be used from Android (though not offline on-device) as a drop-in replacement for Google's speech recognition thanks to Kõnele (F-Droid): the default server is for Estonian, but it's possible to use an English server, or anything you have a model for... which brings us back to the point that availability and acceptable licensing of the models is likely going to be important.

nshmyrev commented 3 years ago

Yes, the models are free software, licensed under the Apache 2.0 license. I will add license files to the models next time.

Thanks for pointing me to Kõnele. I see I can simply write an app like that or integrate with them! No issue then.

ljl-covid commented 3 years ago

@nshmyrev thanks for your work. I wasn't trying to belittle it in any way; with open source models for several languages in particular, it could be very useful.

I do believe it's true that you can have a "drop-in" app like Kõnele. (At least, Kõnele needs no system privileges to act as a speech input provider, although it does not do so when I tap the speaker icon in the AOSP keyboard, which is an important shortcoming whose cause I don't know.) Nonetheless, speech recognition is something that Google Mobile Services provide, and as such I believe it is in microG's scope. So you may want to leave the issue open, at least until someone actually authoritative about microG can respond, because I'm just a nobody who follows the repository.

nshmyrev commented 3 years ago

Hm, indeed, thank you for your valuable insights!

nshmyrev commented 3 years ago

As an update, here is an app that enables integration with all apps that use the standard Google API:

https://github.com/ccoreilly/LocalSTT

ewheelerinc commented 2 years ago

As noted here:

Would love to have voice typing with microG, so +1 for this feature!

ewheelerinc commented 2 years ago

Here is a release and .APK with English (US) language models:

Please test, and continue to support this project toward native voice recognition in microG with testing and pull requests!

paolo-caroni commented 2 years ago

Here is a release and .APK with English (US) language models:

Please test, and continue to support this project toward native voice recognition in microG with testing and pull requests!

Including this in microG would be perfect ♥; a standalone app like this would also be good.

LuccoJ commented 2 years ago

@paolo-caroni I don't think this is suitable for inclusion in microG, at least not in the current state, but possibly not at all.

What I would like to see is LocalSTT in F-Droid, or ideally a more generalized version of it that can work with any Vosk model and include a download function for them, or even more ideally, inclusive of a simpler version of Kõnele for use with it without installing both. Really, if LocalSTT itself supported more than one language and were in the main F-Droid repository, I suppose it could start being considered for inclusion in LineageOS for microG (since while not part of microG, it is something that the Google services do provide). For right now, just seeing it in F-Droid would excite me!

KJ7LNW commented 2 years ago

@LuccoJ,

@paolo-caroni I don't think this is suitable for inclusion in microG, at least not in the current state, but possibly not at all.

It doesn't work as an Android speech recognition engine with standard intents and everything; it requires Kõnele for that and then works as an engine for it. I really don't think Kõnele can be integrated into microG unless it goes through a diet.

Right, these examples are PoCs at the moment, but I think the bits and pieces are available to make a dedicated, language-model-configurable service for microG users who don't have Google's dictation service. If someone knows the internals and APK development well enough to tie everything together with a simple UI, then an initial F-Droid app for dictation would work. Vosk Android Service is working on that, but it's not ready yet.

microG tends to include what has to be part of microG for system reasons, but what exactly is gained by having LocalSTT+Kõnele as part of microG, as opposed to installing them separately?

Of course you wouldn't load these two APKs into microG as they are, and I don't think that is being suggested. I think @paolo-caroni and others commenting on this issue are saying that the relevant bits in the stand-alone FOSS APKs like LocalSTT and Kõnele could be ripped out into microG to provide the dictation service that Google Services provide but that microG (currently) does not.

[...] I suppose it could start being considered for inclusion in LineageOS for microG (since while not part of microG, it is something that the Google services do provide). For right now, just seeing it in F-Droid would excite me!

Interesting, I thought the goal of microG was to completely replace Google Services (which, it seems, would include dictation).

Are there currently other Google services that LineageOS+microG provides but that microG itself does not?

paolo-caroni commented 2 years ago

I think @paolo-caroni and others commenting on this issue are saying that the relevant bits in the stand-alone FOSS APKs like LocalSTT and Kõnele could be ripped out into microG to provide the dictation service

Yes, you have understood what I think.

Interesting, I thought the goal of microG was to completely replace Google Services (which, it seems, would include dictation).

As far as I know, STT and TTS are not part of Google Play Services; they are part of Speech Services, which is a proprietary Google app like Gmail, Street View, Chrome, and many others. IMHO, inclusion in microG would be interesting because TTS and STT provide an API that is important for the Android system (microG would then provide all the APIs in one app). But it is correct that, on Google's side, Play Services doesn't provide this API, and it is possible to implement these engines separately.