googlesamples / mlkit

A collection of sample apps to demonstrate how to use Google's ML Kit APIs on Android and iOS
Apache License 2.0

I hope you can provide a machine learning library for speech recognition to make it easier for developers to do these functions. #523

Open BraveMomo opened 2 years ago

BraveMomo commented 2 years ago

What's your feature request? Please describe. A clear and concise description of what the request is. Ex. I would like to have X language support in text recognition[...]

It is very difficult for developers to capture users' speech and convert it into text. Many companies sell this capability at high prices, and their recognition results are often poor. I hope you can provide a machine learning library for speech recognition that makes these features easier for developers to build. It would also let more people around the world enjoy the convenience and happiness that technology brings.

Mobile environment Android, iOS or both

Android

Additional context Add any other context or screenshots about the feature request here.

bcdj commented 2 years ago

Thanks for the feature request!

Android provides the SpeechRecognizer API. iOS also provides the Speech library (tutorial) for performing speech recognition. Can these libraries satisfy your needs? If not, could you elaborate on what pain points are affecting your use cases?

Thank you!

BraveMomo commented 2 years ago

Because the Android platform is badly fragmented, each manufacturer ships its own implementation. Speech recognition works on stock Google Android, but it has to be modified for the Chinese environment, and the quality of those modifications is uneven. In addition, this approach only works if SpeechRecognizer.isRecognitionAvailable(final Context context) returns true. China has more than one billion users, and many phone systems there do not have this service available, which makes the API unusable. So, as a developer of software for 1.4 billion users in China, I hope you can provide a system-independent, powerful machine learning speech recognition library to reduce

bcdj commented 2 years ago

Thank you so much for the insightful comments! The fragmentation issue you mentioned is specific to Android and does not affect iOS, even in China. So is what you are looking for a system-independent, Android-only library, given that iOS already provides the Speech library?

BraveMomo commented 2 years ago

What I need is a library that can be easily called for speech recognition when developing such features on Android. Whether or not it is tied to the Android system, I can accept it. In addition, every country has dialects and non-standard pronunciations, so I think providing this function in a machine learning library would be more accurate and efficient, and it would become more powerful as the library is upgraded.

gulabsagevadiya commented 2 years ago

I am not a very experienced developer in this field, but there is no library available for audio-to-text conversion. Yes, a speech recognizer is available, but audio-to-text conversion is not available for free.

BraveMomo commented 2 years ago

Let me give you an example. I want to build a video playback app that automatically generates subtitles by listening to the audio of the video. Can the SpeechRecognizer API do this? It seems that it listens to ambient sound, which would confuse recognition. If it can, could you tell me how to set it up? If not, I hope you can add an enhanced speech-to-text library to the ML library to meet needs like this.

mlbi commented 2 years ago

Hi, I have a question about your example. Can you explain further what issue you have with ambient sound versus speech in your use case? When you use SpeechRecognizer on the sound of a video, which results do you get that you do not want? a) Is it that ambient non-speech sound gets "wrongly" transcribed into text? b) Is it that some speech is not recognized because there is ambient sound in the background?