An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. It supports English, Chinese, Japanese, and other languages, including mixed-language speech.
What This Branch Did
Microphone Integration: The keyboard now takes input from the microphone.
OpenAI Transcribe API: The keyboard now sends recorded audio clips to the OpenAI transcription API and inputs the returned text.
Permissions & Settings:
If microphone permission has not been granted when recording starts, a message is shown and the user is redirected to the app.
In the app's MainActivity, the user can set an API key and open the app settings panel with a button (for manual permission configuration).
Upon entering MainActivity, the user is prompted to grant microphone permission.
Exception Handling: Exceptions are handled in a basic way by showing Toast messages. Common exceptions will (very likely) not block or crash the keyboard.
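To make the transcription step above concrete, here is a minimal sketch of how a recorded clip could be posted to the OpenAI transcription endpoint with Ktor on the OkHttp engine. It is not the code from this branch: the helper names (`newClient`, `buildTranscriptionForm`, `transcribe`), the `clip.m4a` file name, the `audio/mp4` content type, and the choice of `response_format=text` are all assumptions made for illustration.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.okhttp.OkHttp
import io.ktor.client.request.bearerAuth
import io.ktor.client.request.forms.formData
import io.ktor.client.request.forms.submitFormWithBinaryData
import io.ktor.client.statement.bodyAsText
import io.ktor.http.Headers
import io.ktor.http.HttpHeaders
import io.ktor.http.content.PartData
import io.ktor.http.isSuccess

// Hypothetical factory: a Ktor client backed by the OkHttp engine.
fun newClient(): HttpClient = HttpClient(OkHttp)

// Hypothetical helper: builds the multipart fields for one transcription request.
fun buildTranscriptionForm(audio: ByteArray, fileName: String): List<PartData> = formData {
    append("model", "whisper-1")
    // Assumption: ask for plain text so the response body is the transcript itself.
    append("response_format", "text")
    append("file", audio, Headers.build {
        // Assumption: the clip is an MPEG-4 container, so it is sent as audio/mp4.
        append(HttpHeaders.ContentType, "audio/mp4")
        append(HttpHeaders.ContentDisposition, "filename=\"$fileName\"")
    })
}

// Sends the clip and returns the transcript, or throws if the API reports an error.
suspend fun transcribe(client: HttpClient, apiKey: String, audio: ByteArray): String {
    val response = client.submitFormWithBinaryData(
        url = "https://api.openai.com/v1/audio/transcriptions",
        formData = buildTranscriptionForm(audio, "clip.m4a"),
    ) {
        bearerAuth(apiKey)
    }
    val body = response.bodyAsText()
    check(response.status.isSuccess()) { "Transcription failed: ${response.status} $body" }
    return body
}
```

In a keyboard service, `transcribe` would be called from a coroutine off the main thread, and the thrown exception could be surfaced as a Toast in line with the exception handling described above.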
Known Issues & Future Directions
Motormouth Countering: Prevent recording an overly long audio clip.
Are You Done?: Implement automatic sentence break detection.
Be My Spokesman?: A completely silent audio clip seems to produce spurious sentences such as 多謝您收睇時局新聞,再會! (Cantonese, roughly "Thank you for watching the current affairs news, goodbye!"), among many others.
Not A Province: The whisper-1 model produces both simplified and traditional Chinese (Mandarin) characters.
Whisper To My Ear: Currently, audio clips are recorded and saved as files (with hardcoded names). Whether they can instead be kept in memory or passed as streams, and whether that would be a better option, is unknown.
Configurations: Several settings may have room for improvement.
Ktor Engine: OkHttp
Output format: MPEG4
Audio Encoder: AMR_NB
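One possible mitigation for the silent-clip issue listed above is to skip the transcription request when the recording never gets loud. The sketch below is an illustration, not part of this branch. It assumes the recorder's peak amplitude is sampled periodically while recording (for example with `MediaRecorder.getMaxAmplitude()`, which reports values on a 16-bit scale). The function name `isEffectivelySilent` and both default thresholds are hypothetical and would need tuning on real devices.

```kotlin
/**
 * Returns true when fewer than [minLoudSamples] of the sampled peak amplitudes
 * reach [threshold]. An empty sample list counts as silent.
 *
 * Both defaults are illustrative assumptions, not measured values.
 */
fun isEffectivelySilent(
    amplitudes: List<Int>,
    threshold: Int = 1_000,
    minLoudSamples: Int = 3,
): Boolean = amplitudes.count { it >= threshold } < minLoudSamples

fun main() {
    val quietClip = listOf(0, 12, 40, 8, 25)
    val spokenClip = listOf(0, 2_500, 9_000, 12_000, 300)

    println(isEffectivelySilent(quietClip))   // prints "true"
    println(isEffectivelySilent(spokenClip))  // prints "false"
}
```

A clip judged silent could be dropped with a short Toast instead of being sent to the API, which would also avoid spending a request on it.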
Testing This Branch
As with previous branches, start an emulator.
Connect the emulator's microphone to the host audio input.
Configure an API Key.
Test the transcription utility.
Notes
Reading all the references thoroughly is NOT recommended; there are a lot of them. Reading only the paragraphs of interest is enough.
Closes: #2