An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. Supports English, Chinese, Japanese, etc., and even mixed languages.
Compatibility with OpenAI API and Whisper Webservice. #13
What This Branch Accomplished

The app is now compatible with both the OpenAI API and Whisper webservices. It can work with official, non-official, and self-hosted servers, as long as they abide by either standard.
The app now offers configuration options for the endpoint choice, etc.
Fixed: new cursor position after text input. Previously, the cursor ended up several characters ahead of the end of the committed text.
Future Directions
Several possible refactorings, optimizations, and improvements:
HTTPS: Because self-hosted servers need to be tested, clear-text (HTTP) transmission is currently allowed explicitly. This should change in the near future.
Refactoring: Configuration UI class, DataStore class, etc.
Motormouth, Automatic Sentence Break, Silent Audio Clips, Traditional / Simplified Chinese, In-Memory Recording (first mentioned in #8)
App Localization
Manual Cursor Displacement (first mentioned in #6)
UX: The configuration "set" buttons do not behave intuitively (there is no way to tell whether changes have been saved).
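For reference, on Android the cleartext allowance mentioned above is typically granted through the manifest or a network security config. A sketch of what restricting it to local test servers might look like (this fragment is illustrative and not taken from this repo; the domain entry is a placeholder):

```xml
<!-- res/xml/network_security_config.xml (illustrative, not from this repo) -->
<network-security-config>
    <!-- Default: HTTPS only -->
    <base-config cleartextTrafficPermitted="false" />
    <!-- Allow plain HTTP only for a self-hosted test server on the LAN -->
    <domain-config cleartextTrafficPermitted="true">
        <domain includeSubdomains="false">192.168.0.10</domain>
    </domain-config>
</network-security-config>
```

The config is referenced from the manifest via android:networkSecurityConfig on the application element; with it in place, a blanket android:usesCleartextTraffic="true" is no longer needed.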
Testing This Branch

OpenAI API

Should work as before.

Endpoint: https://api.openai.com/v1/audio/transcriptions
Language: zh or en

Self-hosted Whisper Webservice

CPU: docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest
GPU: docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu

Recommended values of ASR_MODEL are base, small, or medium for locally hosted servers. At the time this PR was created, onerahmet/openai-whisper-asr-webservice was at version v1.2.0.

After setting up a locally hosted server, it is recommended that you visit localhost:9000 and make one request through the web interface, as the container may not download the necessary models until the first request is made and completed.

Open the app and make the following configurations:

Endpoint: For locally hosted servers, configure the app endpoint as http://<local-ip>:9000/asr. Obtain the local IP via a command such as ipconfig; it typically looks like 192.168.xxx.xxx. For externally hosted servers, use their URL endpoints.
Language: zh or en
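The two supported standards differ mainly in where the options travel and how authentication works. A minimal sketch of the difference in Python; the field and parameter names here are my assumptions based on the OpenAI audio transcription API and onerahmet/openai-whisper-asr-webservice, so verify them against the respective docs before relying on this:

```python
# Sketch: how a transcription request differs between the two backends.
# Parameter names ("file", "model", "audio_file", "task", "language",
# "output") are assumptions about the two APIs, not taken from this repo.

def build_request(backend: str, endpoint: str, language: str, api_key: str = ""):
    """Return (url, headers, form_fields) for a transcription request.

    The audio itself would be sent as a multipart file field:
    "file" for the OpenAI API, "audio_file" for the webservice.
    """
    if backend == "openai":
        # OpenAI API: options ride in the multipart form,
        # model name is required, Bearer token auth.
        return (
            endpoint,
            {"Authorization": f"Bearer {api_key}"},
            {"model": "whisper-1", "language": language},
        )
    if backend == "webservice":
        # Whisper webservice: options ride as query parameters,
        # no auth by default on a self-hosted instance.
        url = f"{endpoint}?task=transcribe&language={language}&output=txt"
        return (url, {}, {})
    raise ValueError(f"unknown backend: {backend}")

url, headers, fields = build_request(
    "webservice", "http://192.168.0.10:9000/asr", "zh"
)
print(url)  # http://192.168.0.10:9000/asr?task=transcribe&language=zh&output=txt
```

A client that targets the self-hosted webservice therefore only needs the endpoint URL to change, which is what the new endpoint configuration enables.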
Closes: #3