j3soon / speech-to-windows-input

MIT License

Speech to Windows Input (STWI)

Perform speech-to-text (STT/ASR) with Azure Speech service and simulate keyboard input to type the recognized text.

Supports English, Chinese, Japanese, and more...

Why this Project?

I decided to implement a free & open-source speech recognition tool that works in any application on Windows.

Update: If your machine can be upgraded to Windows 11, the built-in speech recognition tool (Win+H) now supports more languages and could be considered a valid alternative to this project.

Why Use Azure Speech Service as Backend?

This project is not affiliated with Microsoft / Azure in any way.

If there's a better speech recognition service with higher accuracy and a reasonable price, please file an issue.

How to Use

Prerequisites:

Installation:

Demo GIFs

With "Languages": ["en-US"]:

With "Languages": ["zh-TW"]:

Note: If your current input language is set to Chinese, you can select characters after performing speech recognition.

Features:

(✔️ means implemented, ❌ means not)

Configuration File

{
  "AzureSubscriptionKey": "<paste-your-subscription-key>",
  "AzureServiceRegion": "<paste-your-region>",
  "Languages": [
    "en-US",
    "zh-TW"
  ],
  "PhraseList": [],
  "PrioritizeLatencyOrAccuracy": "Latency",
  "SoundEffect": false,
  "InputIncrementally": true,
  "OutputForm": "Text",
  "AutoPunctuation": true,
  "DetailedLog": false,
  "ContinuousRecognition": false,
  "TotalTimeoutMS": 60000,
  "UseMenuKey": false,
  "UseFxKey": 0,
  "SendTrailingEnter": false,
  "SendTrailingSpace": false,
  "ChineseChatMode": false,
  "ForceCapitalizeFirstAlphabet": true,
  "ShowListeningOverlay": true,
  "UseSwitchConfigKey": false
}

Make sure to replace <paste-your-subscription-key> and <paste-your-region> with valid Azure credentials; otherwise, an error will be raised.
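The credential check described above can be sketched in Python. This is a hypothetical helper that assumes the configuration is plain JSON as shown. STWI does not run this code, and the names `load_config` and `with_single_language` are illustrative only. The second function applies the program hint's advice to keep a single entry in `Languages`, which skips language detection. The key and region values in the usage lines are made-up examples.

```python
import json

# Placeholder strings from the sample configuration file.
PLACEHOLDERS = {
    "AzureSubscriptionKey": "<paste-your-subscription-key>",
    "AzureServiceRegion": "<paste-your-region>",
}


def load_config(text):
    """Parse the configuration JSON and reject unfilled Azure credentials.

    Hypothetical helper: it mirrors the README's advice, not STWI's own code.
    """
    config = json.loads(text)
    for key, placeholder in PLACEHOLDERS.items():
        value = config.get(key, "")
        if not value or value == placeholder:
            raise ValueError(f"{key} must be set to a valid Azure value")
    if not config.get("Languages"):
        raise ValueError("Languages must list at least one locale, e.g. en-US")
    return config


def with_single_language(config, language):
    """Return a copy of the config whose Languages list holds one locale.

    Per the program hint, a single language avoids the language-detection
    step and so shortens the initial recognition delay.
    """
    if language not in config["Languages"]:
        raise ValueError(f"{language} is not in the Languages list")
    return {**config, "Languages": [language]}


sample = json.dumps({
    "AzureSubscriptionKey": "0123456789abcdef",  # fake example key
    "AzureServiceRegion": "eastus",               # example region name
    "Languages": ["en-US", "zh-TW"],
})
config = with_single_language(load_config(sample), "zh-TW")
print(json.dumps(config, indent=2))
```

Rejecting a locale that is not already listed is a design choice to catch typos. Drop that check if you prefer to switch to an arbitrary locale.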

Program Hint

speech-to-windows-input (STWI) %VERSION_STRING%

Source Code Link (MIT License):

    https://github.com/j3soon/speech-to-windows-input

1. Press Alt+H to convert speech to text input. The recognition stops on (1) microphone silence (2) after 15 seconds (3) Alt+H is pressed again.
2. Press ESC to cancel the on-going speech recognition (no input will be generated).
3. Press Ctrl+C to exit.

Notes:
- The default microphone & internet connection is used for speech recognition.
- If input fails for certain applications, you may need to launch this program with `Run as administrator`.
- The initial recognition delay is for detecting the language used. You can modify the language list to contain only a single language to speed up the process.
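The stop conditions listed in the program hint can be modelled as a small function. This is a hypothetical model of the described behaviour, not the program's implementation. It assumes the 15-second limit quoted in the hint and that only ESC suppresses input, and it ignores the `ContinuousRecognition` and `TotalTimeoutMS` settings, which may change this behaviour. The event names and `session_outcome` are illustrative.

```python
TIMEOUT_S = 15.0  # limit quoted in the program hint
STOP_EVENTS = {"silence", "alt+h"}  # end recognition and type the text


def session_outcome(events, now):
    """Model one Alt+H recognition session (hypothetical, for illustration).

    events: (seconds since Alt+H was pressed, name) pairs, all with
    seconds <= now; name is "speech", "silence", "alt+h", or "esc".
    Returns (state, types_text).
    """
    for t, name in sorted(events):
        if t >= TIMEOUT_S:
            break  # the time limit fired before this event arrived
        if name == "esc":
            return "cancelled", False  # ESC discards the result
        if name in STOP_EVENTS:
            return name, True
    if now >= TIMEOUT_S:
        return "timeout", True
    return "listening", False  # no stop condition reached yet


print(session_outcome([(1.0, "speech"), (3.2, "silence")], now=3.2))  # ('silence', True)
print(session_outcome([(2.0, "speech"), (4.0, "esc")], now=4.0))      # ('cancelled', False)
print(session_outcome([(5.0, "speech")], now=15.0))                   # ('timeout', True)
```

Events are sorted by time so that whichever condition happens first wins. An ESC pressed after the limit therefore has no effect, because the session already ended.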

Side Notes

Runtime:

Compilation:

References

Minimal Sample: