Perform speech-to-text (STT/ASR) with the Azure Speech service and simulate keyboard input to type the recognized text.
Supports English, Chinese, Japanese, and more...
The built-in speech recognition tool in Windows 10 (Win+H) only supports English. Office 365 Dictation (Alt+`) supports English, Chinese, Japanese, and more languages, and its recognition accuracy (especially for Chinese) is really good. Unfortunately, it only works in Office 365 applications and requires an Office 365 subscription (not free). So I decided to implement a free & open-source speech recognition tool that works in all applications on Windows.
Update: If your machine can be upgraded to Windows 11, the built-in speech recognition tool (Win+H) now supports more languages and could be considered a valid alternative to this project.
This project is not affiliated with Microsoft / Azure in any way.
If there's a better speech recognition service with higher accuracy & reasonable price, please file an issue.
Prerequisites:
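Installation:
Fill in config.json with your Azure credentials (see the configuration options below), start the program, and press Alt+H to start speech recognition. Make sure to set the Languages option in the config file for a better user experience.
[Demo animation with "Languages": ["en-US"]]
[Demo animation with "Languages": ["zh-TW"]]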
Note: If your current input language is set to Chinese, you can select characters after performing speech recognition.
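Features (each can be configured in config.json):
- Multiple languages with automatic language detection (Languages)
- Custom phrases (PhraseList)
- Incremental input (InputIncrementally)
- Automatic punctuation (AutoPunctuation)
- Continuous recognition (ContinuousRecognition & TotalTimeoutMS): recognition continues until Alt+H is pressed again, and times out after 60 seconds (i.e., 60000 ms) by default.
- Dedicated speech recognition key (UseMenuKey, UseFxKey): the menu key (≣ Menu) can be overridden to act as the speech recognition key. Similarly, one of the F1 to F24 keys can be overridden.
- Trailing Enter/space (SendTrailingEnter, SendTrailingSpace)
- Chinese chat mode (ChineseChatMode)
- Listening indicator overlay (ShowListeningOverlay)
- Switching between configs (UseSwitchConfigKey)
- Other options (PrioritizeLatencyOrAccuracy, SoundEffect, OutputForm, DetailedLog, ForceCapitalizeFirstAlphabet)

A sample config.json:
{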
"AzureSubscriptionKey": "<paste-your-subscription-key>",
"AzureServiceRegion": "<paste-your-region>",
"Languages": [
"en-US",
"zh-TW"
],
"PhraseList": [],
"PrioritizeLatencyOrAccuracy": "Latency",
"SoundEffect": false,
"InputIncrementally": true,
"OutputForm": "Text",
"AutoPunctuation": true,
"DetailedLog": false,
"ContinuousRecognition": false,
"TotalTimeoutMS": 60000,
"UseMenuKey": false,
"UseFxKey": 0,
"SendTrailingEnter": false,
"SendTrailingSpace": false,
"ChineseChatMode": false,
"ForceCapitalizeFirstAlphabet": true,
"ShowListeningOverlay": true,
"UseSwitchConfigKey": false
}
Make sure to replace <paste-your-subscription-key> and <paste-your-region> with valid Azure credentials; otherwise, an error will be raised.
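As a rough illustration of the kind of check that produces this error, here is a minimal Python sketch. It is not the project's own implementation: the helper name load_config, the error messages, and the exact validation rules are assumptions derived from the option descriptions in this README.

```python
import json

# Placeholder values from the sample config.json (must be replaced).
PLACEHOLDERS = {"<paste-your-subscription-key>", "<paste-your-region>"}

# Allowed values for the enumerated options, as documented in this README.
ALLOWED = {
    "PrioritizeLatencyOrAccuracy": {"Latency", "Accuracy"},
    "OutputForm": {"Text", "Lexical", "Normalized"},
}


def load_config(path):
    """Hypothetical helper: load config.json and reject invalid settings."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)

    # Credentials must be present and must not be the template placeholders.
    for key in ("AzureSubscriptionKey", "AzureServiceRegion"):
        value = config.get(key, "")
        if not value or value in PLACEHOLDERS:
            raise ValueError(f"{key} is missing or still a placeholder")

    # At least one recognition language is required.
    if not config.get("Languages"):
        raise ValueError("Languages must contain at least one language")

    # Enumerated string options must use one of the documented values.
    for key, allowed in ALLOWED.items():
        if key in config and config[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")

    # UseFxKey: 0 disables the feature, 1..24 select F1..F24 (bools rejected).
    fx = config.get("UseFxKey", 0)
    if isinstance(fx, bool) or not isinstance(fx, int) or not 0 <= fx <= 24:
        raise ValueError("UseFxKey must be an integer from 0 to 24")

    return config
```

With the unmodified sample config, load_config raises a ValueError naming AzureSubscriptionKey, which mirrors the behavior described above.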
- AzureSubscriptionKey: Your Azure subscription key.
- AzureServiceRegion: Your Azure subscription region (e.g., "westus" or "eastasia").
- Languages: A list of languages for speech recognition (mixed-language recognition is not supported). See the supported languages for more information. If the list contains only a single language, recognition is much faster since no automatic language detection is performed.
- PhraseList: A list of custom phrases such as names, technical terms, etc. For example, if you are a gamer who types GG (i.e., Good Game) a lot, you will want to add GG to this list. Otherwise, Azure will recognize it as JuJu, Gigi, etc.
- PrioritizeLatencyOrAccuracy: Selects the recognition mode, either "Latency" or "Accuracy".
- SoundEffect: Determines whether the program plays a hint sound when speech recognition starts/stops.
- InputIncrementally: Determines whether the program inputs text incrementally (revising it with backspaces along the way) or inputs it once when the recognition result is confirmed.
- OutputForm: Selects the recognition output form: "Text", "Lexical", or "Normalized". See the output form details for more information.
- AutoPunctuation: Determines whether to automatically insert punctuation.
- DetailedLog: Determines whether to log confidence, recognized words, and other recognition details.
- ContinuousRecognition: Determines whether to use continuous recognition, which requires the user to stop the ongoing recognition manually.
- TotalTimeoutMS: When continuous recognition is enabled, recognition stops after the specified number of milliseconds, in case the user forgets to stop it.
- UseMenuKey: Determines whether the menu/application key (≣ Menu) can be used as Alt+H (i.e., makes the key act like a dedicated speech recognition key).
- UseFxKey: Determines whether one of the F1 to F24 keys can be used as Alt+H. "UseFxKey": 0 disables this feature, "UseFxKey": 1 uses F1, and so on.
- SendTrailingEnter: Determines whether to send a trailing Enter after sending the input.
- SendTrailingSpace: Determines whether to send a trailing space after sending the input.
- ChineseChatMode: Replaces Chinese commas (，) with spaces and removes Chinese periods (。).
- ForceCapitalizeFirstAlphabet: Forces capitalization of the first English letter in a sentence. This improves the user experience when InputIncrementally is enabled.
- ShowListeningOverlay: Determines whether to show a microphone indicator overlay window while the program is listening.
- UseSwitchConfigKey: Determines whether to enable switching between configs with Alt+0, Alt+1, and so on.

speech-to-windows-input (STWI) %VERSION_STRING%
Source Code Link (MIT License):
https://github.com/j3soon/speech-to-windows-input
1. Press Alt+H to convert speech to text input. The recognition stops on (1) microphone silence (2) after 15 seconds (3) Alt+H is pressed again.
2. Press ESC to cancel the on-going speech recognition (no input will be generated).
3. Press Ctrl+C to exit.
Notes:
- The default microphone & internet connection is used for speech recognition.
- If input fails for certain applications, you may need to launch this program with `Run as administrator`.
- The initial recognition delay is for detecting the language used. You can modify the language list to contain only a single language to speed up the process.
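The stopping rules above, combined with the ContinuousRecognition and TotalTimeoutMS options, can be summarized as a small decision function. This is a hypothetical Python sketch, not the program's actual logic: the event names ("alt_h", "esc", "silence"), the Outcome enum, and the assumption that continuous mode ignores microphone silence (since it requires a manual stop) are illustrative choices.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CONTINUE = "continue"  # keep listening
    STOP = "stop"          # finish and type the recognized text
    CANCEL = "cancel"      # discard the result; no input is generated


# Single-shot limit taken from the usage text ("after 15 seconds").
SINGLE_SHOT_LIMIT_MS = 15_000


def decide(event: Optional[str], elapsed_ms: int, continuous: bool,
           total_timeout_ms: int = 60_000) -> Outcome:
    """Hypothetical stop logic; event is "alt_h", "esc", "silence", or None (timer tick)."""
    if event == "esc":
        return Outcome.CANCEL
    if event == "alt_h":
        return Outcome.STOP
    # Continuous mode is bounded by TotalTimeoutMS; single-shot mode by 15 s.
    limit_ms = total_timeout_ms if continuous else SINGLE_SHOT_LIMIT_MS
    if elapsed_ms >= limit_ms:
        return Outcome.STOP
    # Assumption: only single-shot mode ends on microphone silence.
    if event == "silence" and not continuous:
        return Outcome.STOP
    return Outcome.CONTINUE
```

For example, decide(None, 20_000, continuous=True) keeps listening, while the same call with continuous=False stops, because the 15-second single-shot limit has passed.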
Runtime:
When creating the Azure Speech resource, select the Free F0 pricing tier instead of Standard S0 (Pay as You Go). The free tier has a 5-hour audio quota each month and will not charge you when the quota is exceeded.
Compilation:
Do not compile with AnyCPU; use x86/x64 instead.
Minimal Sample: