naeruru / mimiuchi

a free, customizable, osc capable speech-to-text interface for relaying text to different types of applications
https://mimiuchi.com
GNU General Public License v3.0

mimiuchi: speech-to-text

mimiuchi is a free, customizable, OSC-capable speech-to-text application for displaying text or relaying it to other applications like VRChat. Its customizable text window is also designed to be paired with applications like OBS. It runs on the web, with little setup required beyond customization. You can try it out right now at mimiuchi.com with Chrome, Safari, or Edge. The UI currently supports English and Japanese (日本語)!

Features

How to use

Speech-to-Text

Simply go to mimiuchi.com and press the mic button! You will need to grant microphone access the first time you do this. Currently, mimiuchi uses the Web Speech API to perform speech-to-text, which is only supported on the web version. You can read more about it below. In the future I will support more options.
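For readers curious what the browser side of this looks like, here is a minimal TypeScript sketch of continuous recognition with the Web Speech API. It is an illustration under stated assumptions, not mimiuchi's actual code: the names `collectTranscripts` and `startListening`, the small structural interfaces, and the `en-US` language setting are all made up for the example.

```typescript
// Minimal structural types so the sketch compiles without browser typings.
interface Alternative { transcript: string }
interface Result { isFinal: boolean; 0: Alternative; length: number }
interface ResultEvent { resultIndex: number; results: ArrayLike<Result> }

// Split the results that changed in this event into finalized and interim text.
function collectTranscripts(event: ResultEvent): { final: string; interim: string } {
  let final = ''
  let interim = ''
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i]
    if (result.isFinal) final += result[0].transcript
    else interim += result[0].transcript
  }
  return { final, interim }
}

// Start recognition if the browser exposes the API; returns false otherwise.
function startListening(onText: (final: string, interim: string) => void): boolean {
  const g = globalThis as any
  // Chromium-based browsers and Safari expose the prefixed constructor.
  const Recognition = g.SpeechRecognition ?? g.webkitSpeechRecognition
  if (!Recognition) return false

  const recognition = new Recognition()
  recognition.continuous = true      // keep listening across pauses
  recognition.interimResults = true  // report partial text while speaking
  recognition.lang = 'en-US'         // assumed language for this example
  recognition.onresult = (event: ResultEvent) => {
    const { final, interim } = collectTranscripts(event)
    onText(final, interim)
  }
  recognition.start()                // the browser asks for mic permission here
  return true
}
```

In a browser, `startListening((final, interim) => console.log(final, interim))` would log text as you speak, and it returns `false` where the API is unavailable.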

Using OSC

Click the broadcast button to toggle OSC. Due to how VRChat OSC works, this requires the desktop app, which you can download from the release page (see Download below). If you're using speech-to-text, the web version can relay all speech-to-text output to the desktop app when broadcasting is on.

Everything together

With both applications running, simply toggle on the MIC and BROADCAST buttons in the web app. The web app will then toggle the desktop app on with it.

website -> desktop

mimiuchi-ws_example
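The website-to-desktop hop could be sketched roughly as follows. This README does not describe the transport or message format, so everything here is an assumption: the sketch supposes a WebSocket-style connection and a hypothetical JSON message shape (`RelayMessage`), and the helper name `relayText` is invented for illustration.

```typescript
// Structural subset of the browser WebSocket interface used by this sketch.
interface TextSocket {
  readyState: number
  send(data: string): void
}

// Hypothetical message shape; the real app may use a different format.
interface RelayMessage {
  type: 'text'
  text: string
  final: boolean
}

const SOCKET_OPEN = 1 // numeric value of WebSocket.OPEN

// Send recognized text to the desktop app; returns false if nothing was sent.
function relayText(socket: TextSocket, text: string, final: boolean): boolean {
  // Skip when the connection is not open or there is nothing to say.
  if (socket.readyState !== SOCKET_OPEN || text.trim() === '') return false
  const message: RelayMessage = { type: 'text', text, final }
  socket.send(JSON.stringify(message))
  return true
}
```

A browser `WebSocket` connected to whatever local address the desktop app listens on satisfies `TextSocket`, so it can be passed to `relayText` directly.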

website -> desktop -> VRChat

mimiuchi-vrchat_example
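For the last hop, the desktop app has to turn text into an OSC packet. The sketch below encodes a message for VRChat's chatbox following the OSC 1.0 layout (NUL-terminated strings padded to 4-byte boundaries, with `T`/`F` type tags for booleans). It assumes VRChat's `/chatbox/input` address taking a string and two booleans, as I understand VRChat's OSC documentation; the function names are hypothetical and this is not necessarily how mimiuchi builds its packets.

```typescript
// Encode a string as an OSC-string: its bytes, a NUL terminator,
// then NUL padding so the total length is a multiple of 4.
function oscString(value: string): Uint8Array {
  const bytes = new TextEncoder().encode(value)
  const padded = new Uint8Array((Math.floor(bytes.length / 4) + 1) * 4)
  padded.set(bytes) // remaining bytes stay zero
  return padded
}

// Join several byte arrays into one.
function concatBytes(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((sum, part) => sum + part.length, 0)
  const out = new Uint8Array(total)
  let offset = 0
  for (const part of parts) {
    out.set(part, offset)
    offset += part.length
  }
  return out
}

// Build an OSC message for the chatbox (address and argument order are
// assumptions based on VRChat's documented chatbox endpoint).
function encodeChatboxMessage(
  text: string,
  sendImmediately = true, // false would open the in-game keyboard instead
  playSound = false,      // whether to play the notification sound
): Uint8Array {
  // Booleans are carried entirely by the type tags (T/F) and add no payload bytes.
  const typeTags = ',s' + (sendImmediately ? 'T' : 'F') + (playSound ? 'T' : 'F')
  return concatBytes([
    oscString('/chatbox/input'),
    oscString(typeTags),
    oscString(text),
  ])
}
```

The resulting bytes would then be sent as a single UDP datagram to VRChat's OSC input port (127.0.0.1:9000 by default). Browsers cannot send UDP, which is why this step lives in the desktop app.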

Additional info

Why?

I support the idea of people having many ways to communicate and do things. It is important to give people those tools and make them easily accessible. This app gives people another way to display text in different applications like OBS or VRChat. It is free and focused on privacy as an end goal. An example of a very similar application is Web Captioner. However, I want to expand upon it and make this version unique!

Web Speech API

mimiuchi uses the Web Speech API to perform speech-to-text, which is a browser-dependent API. Most browsers, like Chrome or Edge, will upload your audio to GCP or Azure respectively to have it processed, while the webpage never gets direct access to it. For example, you can read about Chrome's privacy practices pertaining to it here. I chose the Web Speech API because it is completely free and requires no accounts to access. Unfortunately, its free use is disabled in Electron's Chromium, which means speech-to-text in this form can only run in the browser. This adds slight complexity when you want to interface with local applications like VRChat, since a "middle application" is required to relay the text back and forth. Still, I think this approach is worth it, as it provides a free way to use powerful speech-to-text models for people who don't have the means to pay.
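Because recognition depends on the browser, a remote service, and microphone permission, failures surface as `error` events whose `error` field holds a short code. The sketch below maps a few of the codes defined by the Web Speech API specification to user-facing messages. The function name and the message wording are assumptions for illustration, not taken from mimiuchi.

```typescript
// Translate a Web Speech API error code into a short, human-readable hint.
function describeSpeechError(code: string): string {
  switch (code) {
    case 'not-allowed':
    case 'service-not-allowed':
      return 'Microphone or speech service access was denied.'
    case 'audio-capture':
      return 'No microphone could be used for capture.'
    case 'network':
      return 'The speech service could not be reached.'
    case 'no-speech':
      return 'No speech was detected.'
    case 'aborted':
      return 'Recognition was stopped.'
    default:
      // Unknown or newer codes fall through with the raw code attached.
      return `Speech recognition failed (${code}).`
  }
}
```

In a browser this would typically be wired up as `recognition.onerror = (event) => showMessage(describeSpeechError(event.error))`, where `showMessage` stands in for whatever UI hook the page uses.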

In the future, I would like to support a standalone desktop experience, but this is currently on hold until I figure out how popular this might be.

Todo

in no particular order...

Download

See the release page to install the latest version of the desktop app. The desktop version lets you use additional features like OSC.

Building it yourself

Requirements

Setup

Use npm install to install dependencies.

Use npm run dev to run the application. This starts both an Electron version and a web version.

Or you can use npm run build to build the application. It will create an exe file in release/.

Special Thanks

License

This project is licensed under the GNU General Public License v3.0; see the LICENSE.txt file for details.