naeruru / mimiuchi

a free, customizable, osc capable speech-to-text interface for relaying text to different types of applications
https://mimiuchi.com
GNU General Public License v3.0

Speech to text stops working after an hour or so requiring restarting the browser #43

Open HedgeWizardly opened 3 weeks ago

HedgeWizardly commented 3 weeks ago

Submitting this issue on behalf of my partner, who streams on Twitch and uses this integration all the time. For quite a long time now, she has found that the captions will randomly stop updating. The issue usually occurs after about an hour, but it can sometimes take multiple hours before it happens.

When the issue occurs, the captions stop updating and an intermittent loading bar appears at the bottom. After a few moments, the microphone icon turns red.

Occasionally, refreshing the page helps, but usually she has to close and re-open the browser entirely to get it working again. The more often she restarts the browser in a day to fix it, the more frequently the problem appears.

My partner uses Google Chrome for her browser. We've checked the console in the dev tools for logs, but nothing has appeared that indicates what might be wrong. Please let me know if there's any additional checking I can do to help diagnose the issue!

naeruru commented 3 weeks ago

interesting, I'll try to debug this more this week! seems a bit tricky to debug, but I can at least put some sort of fail check in if I cannot find the source.
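One shape such a fail check could take, as a minimal sketch and not mimiuchi's actual code: after calling `start()` on a `SpeechRecognition` object, wait for its `start` event and flag a failure if it never arrives within a timeout. The name `createStartWatchdog`, its method names, and the timeout value are all hypothetical. Time is passed in explicitly so the logic can be exercised without a browser.

```typescript
type WatchdogState = "idle" | "pending" | "listening" | "failed";

// Hypothetical helper; not part of mimiuchi or the Web Speech API.
function createStartWatchdog(timeoutMs: number) {
  let state: WatchdogState = "idle";
  let requestedAt = 0;
  return {
    // Call right before recognition.start().
    requestStart(now: number): void {
      state = "pending";
      requestedAt = now;
    },
    // Call from the recognition object's `start` event handler.
    confirmStarted(): void {
      if (state === "pending") state = "listening";
    },
    // Call from the recognition object's `end` event handler.
    sessionEnded(): void {
      state = "idle";
    },
    // Poll periodically (for example from setInterval) with Date.now().
    check(now: number): WatchdogState {
      if (state === "pending" && now - requestedAt >= timeoutMs) {
        state = "failed";
      }
      return state;
    },
  };
}
```

When `check` returns `"failed"`, the page could show a visible "captions stopped" message or try starting again, instead of leaving the loading bar up indefinitely.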

mdrejhon commented 1 week ago

Possibility 1

TIP: WebSpeech quirk: WebSpeech sometimes starts ignoring your voice if you speak before you press the microphone button, because it automatically calibrates the background noise based on the first few hundred milliseconds after WebSpeech is turned on.

So if you were already talking loudly when you pressed the microphone button, you now have to talk even louder to get captions to come back. Tell the other person to be quiet for the first second after turning on mimiuchi, so it doesn't think their talking is just "background noise".

This is more important for old versions of Chrome than for the current version of Safari. It's annoying how inconsistent WebSpeech noise cancellation sometimes is.

Possibility 2

I think it's a Chrome + WebSpeech + WiFi fault.

But I simply fix it by (A) refreshing the page and (B) pressing the microphone button again. Usually just (B), pressing the microphone button again, works. I've never had to restart the browser, so I suspect it's a system-specific bug (e.g. Chrome version, WiFi reliability, operating system, etc).

I have the same problem but I simply refresh the page and it works. I think it's because some WebSpeech implementations transmit audio to a server rather than decoding locally.

I think at the beginning, early on, Google transmitted WebSpeech audio to their servers for decoding, but now local WebSpeech is more common. Safari now finally supports WebSpeech (local).
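If recognition does depend on a server in some browsers, a dropped connection may surface as an `error` event on the recognition object even when nothing is printed to the console. The Web Speech API defines error codes such as `network`, `no-speech`, `aborted`, `not-allowed`, `service-not-allowed`, and `audio-capture`. Here is a rough sketch of sorting them into "retry automatically" versus "needs the user". The grouping and the name `classifyRecognitionError` are assumptions for illustration, not something mimiuchi does today.

```typescript
type Recovery = "retry" | "ask-user" | "ignore";

// Hypothetical helper. `code` is the `error` string carried by the
// recognition object's `error` event.
function classifyRecognitionError(code: string): Recovery {
  switch (code) {
    case "network":   // connection to the recognition service failed
    case "no-speech": // nothing was heard for a while
      return "retry";
    case "aborted":   // session was cancelled, often on purpose
      return "ignore";
    case "not-allowed":
    case "service-not-allowed":
    case "audio-capture": // microphone permission or hardware problem
      return "ask-user";
    default:
      // Unknown or unhandled code: assume the user should be told.
      return "ask-user";
  }
}
```

Logging the code from an `onerror` handler would also show whether the stalls described in this issue line up with `network` errors.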

WebSpeech sometimes seems to stop working when there have been no captions for a while (e.g. 15 minutes or X minutes) and I have to refresh. It is mighty annoying in unattended party-captioning situations, where I put a laptop down to caption the room (I'm deaf) and somehow it mysteriously stops.

I found that Safari + WebSpeech keeps captioning longer than Chrome + WebSpeech, so reliability varies from browser to browser and from version to version. I can disconnect my WiFi on Safari and it keeps working! Mimiuchi runs offline in Safari nicely on my MacBook, while it didn't in an old PC copy of Chrome (I have to check whether Chrome now runs WebSpeech offline). Regardless of how WebSpeech is implemented, all of them seem to auto-stop eventually (annoying).
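Since every implementation seems to auto-stop eventually, one common workaround is to call `start()` again from the `end` event whenever the user has not asked to stop. A minimal sketch, assuming a structural stand-in type (`RecognitionLike`) in place of the real `SpeechRecognition` object; `keepAlive` and `maxRestarts` are hypothetical names. The restart cap is there so a session that fails immediately cannot loop forever.

```typescript
// Structural stand-in for the parts of SpeechRecognition used here.
interface RecognitionLike {
  onend: ((event: any) => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical helper: restart whenever the browser ends the session
// on its own, up to maxRestarts times per user-initiated start.
function keepAlive(recognition: RecognitionLike, maxRestarts: number) {
  let wanted = false; // true while the user wants captions running
  let restarts = 0;

  recognition.onend = () => {
    if (!wanted || restarts >= maxRestarts) return;
    restarts += 1;
    recognition.start();
  };

  return {
    start(): void {
      wanted = true;
      restarts = 0;
      recognition.start();
    },
    stop(): void {
      wanted = false;
      recognition.stop();
    },
    restartCount(): number {
      return restarts;
    },
  };
}
```

Whether an automatic restart actually revives captions in the stalled state described in this issue is untested here; it only covers the case where the browser fires `end` by itself.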

New Idea

A compromise is a bigger, brighter microphone button (maybe in the center of the screen, like a YouTube play button), to make resuming easier? Some deaf, deafened and hard-of-hearing people also have vision impairments, making a tiny microphone button difficult to use. Maybe even as an optional checkbox setting ("[ ] Big Microphone Button When Not Transcribing").

Something roughly like this with more visual cues (a caption-stopped message + a big button to mash for people with vision or mobility impairments). The big button can be an option in the Settings, disabled by default. Clicking the button (or anywhere on the screen, really) will reactivate WebSpeech.

While I have been deaf since birth, elderly people who are now hard of hearing will often have mobility issues (shaky hand, arthritis) and vision issues (can't see tiny buttons). This makes resuming easier:

image

People are familiar with YouTube-play buttons, so it makes resuming captions super-easy.

(@naeruru If you decide to do this in a future "version 2.0", you can make the button optional, in settings. You can theme it your preferred way as long as it is BIG, e.g. a purple button or whatever is more consistent; this is just an example. Another idea is to automatically hide the button if someone starts typing instead, so people can choose to type instead of speak.)
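The visibility rule for the big button described above can be written down in a few lines. This is only a sketch of the proposal; the field names and `shouldShowResumeButton` are made up for illustration and do not correspond to existing mimiuchi settings.

```typescript
// Hypothetical UI state for the proposed feature.
interface CaptionUiState {
  bigButtonEnabled: boolean; // the proposed setting, off by default
  isTranscribing: boolean;   // true while WebSpeech is running
  isTyping: boolean;         // true once the user starts typing instead
}

// Show the big resume button only when the option is on, captions have
// stopped, and the user has not switched to typing.
function shouldShowResumeButton(state: CaptionUiState): boolean {
  return state.bigButtonEnabled && !state.isTranscribing && !state.isTyping;
}

// Defaults matching the proposal: the option is disabled by default.
const defaultUiState: CaptionUiState = {
  bigButtonEnabled: false,
  isTranscribing: false,
  isTyping: false,
};
```

A click handler on the button (or on the whole screen) would then call the same code path as the existing microphone button.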