stts-se / chromedictator

Demo app for testing Google Chrome's ASR API
MIT License

Content triggered SEND #42

Closed jensedlund closed 5 years ago

jensedlund commented 5 years ago

Wednesday's tests revealed that interpreters can do better than expected at the respeaking task, but that the modality switch to chunk+send is a real problem (it may perhaps be possible to overcome with training, but the training need is already high). At the very least, we'd need SEND+RECORD in one gesture rather than two (that is another feature request, though).

We noticed that there was an automatic SEND-like action when there was sufficient silence. I figure this has to do with the ASR's behaviour (please confirm/comment)? In any case, that worked very well for the respeaker, so in slower parts of the speech, just making clear pauses would work fine. At higher tempo, even this turns out to be difficult, since there is no obvious way of knowing whether a pause registered except by looking at the interface, which again is a modality change that proves a little too much, at least without specific training.

We also noticed that the ASR LM clearly accepts "punkt", "frågetecken" and "komma" at high probabilities pretty much anywhere. And saying "punkt" at "punkt" comes more naturally than producing a "long enough" pause. So here's the feature request: scan for the words "punkt" and "frågetecken", and SEND+RESTART when they occur. ("komma" will perhaps be messy due to the homograph; in any case, we'll leave it for now.) I don't think there's any need to rewrite the words to "." and "?", except maybe as feedback: it'll show what caused the SEND.
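To make the request concrete, here is a minimal sketch of the scanning step, written in Python for brevity (the app itself runs in the browser, so this is not the project's code). The names `TRIGGER_WORDS` and `split_on_triggers` are hypothetical, and the sketch assumes the recogniser hands over its transcript as plain text.

```python
import re
from typing import List, Tuple

# Hypothetical trigger list; "komma" is left out, as in the request above.
TRIGGER_WORDS = ("punkt", "frågetecken")

# Match a trigger only as a whole word, case-insensitively.
_TRIGGER_RE = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in TRIGGER_WORDS) + r")\b",
    re.IGNORECASE,
)


def split_on_triggers(transcript: str) -> Tuple[List[Tuple[str, str]], str]:
    """Return (chunks, remainder).

    Each chunk is (text_before_trigger, trigger_word) and would be SENT;
    the remainder is the text after the last trigger and stays in the buffer.
    """
    chunks = []
    start = 0
    for match in _TRIGGER_RE.finditer(transcript):
        text = transcript[start:match.start()].strip()
        # Keep the trigger word with the chunk so the UI can show what caused the SEND.
        chunks.append((text, match.group(1).lower()))
        start = match.end()
    return chunks, transcript[start:].strip()
```

Matching on word boundaries means a compound such as "utgångspunkten" would not trigger a SEND, which seems like the safer default. Returning the trigger word alongside each chunk leaves room for the feedback idea without rewriting the text itself.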

I know that this is exactly one of those features that sound (and are) simple in theory, but that may cause any number of weird behaviours. I also know you might not have any time. But at 9 o'clock next Thursday, we have 6 interpreter students respeaking a few lectures into this thing, and in a perfect world, they should be able to SEND by saying "punkt". If not, it'll work with pauses. Manual send is most likely out of the question; it really seems to throw them off track (but we'll try a bit more with that as well).

HannaLindgren commented 5 years ago

Implemented like this in today's release (0.1-alfa4):

  1. Stop recording/recogniser on keyword (or silence but with some timeout lag)
  2. Restart recording manually with ctrl-enter (or button click)

jensedlund commented 5 years ago

So this now (from 0.1-alfa6 I believe?) has an option to auto-restart after both keywords and silence, and this is what we want. (Although there is a bug, noted elsewhere).
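For reference, the behaviour described in this thread (SEND on a keyword or on sufficient silence, then either auto-restart or wait for a manual restart) can be sketched as a small state machine. This is an illustration in Python under stated assumptions, not the app's implementation: `RespeakSession`, its fields, and the 1.5 second timeout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

TRIGGERS = {"punkt", "frågetecken"}  # hypothetical keyword set


@dataclass
class RespeakSession:
    silence_timeout: float = 1.5  # seconds; an assumed value, not the app's
    auto_restart: bool = True     # False models the manual ctrl-enter restart
    recording: bool = True
    buffer: List[str] = field(default_factory=list)
    sent: List[str] = field(default_factory=list)
    last_speech_at: float = 0.0

    def on_words(self, words: List[str], now: float) -> None:
        """Feed recognised words; a trigger word sends the buffered text."""
        if not self.recording:
            return
        for word in words:
            if word.lower() in TRIGGERS:
                self._send()
                if not self.recording:
                    return  # recogniser stopped; the rest is not captured
            else:
                self.buffer.append(word)
        self.last_speech_at = now

    def on_tick(self, now: float) -> None:
        """Call periodically; sends buffered text after enough silence."""
        silent_for = now - self.last_speech_at
        if self.recording and self.buffer and silent_for >= self.silence_timeout:
            self._send()

    def restart(self) -> None:
        """Manual restart (the ctrl-enter / button case)."""
        self.recording = True

    def _send(self) -> None:
        if self.buffer:
            self.sent.append(" ".join(self.buffer))
            self.buffer.clear()
        # With auto-restart on, recording simply continues after a SEND.
        self.recording = self.auto_restart
```

With `auto_restart=False` the session stays stopped after a SEND until `restart()` is called, which corresponds to the earlier manual-restart behaviour; with `auto_restart=True` the respeaker never has to touch the interface.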

I'm leaving it to you, @HannaLindgren and @NikolajLindberg, to place this in a feature request, since it needs broadening: The list of keywords that trigger "Send" is typically something that one would want to be able to set in a config. For a live, real application, I'd like to be able to upload a config at start and also reconfig in runtime (well a silent restart is fine). But take this as a discussion point for later rather than a feature request.
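As a starting point for that discussion, a config of this kind could be as small as a JSON document that is validated on upload. The sketch below is in Python and purely illustrative: the keys `send_keywords` and `auto_restart`, and the function `load_config`, are assumptions rather than anything the app defines.

```python
import json

# Hypothetical defaults; the key names are assumptions for illustration.
DEFAULT_CONFIG = {
    "send_keywords": ["punkt", "frågetecken"],
    "auto_restart": True,
}


def load_config(raw: str) -> dict:
    """Parse an uploaded JSON config and merge it over the defaults."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")

    # Reject unknown keys so that a typo does not silently do nothing.
    unknown = set(data) - set(DEFAULT_CONFIG)
    if unknown:
        raise ValueError("unknown config keys: " + ", ".join(sorted(unknown)))

    config = dict(DEFAULT_CONFIG)
    config.update(data)

    keywords = config["send_keywords"]
    if not isinstance(keywords, list) or not all(
        isinstance(k, str) and k.strip() for k in keywords
    ):
        raise ValueError("send_keywords must be a list of non-empty strings")
    # Normalise so matching against recognised words can be case-insensitive.
    config["send_keywords"] = [k.strip().lower() for k in keywords]

    if not isinstance(config["auto_restart"], bool):
        raise ValueError("auto_restart must be true or false")
    return config
```

Reconfiguring at runtime could then amount to calling `load_config` again and swapping the result in between two SENDs, which is close to the "silent restart" mentioned above.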

HannaLindgren commented 5 years ago

> The list of keywords that trigger "Send" is typically something that one would want to be able to set in a config. For a live, real application, I'd like to be able to upload a config at start and also reconfig in runtime (well a silent restart is fine). But take this as a discussion point for later rather than a feature request.

Agreed. New ticket #55.