add dtmf-term-timeout and speech-complete-timeout

Jared-Prime commented 9 years ago

allows for more fine-tuned settings on ASR input

squashed commits follow

Add Recognition-Timeout as an Input attribute

update with suggested edits

specify recognition-timeout in milliseconds

tweak description of recognition-timeout

Jared-Prime commented 9 years ago

@bklang @benlangfeld We have a couple more MRCP settings we would like to use with Rayo/Punchblock. This requested spec change describes both speech-complete-timeout and dtmf-term-timeout

cc @sfgeorge @runningferret
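For illustration, here is a minimal sketch of how the timeouts named in this PR might be translated into MRCPv2 header lines. The mapping table and the `to_mrcp_headers` helper are assumptions made for this example, not code from Rayo or Punchblock. Values are treated as milliseconds, as in the recognition-timeout commit above.

```python
# Hypothetical mapping from Rayo-style input attributes (in milliseconds)
# to the MRCPv2 recognizer header fields named in this thread.
RAYO_TO_MRCP = {
    "recognition-timeout": "Recognition-Timeout",
    "speech-complete-timeout": "Speech-Complete-Timeout",
    "dtmf-term-timeout": "Dtmf-Term-Timeout",
}


def to_mrcp_headers(attrs):
    """Return 'Name: value' header lines for the attributes this sketch covers."""
    lines = []
    for name, value in attrs.items():
        header = RAYO_TO_MRCP.get(name)
        if header is None:
            continue  # not one of the timeouts covered by this sketch
        millis = int(value)
        if millis < 0:
            raise ValueError(f"{name} must be a non-negative number of milliseconds")
        lines.append(f"{header}: {millis}")
    return lines


headers = to_mrcp_headers({"speech-complete-timeout": "800", "dtmf-term-timeout": 2000})
print("\n".join(headers))
```

Attributes outside the table are skipped rather than rejected, which is only one of several reasonable choices.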

benlangfeld commented 9 years ago

Thanks for this. What I'm wondering is:

1. Is it worth us doing this piecemeal, or should we specify all MRCP attributes in Rayo in one go?
2. What is the place of Rayo's media components in relation to MRCP? It's supposed to be an abstraction, but where do we define the cut-off point for importing complexity?

Thoughts @bklang, @jsgoecke, @mpermar?

Jared-Prime commented 9 years ago

Very good questions. Question 1 seems to depend on question 2: if Rayo should offer media components matching MRCP, then they should be implemented together rather than piecemeal. (Sidenote: the only reason I'm adding them one at a time here is that we're adding them to the Ifbyphone fork only as we have an immediate need.)

sfgeorge commented 9 years ago

I vote for 1., specifying all MRCP attributes in one go. I believe that there's a rational reason for users to want to be able to specify each.

Just to confirm that we're talking about the same list: I assume that would mean all of the MRCPv2 header fields with a "Resource type" of "Recognizer", which would be all of these:

Sensitivity-Level
Speed-Vs-Accuracy
N-Best-List-Length
Input-Type
No-Input-Timeout
Recognition-Timeout
Waveform-URI
Input-Waveform-URI
Completion-Cause
Completion-Reason
Recognizer-Context-Block
Start-Input-Timers
Speech-Complete-Timeout
Speech-Incomplete-Timeout
Dtmf-Interdigit-Timeout
Dtmf-Term-Timeout
Dtmf-Term-Char
Failed-URI
Failed-URI-Cause
Save-Waveform
Media-Type
New-Audio-Channel
Speech-Language
Ver-Buffer-Utterance
Recognition-Mode
Cancel-If-Queue
Hotword-Max-Duration
Hotword-Min-Duration
Interpret-Text
Dtmf-Buffer-Time
Clear-Dtmf-Buffer
Early-No-Match
Num-Min-Consistent-Pronunciations
Consistency-Threshold
Clash-Threshold
Personal-Grammar-URI
Enroll-Utterance
Phrase-ID
Phrase-NL
Weight
Save-Best-Waveform
New-Phrase-ID
Confusable-Phrases-URI
Abort-Phrase-Enrollment

sfgeorge commented 9 years ago

And under Output, it may be useful to include the fields with a Resource type of "Synthesizer":

Jump-Size
Kill-On-Barge-In
Speaker-Profile
Completion-Cause
Completion-Reason
Voice-Parameter
Prosody-Parameter
Speech-Marker
Speech-Language
Fetch-Hint
Audio-Fetch-Hint
Failed-URI
Failed-URI-Cause
Speak-Restart
Speak-Length
Load-Lexicon
Lexicon-Search-Order

bklang commented 9 years ago

> Thanks for this. What I'm wondering is:
>
> 1. Is it worth us doing this piecemeal, or should we specify all MRCP attributes in Rayo in one go?
> 2. What is the place of Rayo's media components in relation to MRCP? It's supposed to be an abstraction, but where do we define the cut-off point for importing complexity?
>
> Thoughts @bklang, @jsgoecke, @mpermar?

I'd rather get everything in one go, rather than have a partial solution.

Is there a way to reference the MRCP params without copying them? Should we define a generic parameter attribute (key/value) that the Rayo server should pass on to the backend? I can't think of any competitor to MRCP that would make sense to substitute here, but it's not an area of my expertise.

I will say that significant advanced functionality is impossible without being able to set some of these parameters - so there's good reason to find a way to support this.
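A rough sketch of what a generic key/value parameter could look like on the wire, assuming a hypothetical `<param/>` child element under `<input/>`. Neither the child element nor the `build_input` helper comes from the Rayo spec, and the namespace string is assumed to be the Rayo input namespace.

```python
import xml.etree.ElementTree as ET

RAYO_INPUT_NS = "urn:xmpp:rayo:input:1"  # assumed Rayo input namespace


def build_input(mode, generic_params):
    """Build an <input/> element carrying pass-through key/value parameters."""
    root = ET.Element(f"{{{RAYO_INPUT_NS}}}input", {"mode": mode})
    for name, value in generic_params.items():
        # <param/> is a hypothetical child element, not part of any published spec.
        ET.SubElement(
            root,
            f"{{{RAYO_INPUT_NS}}}param",
            {"name": name, "value": str(value)},
        )
    return root


element = build_input(
    "speech",
    {"Speech-Complete-Timeout": 800, "Dtmf-Term-Timeout": 2000},
)
print(ET.tostring(element, encoding="unicode"))
```

With this shape, the server would forward each name/value pair to the backend without the spec having to enumerate every MRCP header field.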

benlangfeld commented 9 years ago

If we were to make this generic, how would we rationalise the attributes that already exist as an exact parallel to what's in MRCP? Which would take preference? It feels kinda silly to have two ways to set the same value, but we're stuck with the attributes we already have for BC reasons.

I also cannot think of a desirable MRCP competitor, but the question is whether Rayo should tunnel MRCP or abstract it. VoiceXML abstracts it, and there is not a way to set arbitrary MRCP parameters like these that I know of.

bklang commented 9 years ago

I'd be interested to know VXML's reasons for abstracting it. These parameters, to me, don't feel too low-level, as long as you aren't required to set most of them (and you're not, they all come with defaults). Some of these parameters have a lot of meaning for an application, such as hotword vs. normal mode and several of the input timeout params.

Unless someone can give me a good reason why abstracting the params adds some kind of value, I'd be in favor of simply exposing the MRCP params upward.

benlangfeld commented 9 years ago

@crienzo Do you have any thoughts on this?

crienzo commented 9 years ago

A lot of the output params are already abstracted by SSML and are probably not necessary to expose.

Allowing input params to be passed to the underlying recognizer seems ok. I think the generic parameter approach is best. Generic params that are not supported, or that conflict with defined params, are not a big deal: we can define how to handle each situation (e.g. let generic win, ignore unknown params, reply with bad-request, etc).
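The handling options listed above could be sketched roughly as follows. The policy names, the `KNOWN_PARAMS` set, and the `BadRequest` exception are all made up for this example. A real server would decide which parameters its recognizer supports and would reply with a Rayo error rather than raise a Python exception.

```python
# Parameters the (hypothetical) backend recognizer is assumed to understand.
KNOWN_PARAMS = {"Recognition-Timeout", "Speech-Complete-Timeout", "Dtmf-Term-Timeout"}


class BadRequest(Exception):
    """Stands in for a bad-request error reply in this sketch."""


def merge_params(defined, generic, on_conflict="generic-wins", on_unknown="ignore"):
    """Combine values from defined attributes with generic pass-through params."""
    merged = dict(defined)
    for name, value in generic.items():
        if name not in KNOWN_PARAMS:
            if on_unknown == "bad-request":
                raise BadRequest(f"unsupported parameter: {name}")
            continue  # on_unknown == "ignore": silently drop it
        if name in merged and merged[name] != value:
            if on_conflict == "bad-request":
                raise BadRequest(f"conflicting values for {name}")
            if on_conflict == "defined-wins":
                continue  # keep the value from the defined attribute
        merged[name] = value  # "generic-wins", or no conflict at all
    return merged


print(merge_params({"Recognition-Timeout": 5000},
                   {"Recognition-Timeout": 3000, "X-Vendor-Param": 1}))
```

Whichever policy is chosen, stating it in the spec would answer the earlier question of which value takes preference when the same setting arrives both ways.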

bklang commented 9 years ago

I suspect our suggested resolution to this would also satisfy #91?

benlangfeld commented 9 years ago

> I suspect our suggested resolution to this would also satisfy #91?

Correct

rayo / xmpp

add dtmf-term-timeout and speech-complete-timeout #96