microsoft / BotFramework-WebChat

A highly-customizable web-based client for Azure Bot Services.
https://www.botframework.com/
MIT License

Synchronized Voice and Text via Webchat #3159

Closed · Amintasn closed this issue 4 years ago

Amintasn commented 4 years ago

Synchronized Voice and Text via Webchat

I'm using Direct Line, the Microsoft Speech Service and Web Chat with a bot running on Bot Framework v4. The application is fully voice based and is working fine. However, the text and voice are not in sync: after the first text-plus-voice interaction, the next text is displayed before the previous speech has finished playing, and so on. As a result, the voice always lags behind the text. I was wondering whether webchat.js can orchestrate this automatically, or whether there is another way to accomplish it. Thx in advance.
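
For context, here is a minimal sketch of the kind of speech-enabled Web Chat setup described above, assuming the CDN bundle (webchat.js) is already loaded; the token endpoint, region, key, and element ID are placeholders, not values from this issue:

```js
(async function () {
  // Hypothetical endpoint that exchanges a Direct Line secret for a token.
  const { token } = await (
    await fetch('https://example.com/api/directline/token', { method: 'POST' })
  ).json();

  window.WebChat.renderWebChat(
    {
      directLine: window.WebChat.createDirectLine({ token }),
      // The Speech Services ponyfill provides both speech-to-text and text-to-speech.
      webSpeechPonyfillFactory: await window.WebChat.createCognitiveServicesSpeechServicesPonyfillFactory({
        credentials: {
          region: 'westus2',                 // placeholder Speech resource region
          subscriptionKey: 'YOUR_SPEECH_KEY' // placeholder Speech resource key
        }
      })
    },
    document.getElementById('webchat')
  );
})();
```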

[Enhancement]

compulim commented 4 years ago

Do you mean:

  1. The bot sends a response whose speech takes > 5s to play
  2. After 2s, the bot sends another response with speech

Because the speech of the first response takes > 5s to complete, the text of the second response arrives sooner (after ~2s), and the second response shows on screen as soon as it arrives.

Can you confirm that my explanation above matches your description?

Amintasn commented 4 years ago

Thx for your usual support, @compulim. We actually have variable speech times: some are > 5s, some are < 5s. The point is that text messages keep arriving before the speech service has finished speaking, which puts the text and voice out of sync. Hope this helps clarify your question.

Amintasn commented 4 years ago

Hi @compulim, just wondering if you have any update or advice about this case. Thx!

a-b-r-o-w-n commented 4 years ago

@compulim do you have any guidance on this?

compulim commented 4 years ago

@Amintasn sorry for the late response.

This is by design: the speech engine should not interrupt an utterance until it has been fully spoken. Otherwise, if a bot sent two responses in rapid succession (< 1s apart), the first response would be cut short in synthesis.

This design also aligns with screen reader live regions (aria-live="polite"), which do not interrupt the current announcement but add the new sentence to the queue.
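
For reference, a minimal sketch of the aria-live="polite" behavior being compared to; the element and strings are purely illustrative:

```js
// A polite live region queues announcements instead of interrupting them.
const liveRegion = document.createElement('div');
liveRegion.setAttribute('aria-live', 'polite');
document.body.appendChild(liveRegion);

// Even if these two updates arrive back to back, a screen reader finishes
// announcing the first paragraph before it starts the second one.
for (const text of ['First response', 'Second response']) {
  const p = document.createElement('p');
  p.textContent = text;
  liveRegion.appendChild(p);
}
```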

Could you use the "speak" property to shorten the spoken sentence while displaying a longer version?
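
For example, here is a hedged sketch of that approach in a Bot Framework SDK v4 JavaScript bot; the class name and wording are placeholders, with `MessageFactory.text(text, speak)` used to set the displayed and spoken versions on one activity:

```js
const { ActivityHandler, MessageFactory } = require('botbuilder');

class SpokenSummaryBot extends ActivityHandler {
  constructor() {
    super();
    this.onMessage(async (context, next) => {
      // Display the full answer in the transcript, but only synthesize a
      // short summary so speech can finish before the next reply appears.
      await context.sendActivity(
        MessageFactory.text(
          'Here is the full, detailed answer shown on screen...', // text
          'Here is a short spoken summary.'                       // speak
        )
      );

      await next();
    });
  }
}

module.exports.SpokenSummaryBot = SpokenSummaryBot;
```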