microsoft / BotBuilder-Samples

Welcome to the Bot Framework samples repository. Here you will find task-focused samples in C#, JavaScript/TypeScript, and Python to help you get started with the Bot Framework SDK!
https://github.com/Microsoft/botframework
MIT License

Direct line speech - Echo bot not using final formatted recognition result #3871

Closed jens-f closed 7 months ago

jens-f commented 1 year ago

Sample information

  1. Sample type: \samples\
  2. Sample language: typescript
  3. Sample name: 02.echo-bot

Describe the bug

When using Direct Line Speech, the echo bot does not echo back the final formatted speech recognition result. For example, when the speech recognition result in the client app shows "Computer, what's the current time?", the echo response from the bot is "Echo: computer what's the current time". All the formatting and additional punctuation is gone, and it looks like the bot is using the last raw output from the speech recognition instead of the "final formatted" result.

I'm not sure if this is an issue with the Echo bot or the Direct Line Speech channel. Is there a configuration setting for direct line speech to modify this behaviour?
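To make the question concrete, here is a minimal sketch of what an echo handler does with the incoming text, assuming it simply interpolates the activity's `text` field into the reply. `IncomingMessage` and `buildEchoReply` are hypothetical names used for illustration and are not part of the Bot Framework SDK.

```typescript
// Minimal stand-in for the part of an activity used here.
// (Hypothetical type; the real SDK Activity has many more fields.)
interface IncomingMessage {
    type: string;
    text?: string;
}

// Builds the echo reply from whatever text the channel delivered.
// Nothing here strips punctuation or changes casing.
function buildEchoReply(activity: IncomingMessage): string {
    return `Echo: ${activity.text ?? ''}`;
}

// Text as it appears to arrive via Direct Line Speech in this report.
const received: IncomingMessage = {
    type: 'message',
    text: "computer what's the current time",
};
console.log(buildEchoReply(received));
```

If the handler works this way, the missing punctuation would already be absent from the text the bot receives, which would point at the channel rather than the sample.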

To Reproduce

Steps to reproduce the behavior:

  1. Download the latest "Windows Voice Assistant Client" from https://github.com/Azure-Samples/Cognitive-Services-Voice-Assistant/releases
  2. When starting the client, enter subscription key and region
  3. For the keyword model, use any from https://github.com/Azure-Samples/Cognitive-Services-Voice-Assistant/tree/main/keyword-models. I used computer.table
  4. Start the Echo Bot from https://github.com/microsoft/BotBuilder-Samples/tree/main/samples/typescript_nodejs/02.echo-bot
  5. In the voice assistant client, say (using your microphone) "computer what's the current time"
  6. The transcribed text in the chat view will show "Computer, what's the current time?"
  7. But the echo response will show "Echo: computer what's the current time" (i.e. all punctuation and formatting is missing)

Expected behavior

The echo bot should have responded "Echo: Computer, what's the current time?"

Additional context

I'm able to reproduce the exact same behaviour with a different, C++-based client.

stevkan commented 1 year ago

@jens-totemic - Thank you for your patience. I successfully repro'd the issue you are experiencing. Additionally, I checked and verified that the returned text value appeared to be coming from the DL Speech service.

I spoke with one of the primary Web Chat devs and confirmed that Web Chat does not perform any processing on the spoken words nor on the returned text result. It is the DL Speech service, acting as a proxy between the bot and Web Chat, that is processing the wave form file generated in Web Chat. It is also responsible for the returned text result. From the docs on inverse text normalization, or ITN, the DL Speech service should be returning a properly formatted response.
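For reference, the Speech service's detailed output format reports each hypothesis in several forms (`Lexical`, `ITN`, `MaskedITN`, `Display`). The sketch below shows how the lexical and display forms differ for this utterance. The sample payload values, the confidence score, and the helper names `topHypothesis` and `pickText` are assumptions for illustration only; which form DL Speech actually forwards to the bot has not been confirmed here.

```typescript
// One hypothesis in the Speech service's detailed output format.
interface NBestEntry {
    Confidence: number;
    Lexical: string;   // raw recognized words
    ITN: string;       // inverse-text-normalized form
    MaskedITN: string; // ITN with profanity masking applied
    Display: string;   // final form with capitalization and punctuation
}

interface DetailedResult {
    RecognitionStatus: string;
    NBest?: NBestEntry[];
}

type TextForm = 'Lexical' | 'ITN' | 'MaskedITN' | 'Display';

// Returns the hypothesis with the highest confidence, if any.
function topHypothesis(result: DetailedResult): NBestEntry | undefined {
    const entries = result.NBest ?? [];
    return entries.reduce<NBestEntry | undefined>(
        (best, e) => (best === undefined || e.Confidence > best.Confidence ? e : best),
        undefined,
    );
}

// Picks one text form from the best hypothesis.
function pickText(result: DetailedResult, form: TextForm): string | undefined {
    return topHypothesis(result)?.[form];
}

// Hypothetical payload modelled on the utterance in this issue.
const sample: DetailedResult = {
    RecognitionStatus: 'Success',
    NBest: [{
        Confidence: 0.9,
        Lexical: "computer what's the current time",
        ITN: "computer what's the current time",
        MaskedITN: "computer what's the current time",
        Display: "Computer, what's the current time?",
    }],
};

console.log(pickText(sample, 'Lexical'));
console.log(pickText(sample, 'Display'));
```

The echoed text in the report resembles the lexical form, while the client's chat view shows something like the display form.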

I'm going to check whether I have a contact on the DL Speech team that I can get this in front of directly. I will keep you posted. Otherwise, we'll just have to transfer the issue.

stevkan commented 7 months ago

@jens-totemic - I don't know if this is still an issue for you. If you were able to get around the problem, great. If not, and you haven't already done so, please open a new issue on the Cognitive Services Speech SDK repo.