microsoft / cognitive-services-speech-sdk-js

Microsoft Azure Cognitive Services Speech SDK for JavaScript

[Bug]: speakSsmlAsync produces 0 duration audio but result reason is SynthesizingAudioCompleted #805

Closed sbhvt closed 3 months ago

sbhvt commented 3 months ago

What happened?

Hi. We have found scenarios where speakSsmlAsync returns a reason of ResultReason.SynthesizingAudioCompleted (10) with an audioDuration of 0 and no errorDetails. Based on the documentation, we expected either a reason of ResultReason.NoMatch (0) or populated errorDetails if the text could not be processed. We would like guidance on which scenarios can produce an audioDuration of 0 together with ResultReason.SynthesizingAudioCompleted, so that we can handle them in our code.

One scenario where this happens is when the text consists only of punctuation marks, which makes sense. However, we also found cases where actual text was sent and the result was the same, which is why we are asking when to expect this type of response.
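
Until the behavior is documented, one way to handle this defensively is to treat a "completed" result with no audio as a failure. The following is a minimal sketch, not SDK code: `isEmptySynthesis` is a hypothetical helper, the constant 10 is the value of ResultReason.SynthesizingAudioCompleted as reported in this issue, and the 44-byte threshold assumes a RIFF/WAV output format whose payload is a bare header when no samples are produced.

```javascript
// Numeric value of ResultReason.SynthesizingAudioCompleted as reported in this issue.
const SYNTHESIZING_AUDIO_COMPLETED = 10;
// Assumption: RIFF/WAV output, where a 44-byte payload is a header with no samples.
const WAV_HEADER_BYTES = 44;

// Hypothetical helper: true when synthesis "succeeded" but produced no audio.
function isEmptySynthesis(result) {
  if (result.reason !== SYNTHESIZING_AUDIO_COMPLETED) {
    return false; // other reasons go through the normal error path
  }
  const byteLength = result.audioData ? result.audioData.byteLength : 0;
  return result.audioDuration === 0 || byteLength <= WAV_HEADER_BYTES;
}
```

A caller could check `isEmptySynthesis(result)` inside the speakSsmlAsync success callback and route the result to the same handling used for errors. For non-WAV output formats the byte-length check would need a different threshold.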

We can reliably reproduce this when converting Japanese text, as shown below. Submitting the Japanese word for “test” (テスト) 49 times produced audio, but submitting it 50 times produced an audioDuration of 0. This is only test data, but we could not work out why 49 repeats succeed and 50 repeats of the same text do not, so we are looking for guidance on the known scenarios that yield an audioDuration of 0.
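
A reproduction along these lines could look like the sketch below. The exact SSML used is not shown in this report, so `buildSsml` is an assumed minimal template, and `speakSsml` and `compareRepeats` are hypothetical helpers. The only SDK call relied on is the callback form speakSsmlAsync(ssml, onResult, onError); the synthesizer is passed in so the sketch does not depend on how it was configured.

```javascript
// Hypothetical helper: wrap `text` in minimal SSML for a given voice.
// Note: `text` is not XML-escaped here, which is fine for the テスト test data.
function buildSsml(voiceName, text) {
  const lang = voiceName.split("-").slice(0, 2).join("-"); // "ja-JP-NanamiNeural" -> "ja-JP"
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voiceName}">${text}</voice></speak>`
  );
}

// Promise wrapper around the callback-style speakSsmlAsync(ssml, onResult, onError).
function speakSsml(synthesizer, ssml) {
  return new Promise((resolve, reject) => {
    synthesizer.speakSsmlAsync(ssml, resolve, reject);
  });
}

// Synthesize the same word at several repeat counts and collect each audioDuration.
async function compareRepeats(synthesizer, voiceName, counts) {
  const durations = {};
  for (const n of counts) {
    const ssml = buildSsml(voiceName, "テスト".repeat(n));
    const result = await speakSsml(synthesizer, ssml);
    durations[n] = result.audioDuration;
  }
  return durations;
}
```

With a real SpeechSynthesizer instance, `compareRepeats(synthesizer, "ja-JP-NanamiNeural", [49, 50])` would return the two durations side by side, which makes the threshold easy to bisect for other inputs.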

In both cases we are using region uswest2 and the voice ja-JP-NanamiNeural:

For テストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテスト (repeats テスト 49 times) we get:

result: SpeechSynthesisResult {
        privResultId: '4BF7C5A4B23C4C6897989758D562E37C',
        privReason: 10,
        privErrorDetails: undefined,
        privProperties: undefined,
        privAudioData: ArrayBuffer {
          [Uint8Contents]: <52 49 46 46 f4 b8 08 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 01 00 80 3e 00 00 00 7d 00 00 02 00 10 00 64 61 74 61 d0 b8 08 00 fe ff fd ff fd ff fc ff fd ff fd ff fc ff fd ff fd ff fd ff fd ff fc ff fc ff fc ff fc ff fb ff fc ff fc ff fc ff fc ff fc ff fc ff fc ff fc ff fd ff fc ff fd ff fc ff ... 571544 more bytes>,
          byteLength: 571644
        },
        privAudioDuration: 178625000
      }

For テストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテスト (repeats テスト 50 times) we get:

result: SpeechSynthesisResult {
        privResultId: '4BC6BDA9AD0346FBA5FA5DF09E078335',
        privReason: 10,
        privErrorDetails: undefined,
        privProperties: undefined,
        privAudioData: ArrayBuffer {
          [Uint8Contents]: <52 49 46 46 24 00 00 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 01 00 80 3e 00 00 00 7d 00 00 02 00 10 00 64 61 74 61 00 00 00 00>,
          byteLength: 44
        },
        privAudioDuration: 0
      }
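
For reference, the two byte dumps above can be decoded as canonical 44-byte PCM WAV headers. The sketch below (`parseWavHeader` and `hex` are hypothetical helpers, and the canonical header layout is an assumption that happens to fit both dumps) shows that the second result is a valid header declaring zero bytes of sample data, and that privAudioDuration in the first result matches the data size divided by the byte rate, expressed in 100-nanosecond ticks.

```javascript
// Parse the fields of a canonical 44-byte PCM WAV header (little-endian).
function parseWavHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset) => String.fromCharCode(...bytes.subarray(offset, offset + 4));
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(36) !== "data") {
    throw new Error("not a canonical 44-byte PCM WAV header");
  }
  const byteRate = view.getUint32(28, true);
  const dataBytes = view.getUint32(40, true);
  return {
    sampleRate: view.getUint32(24, true),
    byteRate,
    bitsPerSample: view.getUint16(34, true),
    dataBytes,
    // Inferred from the dumps: the reported durations line up with 100 ns ticks.
    durationTicks: Math.round((dataBytes / byteRate) * 1e7),
  };
}

// Convert a space-separated hex dump into bytes.
const hex = (s) => Uint8Array.from(s.trim().split(/\s+/), (h) => parseInt(h, 16));

// First 44 bytes of the 49-repeat result.
const header49 = hex(
  "52 49 46 46 f4 b8 08 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 01 00 " +
  "80 3e 00 00 00 7d 00 00 02 00 10 00 64 61 74 61 d0 b8 08 00"
);
// Entire 44-byte payload of the 50-repeat result.
const header50 = hex(
  "52 49 46 46 24 00 00 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 01 00 " +
  "80 3e 00 00 00 7d 00 00 02 00 10 00 64 61 74 61 00 00 00 00"
);

console.log(parseWavHeader(header49)); // dataBytes: 571600, durationTicks: 178625000
console.log(parseWavHeader(header50)); // dataBytes: 0, durationTicks: 0
```

Both headers describe 16 kHz, 16-bit mono PCM; only the declared data size differs. In other words, the 50-repeat response is a well-formed but empty WAV file.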

Note that we are on version 1.36.0. Your releases page lists 1.35.0 as the latest, and the latest version I could select when filing this bug report was 1.34, so that is what I selected. On npm, however, the latest is 1.36.0, published 20 days ago, which is the version we are on.

Version

1.34.0 (Latest)

What browser/platform are you seeing the problem on?

Node

Relevant log output

No response

yulin-li commented 3 months ago

This is a known issue. The default voice cannot speak Korean characters. Please change the voice.