Hi,
We have found some scenarios, when using speakSsmlAsync, where we receive a reason of ResultReason.SynthesizingAudioCompleted (10) but an audioDuration of 0 and no errorDetails. Based on the documentation, we would have expected a reason of ResultReason.NoMatch (0), or errorDetails, if the text could not be processed. We would like guidance on which scenarios can return an audioDuration of 0 together with ResultReason.SynthesizingAudioCompleted, so that we can handle them in our code.
One scenario where this happens is when the text sent consists only of punctuation marks, which makes sense. However, we also found cases where actual text was sent and the same result came back, which is why we are asking when to expect this type of response.
We can reliably reproduce this with Japanese text, as shown below: when we submitted the Japanese word for “test” (テスト) 49 times, it produced audio, but when we submitted it 50 times, it produced an audioDuration of 0. This is obviously just test data, but we could not understand why 49 repeats would work and 50 would not, so we are looking for guidance on the known scenarios that produce an audioDuration of 0.
In both cases we are using region uswest2 and the voice ja-JP-NanamiNeural:
For テストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテスト (repeats テスト 49 times) we get:
result: SpeechSynthesisResult {
  privResultId: '4BF7C5A4B23C4C6897989758D562E37C',
  privReason: 10,
  privErrorDetails: undefined,
  privProperties: undefined,
  privAudioData: ArrayBuffer {
    [Uint8Contents]: <52 49 46 46 f4 b8 08 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 01 00 80 3e 00 00 00 7d 00 00 02 00 10 00 64 61 74 61 d0 b8 08 00 fe ff fd ff fd ff fc ff fd ff fd ff fc ff fd ff fd ff fd ff fd ff fc ff fc ff fc ff fc ff fb ff fc ff fc ff fc ff fc ff fc ff fc ff fc ff fc ff fd ff fc ff fd ff fc ff ... 571544 more bytes>,
    byteLength: 571644
  },
  privAudioDuration: 178625000
}
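As a sanity check on the output above, the first 44 bytes can be decoded as a RIFF/WAV header. The sketch below assumes the canonical 44-byte PCM header layout (byte rate at offset 28, data chunk size at offset 40), which matches the bytes shown, and assumes audioDuration is expressed in 100-nanosecond ticks. Under those assumptions the duration implied by the header equals the reported privAudioDuration, so the 49-repeat result looks internally consistent.

```javascript
// First 44 bytes of privAudioData, copied from the output above.
const header = Uint8Array.from([
  0x52, 0x49, 0x46, 0x46, 0xf4, 0xb8, 0x08, 0x00, 0x57, 0x41, 0x56, 0x45,
  0x66, 0x6d, 0x74, 0x20, 0x10, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00,
  0x80, 0x3e, 0x00, 0x00, 0x00, 0x7d, 0x00, 0x00, 0x02, 0x00, 0x10, 0x00,
  0x64, 0x61, 0x74, 0x61, 0xd0, 0xb8, 0x08, 0x00,
]);
const view = new DataView(header.buffer);

// WAV header fields are little-endian.
const sampleRate = view.getUint32(24, true); // samples per second
const byteRate = view.getUint32(28, true);   // bytes of audio per second
const dataBytes = view.getUint32(40, true);  // size of the PCM payload

// Assumption: audioDuration is in 100 ns ticks (1e7 ticks per second).
const durationTicks = (dataBytes * 1e7) / byteRate;

console.log({ sampleRate, byteRate, dataBytes, durationTicks });
// durationTicks is 178625000, the same value as privAudioDuration above
```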
For テストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテスト (repeats テスト 50 times) we get:
Note that we are on version 1.36.0. Your releases page lists 1.35.0 as the latest, and the latest version I can choose when filing this bug report is 1.34, so that's what I selected. However, the latest version on npm is 1.36.0, published 20 days ago, which is what we are on.
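For context, the kind of defensive handling we have in mind looks roughly like the sketch below. It is not an SDK feature: classifySynthesisResult and its return labels are hypothetical names, the constant uses the numeric value (10) quoted in this report, and treating a zero audioDuration as "empty" is our assumption pending guidance on the intended behaviour. It reads only the reason, errorDetails and audioDuration fields described above.

```javascript
// ResultReason.SynthesizingAudioCompleted, using the numeric value (10) quoted in this report.
const SYNTHESIZING_AUDIO_COMPLETED = 10;

// Hypothetical helper (not part of the SDK): sorts a synthesis result into
// one of four cases so the caller can decide whether to retry or report.
function classifySynthesisResult(result) {
  // An explicit error message always wins.
  if (result.errorDetails) return "error";
  // Any reason other than "completed" is treated as a failure to synthesize.
  if (result.reason !== SYNTHESIZING_AUDIO_COMPLETED) return "not-completed";
  // The case from this report: completed, no error, but no audio.
  if (!result.audioDuration) return "completed-but-empty";
  return "ok";
}

// Plain objects standing in for SpeechSynthesisResult, using values from this report.
console.log(classifySynthesisResult({ reason: 10, errorDetails: undefined, audioDuration: 178625000 })); // "ok"
console.log(classifySynthesisResult({ reason: 10, errorDetails: undefined, audioDuration: 0 })); // "completed-but-empty"
```

In practice this would be called on the result passed to the speakSsmlAsync success callback, with the "completed-but-empty" case logged or surfaced to the caller rather than treated as successful audio.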
Version
1.34.0 (Latest)
What browser/platform are you seeing the problem on?
Node
Relevant log output
No response