microsoft / cognitive-services-speech-sdk-js

Microsoft Azure Cognitive Services Speech SDK for JavaScript

synthesizer.speakSsmlAsync fails with mstts tags only #528

Open gad2103 opened 2 years ago

gad2103 commented 2 years ago

When I use the SDK to generate speech, it works fine with the following SSML:

let ssmlThatWorks = "<speak version=\"1.0\" xmlns=\"https://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">\n" +
        "  <voice name=\"en-US-JennyNeural\">\n" +
        toContentString(parsedXml) + " \n" +
        "  </voice>\n" +
        "</speak>"

However, if I use the variant that includes the mstts tags, I get the following error:

SpeechSynthesisResult {
  privResultId: '147A8275E1E34BE3AD49F7892846A194',
  privReason: 1,
  privErrorDetails: "Unexpected TextToSpeech.Protocols.Universal.Messages.AudioMetadataResponseMessage' message for Reque websocket error code: 1002",
  privProperties: PropertyCollection {
    privKeys: [ 'CancellationErrorCode' ],
    privValues: [ 'ConnectionFailure' ]
  },
  privAudioData: undefined,
  privAudioDuration: undefined
}

The SSML that consistently reproduces the error looks like this (replace the bracketed content with any public audio file):

let ssmlThatProducesErrors = "<speak version=\"1.0\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xml:lang=\"en-US\">\n" +
            "<mstts:backgroundaudio src=\"https://[PUT PUBLIC AUDIO FILE HERE TO REPRODUCE].mp3\" volume=\"0.7\" fadein=\"0\" fadeout=\"0\" />  <voice name=\"en-US-JennyNeural\">\n" +
            "Hello  \n" +
            "  </voice>\n" +
            "</speak>"

If I test the failing SSML in the browser on the official Azure TTS site, everything is generated correctly.
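
For anyone reproducing this, the failing SSML above can be assembled with a small helper instead of hand-concatenated strings. This is only a sketch: `buildBackgroundAudioSsml` and `escapeXml` are hypothetical names, not part of the Speech SDK, and the helper mirrors the attributes from the failing example rather than fixing the error.

```javascript
// Escape characters that are unsafe inside XML attribute values and text nodes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper (not an SDK function): builds SSML with the same shape
// as the failing example, i.e. an mstts:backgroundaudio element before the voice.
function buildBackgroundAudioSsml({
  text,
  voice,
  audioUrl,
  volume = "0.7",
  fadein = "0",
  fadeout = "0",
  lang = "en-US",
}) {
  return [
    `<speak version="1.0" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="${escapeXml(lang)}">`,
    `  <mstts:backgroundaudio src="${escapeXml(audioUrl)}" volume="${escapeXml(volume)}" fadein="${escapeXml(fadein)}" fadeout="${escapeXml(fadeout)}" />`,
    `  <voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>`,
    `</speak>`,
  ].join("\n");
}
```

The resulting string can be passed to `synthesizer.speakSsmlAsync` in place of the concatenated literal. Escaping matters mostly for audio URLs that contain `&` in a query string, which would otherwise make the XML malformed.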

I would love to be able to use background music in my application!

Other things i tried:

Any help would be greatly appreciated! Thanks in advance.

glharper commented 2 years ago

@yulin-li Is there a service contact we can pass this to?

yulin-li commented 2 years ago

Hi @gad2103, I still cannot repro your error. Could you share the resultId with us? We can check on the service side.

gad2103 commented 2 years ago

@yulin-li Can you share how you're trying to repro? Is the result ID not in the original error message I posted? ☝️

SpeechSynthesisResult {
  privResultId: '147A8275E1E34BE3AD49F7892846A194',
  privReason: 1,
  privErrorDetails: "Unexpected TextToSpeech.Protocols.Universal.Messages.AudioMetadataResponseMessage' message for Reque websocket error code: 1002",
  privProperties: PropertyCollection {
    privKeys: [ 'CancellationErrorCode' ],
    privValues: [ 'ConnectionFailure' ]
  },
  privAudioData: undefined,
  privAudioDuration: undefined
}

If not, where do I find the correct ID?
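
The dump above prints the result's private fields. As a rough sketch, the same values can be read through the public `resultId`, `reason`, and `errorDetails` properties of the result object. `summarizeSynthesisResult` is a hypothetical helper, and treating reason `1` as "canceled" is an assumption based on the dump above (`privReason: 1` alongside a `CancellationErrorCode`).

```javascript
// Hypothetical helper (not an SDK function): extracts the fields a service
// engineer would ask for from a synthesis result.
// Assumption: reason === 1 means the request was canceled, as in the dump above.
function summarizeSynthesisResult(result, canceledReason = 1) {
  const failed = result.reason === canceledReason;
  return {
    resultId: result.resultId,
    failed,
    // Only report error details when the request actually failed.
    errorDetails: failed ? result.errorDetails : undefined,
  };
}

// Example with a plain object shaped like the result in the dump above.
const summary = summarizeSynthesisResult({
  resultId: "147A8275E1E34BE3AD49F7892846A194",
  reason: 1,
  errorDetails: "websocket error code: 1002",
});
console.log(summary.resultId, summary.failed);
```

In an application, the helper would be called on the result passed to the `speakSsmlAsync` callback, and the `resultId` it returns is the value to share when reporting a failure.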

gad2103 commented 2 years ago

looks possibly related https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1492

johnmalatras commented 2 years ago

I'm also seeing issues with background audio (that is my issue @gad2103 linked above). I've spent several hours debugging and have yet to get it to work - unfortunately this is necessary for our use case

yulin-li commented 2 years ago

Hi @gad2103, sorry for missing the result id in your error message.

I can repro the issue now if I set the audio output format to Raw8Khz8BitMonoMULaw, as you did. I have reported this issue to the service team and they will take a look.

As a workaround, could you try to use formats other than 8kHz ones?
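
A minimal sketch of that workaround, assuming `sdk` is the imported `microsoft-cognitiveservices-speech-sdk` module and `speechConfig` was created from it. `useNon8kHzOutput` is a hypothetical helper name, and `Riff24Khz16BitMonoPcm` is just one non-8 kHz choice. It is not confirmed that every non-8 kHz format avoids the error.

```javascript
// Hypothetical helper (not an SDK function): switches the synthesis output
// format away from the 8 kHz formats that reproduce the error.
// `sdk` is passed in so the helper does not depend on how the module is loaded.
function useNon8kHzOutput(sdk, speechConfig) {
  speechConfig.speechSynthesisOutputFormat =
    sdk.SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm;
  return speechConfig;
}

// Intended usage (requires the SDK package and a real key and region):
// const sdk = require("microsoft-cognitiveservices-speech-sdk");
// const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
// useNon8kHzOutput(sdk, speechConfig);
// const synthesizer = new sdk.SpeechSynthesizer(speechConfig);
```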

johnmalatras commented 2 years ago

For what it's worth I'm also seeing the issue with audio-48khz-96kbitrate-mono-mp3

gad2103 commented 2 years ago

@yulin-li I can try to see if that resolves the error; however, that's the audio format I need for my application.

yulin-li commented 2 years ago

I understand. The service team is working on this bug.

gad2103 commented 2 years ago

Just checking in on the status here.

johnmalatras commented 2 years ago

Also wanting to follow up on this. To add another data point - long form synthesis fails entirely when I include the background audio tag.

ciaran-parloa commented 10 months ago

We are also affected by this issue. Any updates, @yulin-li?

sebvieux commented 3 months ago

Hi, any updates on this issue?