[Question]: Browser Unable to Decode and Play Partial Speech due to Missing Header Information

rohit1coding commented 2 months ago

What happened?

Issue Description: I am currently working on a project where real-time audio playback is required while text-to-speech conversion is still in progress. To achieve lower latency, I am attempting to play partial speech segments as they become available instead of waiting for the complete text-to-speech data.

Problem: The issue arises with the partial audio buffers: these buffers are raw and lack the necessary header information that browsers require for decoding and playback. Consequently, while complete speech data from the speakSsmlAsync method works correctly, partial speech data does not function as expected due to this missing header.

Code Implementation:

Backend Code:

    const pushStream = SpeechSdk.AudioOutputStream.createPullStream();
    const audioConfig = SpeechSdk.AudioConfig.fromStreamOutput(pushStream);
    synthesizer = new SpeechSdk.SpeechSynthesizer(speechConfig, audioConfig);
    pushStream.write = (audioData) => {
      playAudio(audioData);
    };

Frontend Code:

    const playAudio = async (audioData) => {
      const audioDataBufferArray = Uint8Array.from(audioData).buffer;
      try {
        const decodedAudioBuffer = await audioContext.decodeAudioData(audioDataBufferArray);
      } catch (error) {
        console.error('Error decoding audio data:', error);
      }
    };

Expected Behavior: The browser should be able to decode and play partial speech segments without any issues.

Current Behavior: The browser fails to decode the partial audio data due to the absence of header information, leading to errors and inability to play the speech segments.

Steps to Reproduce:

1. Initiate the text-to-speech conversion process.
2. Attempt to play audio as it is being synthesized.
3. Observe that while complete audio data plays without issues, partial segments fail to decode and play.

Potential Solutions: A possible approach to resolve this issue could involve dynamically adding the necessary header information to the partial buffers before attempting playback, or implementing a method to handle raw audio data more effectively in the browser.
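To make the first idea concrete, here is a minimal sketch of prepending a 44-byte WAV (RIFF) header to a raw PCM chunk so that decodeAudioData can parse it. The helper name wrapPcmInWav is hypothetical, and the default format (16 kHz, 16-bit, mono PCM) is an assumption; the thread does not state the output format, so the parameters would have to match whatever format the speech config actually produces.

```javascript
// Hypothetical helper (not part of the Speech SDK): wraps raw PCM bytes in a
// minimal 44-byte WAV header. The defaults are an ASSUMPTION (16 kHz, 16-bit,
// mono); they must match the real synthesis output format.
function wrapPcmInWav(pcmBytes, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const wav = new Uint8Array(44 + pcmBytes.length);
  const view = new DataView(wav.buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcmBytes.length, true); // RIFF chunk size (file size - 8)
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                  // fmt chunk size for PCM
  view.setUint16(20, 1, true);                   // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  view.setUint32(40, pcmBytes.length, true);     // size of the PCM payload
  wav.set(pcmBytes, 44);                         // copy the samples after the header
  return wav.buffer;                             // ArrayBuffer, as decodeAudioData expects
}
```

In the frontend code above, this would be called as audioContext.decodeAudioData(wrapPcmInWav(Uint8Array.from(audioData))). With 16-bit samples, a chunk that ends on an odd byte would split a sample, so chunk boundaries may need to be aligned before wrapping.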

This issue significantly affects the usability of real-time audio features in our application, and any guidance or solutions would be greatly appreciated.

Version

1.36.0 (Latest)

What browser/platform are you seeing the problem on?

Chrome

Relevant log output

No response

yulin-li commented 1 month ago

Thanks for using Azure Speech and for opening this issue.

The data in the stream is designed to have no headers. You can use the Synthesizing event to get partial data with a header.
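For readers following along, a minimal sketch of what subscribing to that event might look like in the JavaScript SDK. It assumes the synthesizer exposes a synthesizing callback property that receives (sender, event) and that each partial chunk is available as an ArrayBuffer on event.result.audioData. The names forwardPartialAudio and onChunk are hypothetical, and whether each chunk is independently decodable is taken from the reply above rather than verified here.

```javascript
// Hypothetical helper: forwards each partial audio chunk to `onChunk` in the
// order the SDK raises the event. `forwardPartialAudio` and `onChunk` are
// illustrative names, not SDK APIs.
function forwardPartialAudio(synthesizer, onChunk) {
  let index = 0; // running chunk counter, in arrival order
  synthesizer.synthesizing = (_sender, event) => {
    const audioData = event.result.audioData; // ArrayBuffer with this chunk's bytes
    if (audioData && audioData.byteLength > 0) {
      onChunk(audioData, index);
      index += 1;
    }
  };
}
```

A caller would attach the handler before starting synthesis, for example forwardPartialAudio(synthesizer, (chunk) => playAudio(chunk)), and then call synthesizer.speakSsmlAsync(ssml) as before.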

rohit1coding commented 1 month ago

Thanks, @yulin-li. It's working perfectly now.

I have one last question regarding the Azure Text-to-Speech service: if I keep passing data for text-to-speech conversion sequentially, will the responses from the Synthesizing event be received in the same order?

yulin-li commented 1 month ago

Yes, the order is guaranteed.

yulin-li commented 1 month ago

I am closing this issue as the question has been answered. Feel free to re-open it if you have further questions.

microsoft / cognitive-services-speech-sdk-js