microsoft / cognitive-services-speech-sdk-js

Microsoft Azure Cognitive Services Speech SDK for JavaScript

[Bug]: Browser Unable to Decode and Play Partial Speech Segments due to Missing Header Information #823

Closed: rohit1coding closed this issue 1 week ago

rohit1coding commented 1 month ago

What happened?

Issue Description: My project needs real-time audio playback while text-to-speech synthesis is still in progress. To reduce latency, I am trying to play partial speech segments as they become available instead of waiting for the complete synthesized audio.

Problem: The partial audio buffers are raw audio data without the header (for example, a WAV/RIFF header) that the browser needs to decode and play them. Audio returned complete from the speakSsmlAsync method decodes and plays correctly, but the partial buffers fail to decode because this header is missing.
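To confirm that the streamed chunks really arrive without a container header, a small check can be run on each chunk inside the stream's write callback. This is a minimal sketch: `hasWavHeader` is a hypothetical helper, not part of the Speech SDK, and it only recognizes a RIFF/WAVE header (other formats such as MP3 are not detected).

```javascript
// Hypothetical diagnostic: returns true if the ArrayBuffer starts with a RIFF/WAVE header.
function hasWavHeader(chunk) {
  if (chunk.byteLength < 12) return false; // too short to hold "RIFF" + size + "WAVE"
  const bytes = new Uint8Array(chunk, 0, 12);
  const ascii = (start, end) => String.fromCharCode(...bytes.subarray(start, end));
  // Bytes 0-3 hold "RIFF", bytes 4-7 the chunk size, bytes 8-11 "WAVE".
  return ascii(0, 4) === "RIFF" && ascii(8, 12) === "WAVE";
}
```

Logging `hasWavHeader(audioData)` for each chunk shows which chunks, if any, carry a header that `decodeAudioData` could use.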

Code Implementation:

Backend Code:

    // Receive each audio chunk as the synthesizer writes it to the output stream.
    const pushStream = SpeechSdk.AudioOutputStream.createPushStream({
        write: (audioData) => { playAudio(audioData); }, // audioData is an ArrayBuffer
        close: () => {}
    });
    const audioConfig = SpeechSdk.AudioConfig.fromStreamOutput(pushStream);
    synthesizer = new SpeechSdk.SpeechSynthesizer(speechConfig, audioConfig);

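Frontend Code:

    const playAudio = async (audioData) => {
        // Copy the ArrayBuffer: decodeAudioData detaches the buffer it receives.
        const audioDataBufferArray = audioData.slice(0);
        try {
            const decodedAudioBuffer = await audioContext.decodeAudioData(audioDataBufferArray);
            const source = audioContext.createBufferSource();
            source.buffer = decodedAudioBuffer;
            source.connect(audioContext.destination);
            source.start();
        } catch (error) {
            console.error('Error decoding audio data:', error);
        }
    };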
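Even once individual chunks decode, starting each one the moment it arrives can make chunks overlap or leave gaps, because decoding time varies. One common approach is to schedule each chunk on the AudioContext clock so it begins exactly where the previous one ends. This sketch assumes `enqueue` is called in arrival order (concurrent `decodeAudioData` calls may resolve out of order, so awaiting them one at a time is safer); `createChunkScheduler` is a hypothetical helper.

```javascript
// Hypothetical helper: plays decoded AudioBuffers back-to-back in the order they are enqueued.
function createChunkScheduler(audioContext) {
  let nextStartTime = 0; // context time at which the next chunk should begin

  return function enqueue(audioBuffer) {
    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);
    // If playback has fallen behind (or this is the first chunk), start now;
    // otherwise start exactly where the previous chunk ends.
    const startAt = Math.max(audioContext.currentTime, nextStartTime);
    source.start(startAt);
    nextStartTime = startAt + audioBuffer.duration;
    return startAt;
  };
}
```

In `playAudio`, the decoded buffer would be passed to `enqueue` instead of being started directly.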

Expected Behavior: The browser should be able to decode and play partial speech segments without any issues.

Current Behavior: decodeAudioData fails on the partial audio data because it has no header, so an error is logged and the speech segments cannot be played.

Steps to Reproduce:
1. Start the text-to-speech conversion.
2. Attempt to play the audio while it is still being synthesized.
3. Observe that complete audio data plays without issues, but partial segments fail to decode and play.

Potential Solutions: One approach is to add the necessary header to each partial buffer dynamically before playback; another is to handle the raw audio data directly in the browser.
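The second approach, handling raw audio directly, avoids `decodeAudioData` entirely: raw PCM samples can be converted to floats and copied into an `AudioBuffer`. This sketch assumes the synthesizer is configured to emit header-less 16-bit little-endian mono PCM, for example by setting `speechConfig.speechSynthesisOutputFormat = SpeechSdk.SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm`, and that the sample rate passed in matches that format. `pcm16ToFloat32` and `pcmChunkToAudioBuffer` are hypothetical helpers, not SDK APIs.

```javascript
// Converts little-endian signed 16-bit PCM bytes to Float32 samples in [-1, 1).
function pcm16ToFloat32(pcmChunk) {
  const view = new DataView(pcmChunk);
  const sampleCount = Math.floor(pcmChunk.byteLength / 2); // a trailing odd byte is ignored
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Wraps one raw PCM chunk in a mono AudioBuffer without calling decodeAudioData.
// Returns null for chunks too short to hold a sample (createBuffer rejects length 0).
function pcmChunkToAudioBuffer(audioContext, pcmChunk, sampleRate = 24000) {
  const samples = pcm16ToFloat32(pcmChunk);
  if (samples.length === 0) return null;
  const audioBuffer = audioContext.createBuffer(1, samples.length, sampleRate);
  audioBuffer.copyToChannel(samples, 0);
  return audioBuffer;
}
```

If a chunk ends on an odd byte, that byte is dropped here; a fuller version would carry it over to the next chunk. The buffer's sample rate may differ from the AudioContext's, and the browser resamples it during playback.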

This issue significantly affects the usability of real-time audio features in our application, and any guidance or solutions would be greatly appreciated.

Version

1.36.0 (Latest)

What browser/platform are you seeing the problem on?

Chrome

Relevant log output

No response

glharper commented 1 month ago

@rohit1coding Thank you for using JS Speech SDK, and writing this issue up. About how many bytes are these partial speech segments you're wanting to decode? There is code in the JS Speech SDK for creating a wav header here, if you'd like to reuse it in your own code to prepend to the audio stream before writing to the pushStream.
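Following that suggestion, the sketch below builds a standard 44-byte WAV header and prepends it to each raw chunk so that `decodeAudioData` can accept it. It is not the SDK's own header code; `buildWavHeader` and `prependWavHeader` are hypothetical names, and the defaults assume 24 kHz, 16-bit, mono PCM, which must match the synthesizer's configured output format.

```javascript
// Hypothetical helper: builds a canonical 44-byte RIFF/WAVE header for PCM data.
function buildWavHeader(dataByteLength, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataByteLength, true);      // RIFF chunk size = total size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                      // fmt chunk size for PCM
  view.setUint16(20, 1, true);                       // audio format 1 = integer PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, dataByteLength, true);          // size of the PCM payload
  return header;
}

// Returns a new ArrayBuffer containing the header followed by the raw PCM chunk.
function prependWavHeader(pcmChunk, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const header = buildWavHeader(pcmChunk.byteLength, sampleRate, channels, bitsPerSample);
  const wav = new Uint8Array(header.byteLength + pcmChunk.byteLength);
  wav.set(new Uint8Array(header), 0);
  wav.set(new Uint8Array(pcmChunk), header.byteLength);
  return wav.buffer;
}
```

For example, `const decoded = await audioContext.decodeAudioData(prependWavHeader(audioData));` inside the write callback. Each chunk should contain whole samples (an even byte count for 16-bit mono), and `decodeAudioData` resamples the result to the context's sample rate.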