microsoft / cognitive-services-speech-sdk-js

Microsoft Azure Cognitive Services Speech SDK for JavaScript

[Bug]: synthesizer.speakTextAsync not working properly with Firefox or on iOS (Safari) with continuous text input #767

Open vpetit-reimagine opened 10 months ago

vpetit-reimagine commented 10 months ago

What happened?

Hi, in a current project we are working on, we use the latest cognitive-services-speech-sdk library to convert text into audible speech.

Context:

Behaviour: On Chromium and Edge browsers, using the synthesizer without closing/disposing it once the audio completes works: it is reused automatically by the observer, producing the expected result (the audio is processed sequentially until the text is fully processed).

However, on Safari and Firefox, the audio is not delivered to the browser at all while the synthesizer stays open (this is probably expected, as those browsers cannot process streamed audio directly). As mentioned above, we have to use a single synthesizer because the audio must be processed sequentially (using multiple synthesizers processes the text in parallel, creating parallel audio output as well). We tried appending the completed audio to an array as the synthesizer finishes each chunk, without any success.

The synthesizer SHOULD remain open/available until we decide it can be disposed (basically when the end user leaves the current page), since the text the user receives on the page is unknown upfront. We can't create/open new synthesizers in parallel, as they would generate audio output at the same time, which is not what we expect.

Do you have any suggestion on how we could fix this? Or would it be possible to tell the synthesizer to "flush" the current result.audioData to the browser without closing it?

(Attached, you will find the modified sample file that mimics the behavior we want to achieve. If you test it on Chrome/Edge, it will work perfectly fine, but as soon as you test it on Firefox/iOS(Safari), the audio does not play).

Best regards, Vincent. sdk-test.zip

Version

1.33.0 (Latest)

What browser/platform are you seeing the problem on?

Firefox, Safari

Relevant log output

No response

1014156094 commented 10 months ago

+1

k-daimon commented 7 months ago

Which combinations of browsers and environments have you tried this code on? Chrome/Edge/Firefox (PC), Chrome/Safari/Firefox (Mac), Chrome (Android), Safari/Chrome (iOS), and so on.

vpetit-reimagine commented 7 months ago

We tested it on all the possible combinations. The reason it didn't work as-is on Safari/iPhone is linked to how audio is created/played in that environment (it cannot be autoplayed, and it doesn't use the same audio API as the other browsers).

We had to find another way to make it work for our use case, as it was not possible to rely on the example.
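
Roughly, a workaround of this kind boils down to serializing synthesis and playback with a promise chain, so each chunk of text is synthesized and played only after the previous chunk's playback has ended. A minimal sketch (helper names like createSerialQueue and speakChunk are illustrative, not SDK API):

```javascript
// Minimal serial queue: each enqueued task starts only after the
// previous one has fully finished. Plain JS, no SDK dependency.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const run = tail.then(() => task());
    tail = run.catch(() => {}); // keep the chain alive after failures
    return run;
  };
}

// Hypothetical wiring with the Speech SDK + Web Audio (browser only);
// `speakChunk` is an illustrative name, not an SDK function:
//
// const enqueue = createSerialQueue();
// function speakChunk(synthesizer, audioContext, text) {
//   return enqueue(() => new Promise((resolve, reject) => {
//     synthesizer.speakTextAsync(text, result => {
//       audioContext.decodeAudioData(result.audioData, buffer => {
//         const src = audioContext.createBufferSource();
//         src.buffer = buffer;
//         src.connect(audioContext.destination);
//         src.onended = resolve; // next chunk waits for playback to end
//         src.start(0);
//       }, reject);
//     }, reject);
//   }));
// }
```

Each call to speakChunk then returns a promise that resolves when that chunk finishes playing, so incoming text can be fed in as it arrives without ever playing chunks in parallel.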

k-daimon commented 7 months ago

This is because Safari/iPhone (iOS) cannot handle the MSE API (Media Source Extensions API). The SDK defaults to MSE; the alternative is the Web Audio API. Web Audio availability: https://caniuse.com/audio-api MSE availability: https://caniuse.com/mediasource

import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

// Web Audio API setup
const audioContext = new AudioContext();
const bufferSource = audioContext.createBufferSource();

// Speech SDK setup
const speechConfig = sdk.SpeechConfig.fromSubscription("******************", "******");
const audioStream = sdk.PullAudioOutputStream.create();
const audioConfig = sdk.AudioConfig.fromStreamOutput(audioStream);
const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

// Text input
const text = 'Lorem ipsum dolor sit amet ..... ';

synthesizer.speakTextAsync(text, result => {
  // result.audioData is an ArrayBuffer; decodeAudioData turns it into
  // a playable AudioBuffer.
  audioContext.decodeAudioData(result.audioData, buffer => {
    bufferSource.buffer = buffer;
    bufferSource.connect(audioContext.destination);
    bufferSource.start(0);
  });
}, error => {
  // Handle synthesis errors here
  console.error(error);
});

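If you want to pick the fallback explicitly rather than rely on the SDK's internal choice, a small feature check could look like this (pickPlaybackPath is an illustrative helper, not part of the SDK):

```javascript
// Decide a playback path from feature availability: MSE where the
// browser supports it, Web Audio as the Safari/iOS fallback.
function pickPlaybackPath(globalObj) {
  if (typeof globalObj.MediaSource !== "undefined") {
    return "mse";      // Chrome, Edge, Firefox
  }
  if (typeof globalObj.AudioContext !== "undefined" ||
      typeof globalObj.webkitAudioContext !== "undefined") {
    return "webaudio"; // Safari / iOS fallback
  }
  return "none";
}

// In a browser you would call pickPlaybackPath(window) and only set up
// the PullAudioOutputStream + decodeAudioData path when it returns "webaudio".
```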
nosisky commented 6 months ago

This is because Safari/iPhone (iOS) cannot handle the MSE API (Media Source Extensions API). The SDK defaults to MSE; the alternative is the Web Audio API.

Thank you, this worked, but I ran into an issue where the audio is very low when the phone microphone is also in use; this happens only on iOS.
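
One thing worth trying (an assumption about iOS audio routing, not something confirmed by the SDK docs): while a getUserMedia microphone stream is live, iOS may route playback through the voice-call path at reduced volume, so stopping the mic tracks before playback can restore normal volume. A sketch:

```javascript
// Stop all live microphone tracks on a MediaStream before starting
// playback; a possible mitigation for iOS volume ducking (assumption,
// not verified against all iOS versions).
function stopMicrophone(stream) {
  stream.getAudioTracks().forEach(track => track.stop());
}

// Usage (browser):
// navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
//   // ... later, just before starting synthesized playback:
//   stopMicrophone(stream);
// });
```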

localhostd3veloper commented 4 months ago

This is because Safari/iPhone (iOS) cannot handle the MSE API (Media Source Extensions API). The SDK defaults to MSE; the alternative is the Web Audio API.

Thank you, this worked, but I ran into an issue where the audio is very low when the phone microphone is also in use; this happens only on iOS.

Have you been able to resolve that?