Open vpetit-reimagine opened 10 months ago
+1
Which combinations of browsers and environments have you tried this code on? Chrome/Edge/Firefox (PC), Chrome/Safari/Firefox (Mac), Chrome (Android), Safari/Chrome (iOS), and so on.
We tested it on all the different possible combinations. The reason it didn't work as-is on Safari/iPhone was linked to how audio is created/played in that environment (it cannot be autoplayed, and it doesn't use the same audio API as the other browsers).
We had to find another way to make it work for our use case, as it was not possible to rely on the example.
This is because Safari on iPhone (iOS) cannot handle the MSE API (Media Source Extensions API). The SDK default is to use MSE. The alternative is to use the Web Audio API.
Web Audio availability: https://caniuse.com/audio-api
MSE availability: https://caniuse.com/mediasource
import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

// Web Audio API
// Note: on iOS, the AudioContext must be created (or resumed) inside a
// user-gesture handler, or playback will be blocked by the autoplay policy.
let audioContext = new AudioContext();
let bufferSource = audioContext.createBufferSource();

// Set up the Speech SDK
const speechConfig = sdk.SpeechConfig.fromSubscription("******************", "******");
let audioStream = sdk.PullAudioOutputStream.create();
const audioConfig = sdk.AudioConfig.fromStreamOutput(audioStream);
let synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

// Text input
let text = 'Lorem ipsum dolor sit amet ..... ';
synthesizer.speakTextAsync(text, result => {
    // result.audioData is an ArrayBuffer; decode it and play it via Web Audio
    audioContext.decodeAudioData(result.audioData, buffer => {
        bufferSource.buffer = buffer;
        bufferSource.connect(audioContext.destination);
        bufferSource.start(0);
    });
}, error => {
    // Some error processing
});
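The MSE-vs-Web-Audio decision described above can be sketched with simple feature detection. This is an illustrative helper (the `pickPlaybackPath` name is an assumption, not part of the Speech SDK):

```javascript
// Decide which playback path the current environment supports.
// "mse" is the SDK default; "webaudio" is the fallback described above
// (e.g. for iOS Safari, where MediaSource is unavailable); "none" means
// neither API exists.
function pickPlaybackPath(global = globalThis) {
  if (typeof global.MediaSource !== 'undefined') return 'mse';
  if (typeof global.AudioContext !== 'undefined' ||
      typeof global.webkitAudioContext !== 'undefined') return 'webaudio';
  return 'none';
}
```

On iOS Safari, `MediaSource` is undefined, so a check like this would select the Web Audio path used in the sample above.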
Thank you, this worked, but I ran into an issue where the audio is very low when the phone microphone is also in use; this happens only on iOS.
Have you been able to resolve that?
What happened?
Hi, in a current project, we use the latest cognitive-services-speech-sdk library to transform text into audible speech.
Context:
We have a div to which we append text as it arrives from a streamed data source (coming from a backend server) whose delay is outside our control. The goal is to use the Microsoft Cognitive Speech SDK to transform that text to speech as it arrives (the audio should be queued and processed sequentially until the text is completely processed).
The div is observed by a MutationObserver, which calls the synthesizer.speakTextAsync() method on the newly provided text.
Behaviour: On Chromium and Edge browsers, using the synthesizer without closing/disposing it once the audio is completed works, as it is reused automatically by the observer, producing the expected result (the audio is processed sequentially until the text is fully processed).
However, on Safari and Firefox, the audio is not provided to the browser at all as long as the synthesizer isn't closed (this is probably expected, as those browsers cannot process streamed audio directly). As mentioned above, we have to use a single synthesizer because the audio must be processed sequentially (using multiple synthesizers would process the text in parallel, producing parallel audio output as well). We thought about using an array to which we append the audio as it is completed by the synthesizer, without any success.
The synthesizer SHOULD remain open/available until we decide it can be disposed (basically when the end user leaves the current page), as the text the user receives on the page is unknown upfront. We can't create/open new synthesizers in parallel, as they would generate audio output at the same time, which is not what we expect.
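One way to keep utterances strictly sequential without relying on the synthesizer's internal ordering is to serialize the work with a promise chain. This is a minimal sketch under stated assumptions: the `SequentialQueue` name and the way it wraps `speakTextAsync` are illustrative, not part of the Speech SDK:

```javascript
// A tiny sequential task queue: each enqueued job starts only after the
// previous one has fully finished (e.g. after its audio has played out).
class SequentialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }
  // job is a function returning a promise; jobs run strictly in order.
  enqueue(job) {
    const next = this.tail.then(() => job());
    // Keep the chain alive even if a job rejects.
    this.tail = next.catch(() => {});
    return next;
  }
}

// Illustrative browser-only use with the Speech SDK (not runnable here):
// the MutationObserver callback enqueues each new chunk of text, and each
// job resolves only once its decoded buffer has finished playing, so the
// next chunk cannot start early.
//
// const queue = new SequentialQueue();
// const observer = new MutationObserver(() => {
//   queue.enqueue(() => new Promise((resolve, reject) => {
//     synthesizer.speakTextAsync(newText, result => {
//       audioContext.decodeAudioData(result.audioData, buffer => {
//         const source = audioContext.createBufferSource();
//         source.buffer = buffer;
//         source.connect(audioContext.destination);
//         source.onended = resolve; // next job starts after playback ends
//         source.start(0);
//       }, reject);
//     }, reject);
//   }));
// });
```

With this pattern, a single synthesizer (or even multiple short-lived ones) still produces strictly ordered audio, because parallelism is prevented at the queue level rather than inside the SDK.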
Do you have any suggestions on how we could fix this? Or would it be possible to tell the synthesizer to "flush" the current
result.audioData
to the browser without closing it? (Attached, you will find the modified sample file that mimics the behavior we want to achieve. If you test it on Chrome/Edge, it works perfectly fine, but as soon as you test it on Firefox/iOS (Safari), the audio does not play.)
Best regards, Vincent. sdk-test.zip
Version
1.33.0 (Latest)
What browser/platform are you seeing the problem on?
Firefox, Safari
Relevant log output
No response