ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API
https://www.vad.ricky0123.com

myvad.start() on iPhone, the speaker output volume is cut almost by half automatically #96

Open joeleegithub opened 4 months ago

joeleegithub commented 4 months ago

It also lowers the volume of other open browser programs that are currently streaming music or voice. It only happens on iPhone, not on Android or PCs. Once you stop myvad, the speaker volume goes back to normal, including for the other open browser programs.

This also happens when you run Ricky's VAD demo on an iPhone: https://www.vad.ricky0123.com/

raylin01 commented 3 months ago (May 16, 2024)

I believe this is because iOS treats mic + speaker output as a call, or as a different type of media, so it doesn't follow your set volume (from my testing, you can't change the volume either). I'm still trying to figure out if there's any way around this, but it is likely an iOS behavior.

joeleegithub commented 3 months ago

Hi Ray:

Yes, it seems to be that way. Thank you for the reply.

Joe Lee

flatsiedatsie commented 3 months ago

This happens on macOS too. As soon as VAD is activated, the music that was playing starts sounding like it's coming from a 1980s radio in the back of a Moroccan deli.

Screenshot 2024-06-05 at 10 23 21

I was wondering: does the VAD audio context have to be connected to speakers? Is there a way to disconnect it from the speakers?

elephantau commented 2 months ago

I've faced the same issue on:

The audio playback code is straightforward:

async function playAudio(audioData) {
    // No sampleRate option: the context runs at the output device's default rate
    const audioContext = new (window.AudioContext || window.webkitAudioContext)()
    try {
        // decodeAudioData resamples the decoded WAV to audioContext.sampleRate
        const audioBuffer = await audioContext.decodeAudioData(audioData)
        const source = audioContext.createBufferSource()
        source.buffer = audioBuffer
        console.log(source) // produces the console dumps below
        source.connect(audioContext.destination)
        source.start()
    } catch (error) {
        console.error('Error decoding audio data:', error)
    }
}

The audio data is prepared as in the API example (https://wiki.vad.ricky0123.com/en/docs/user/api#example):

// do something with `audio` (Float32Array of audio samples at sample rate 16000)...
// encodeWAV args: format 1 (PCM), 16000 Hz, 1 channel, 16 bits per sample
const wavBuffer = vad.utils.encodeWAV(audio, 1, 16000, 1, 16)
playAudio(wavBuffer)

So the original WAV audio has a sample rate of 16 kHz, and the headers look fine.
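A quick way to confirm that the headers really say 16 kHz is to read the fmt chunk fields back out of the buffer. The sketch below assumes the canonical 44-byte RIFF/WAVE layout (fmt chunk right after the RIFF header, then the data chunk), which simple PCM encoders usually write; whether vad.utils.encodeWAV uses exactly this layout is an assumption. encodePcm16Wav and readWavHeader are hypothetical helpers for illustration, not vad-web functions.

```javascript
// Hypothetical helpers for illustration; NOT vad-web's implementation.
// Assumes the canonical 44-byte header: "RIFF", "WAVE", "fmt " (16 bytes), "data".

// Encode interleaved Float32 samples in [-1, 1] as 16-bit integer PCM WAV.
function encodePcm16Wav(samples, sampleRate, numChannels = 1) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);   // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = integer PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true);             // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Read the format fields back out of a canonical WAV ArrayBuffer.
function readWavHeader(buffer) {
  const view = new DataView(buffer);
  const ascii = (offset) => String.fromCharCode(...new Uint8Array(buffer, offset, 4));
  if (ascii(0) !== "RIFF" || ascii(8) !== "WAVE" || ascii(12) !== "fmt ") {
    throw new Error("not a canonical RIFF/WAVE header");
  }
  return {
    audioFormat: view.getUint16(20, true),
    numChannels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
    dataBytes: view.getUint32(40, true),
  };
}
```

In the browser, readWavHeader(wavBuffer).sampleRate should report 16000. Call it before playAudio, because decodeAudioData detaches the ArrayBuffer it is given.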

This is the console output for source on iOS:

AudioBufferSourceNode
buffer: AudioBuffer {length: 82944, duration: 1.728, sampleRate: 48000, numberOfChannels: 1, getChannelData: function, …}
channelCount: 2
channelCountMode: "max"
channelInterpretation: "speakers"
context: AudioContext {baseLatency: 0.0026666666666666666, getOutputTimestamp: function, suspend: function, resume: function, close: function, …}
detune: AudioParam {value: 0, automationRate: "k-rate", defaultValue: 0, minValue: -3.4028234663852886e+38, maxValue: 3.4028234663852886e+38, …}
loop: false
loopEnd: 0
loopStart: 0
numberOfInputs: 0
numberOfOutputs: 1
onended: null
playbackRate: AudioParam {value: 1, automationRate: "k-rate", defaultValue: 1, minValue: -3.4028234663852886e+38, maxValue: 3.4028234663852886e+38, …}
AudioBufferSourceNode Prototype

This is the console output for source on macOS:

buffer: AudioBuffer {length: 88905, duration: 2.015986394557823, sampleRate: 44100, numberOfChannels: 1, getChannelData: function, …}
channelCount: 2
channelCountMode: "max"
channelInterpretation: "speakers"
context: AudioContext {baseLatency: 0.0029024943310657597, getOutputTimestamp: function, suspend: function, resume: function, close: function, …}
detune: AudioParam {value: 0, automationRate: "k-rate", defaultValue: 0, minValue: -3.4028234663852886e+38, maxValue: 3.4028234663852886e+38, …}
loop: false
loopEnd: 0
loopStart: 0
numberOfInputs: 0
numberOfOutputs: 1
onended: null
playbackRate: AudioParam {value: 1, automationRate: "k-rate", defaultValue: 1, minValue: -3.4028234663852886e+38, maxValue: 3.4028234663852886e+38, …}
AudioBufferSourceNode Prototype

It doesn't matter whether I set 16000 Hz, 32000 Hz, or any other rate in the WAV: the decoded buffer always ends up at the iOS/macOS context rate (48000 Hz and 44100 Hz in the logs above), and the sound is distorted on iOS and macOS. It seems fine in Firefox on macOS, but not in Firefox on iOS.
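The two console dumps are consistent with decodeAudioData resampling the 16 kHz WAV to the context rate (the Web Audio spec says decoded audio is resampled to the AudioContext's sample rate), so the buffer length scales by contextRate / 16000. Working backwards from the logged lengths is plain arithmetic; the helper names are hypothetical, and the exact rounding of a resampled length can differ between browsers.

```javascript
// Hypothetical helpers: estimate buffer lengths across sample rates.
// A decoded buffer's length scales by toRate / fromRate; browsers may round differently.
function resampledLength(inputLength, fromRate, toRate) {
  return Math.round((inputLength * toRate) / fromRate);
}

function durationSeconds(length, rate) {
  return length / rate;
}

// iOS dump: 82944 samples at 48000 Hz, mapped back to the 16 kHz source rate
const iosSourceLength = resampledLength(82944, 48000, 16000);
// macOS dump: 88905 samples at 44100 Hz, mapped back to 16 kHz
const macSourceLength = resampledLength(88905, 44100, 16000);

console.log(iosSourceLength, durationSeconds(iosSourceLength, 16000)); // 27648 1.728
console.log(macSourceLength, durationSeconds(macSourceLength, 16000)); // 32256 2.016
```

The durations match the duration fields in the dumps (1.728 s and about 2.016 s), which supports the idea that the content is intact and only the rate conversion differs.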

Update:

It seems the problem is in sample-rate conversion. There is a bug (or feature) in Safari on iOS and macOS: it cannot correctly convert audio from its source rate to the destination context's rate when the destination rate and the source rate do not divide evenly. Examples, with source rate (SR) and destination rate (DR):

- SR 16000, DR 16000: OK
- SR 16000, DR 32000: OK
- SR 16000, DR 44100: distorted
- SR 16000, DR 48000: distorted (!), even though 48000 = 3 × 16000

The above is only true for Safari and Firefox on iOS (Firefox on iOS uses the same API as Safari) and for Safari on macOS. There are no sample-rate conversion issues in other browsers, such as Chrome or Firefox on macOS, Chrome or Firefox on Android, or browsers on Linux and other operating systems like Windows.
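Laying the reported cases side by side as ratios shows where the "divides evenly" rule fits and where it does not. This is only arithmetic over the observations in this comment, not a model of WebKit's resampler: 16000 and 32000 Hz give whole ratios (1 and 2), 44100 Hz gives 2.75625, and 48000 Hz gives a whole ratio of 3 yet was still reported as distorted. The names below are illustrative.

```javascript
// Observations reported in this thread for Safari (iOS/macOS), source WAV at 16000 Hz.
const SOURCE_RATE = 16000;
const observations = [
  { contextRate: 16000, distorted: false },
  { contextRate: 32000, distorted: false },
  { contextRate: 44100, distorted: true },
  { contextRate: 48000, distorted: true },
];

// Ratio of destination (AudioContext) rate to source (WAV) rate, and whether it is whole.
function describeRatio(sourceRate, contextRate) {
  const ratio = contextRate / sourceRate;
  return { ratio, isInteger: Number.isInteger(ratio) };
}

const table = observations.map(({ contextRate, distorted }) => ({
  contextRate,
  distorted,
  ...describeRatio(SOURCE_RATE, contextRate),
}));

console.table(table);
```

Only the 1x and 2x ratios were reported as clean, which is consistent with the workaround choosing 32000 Hz.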

To work around the issue, set the sample rate of the AudioContext manually, for example:

async function playAudio(audioData) {
    // Workaround: force a context rate that played back cleanly in testing (16 kHz WAV -> 32 kHz)
    // instead of the device default (44100/48000 Hz); the webkitAudioContext fallback may ignore options
    const audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 32000 })
    try {
        const audioBuffer = await audioContext.decodeAudioData(audioData)
        const source = audioContext.createBufferSource()
        source.buffer = audioBuffer
        console.log(source)
        source.connect(audioContext.destination)
        source.start()
    } catch (error) {
        console.error('Error decoding audio data:', error)
    }
}
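Both versions of playAudio create a new AudioContext for every clip, and browsers may limit how many live contexts a page can hold. A variant that creates one context lazily and skips the WAV encode/decode round trip, by copying the Float32Array from the VAD callback straight into an AudioBuffer, might look like the sketch below. It is untested on iOS: createPlayer and makeContext are hypothetical names, the 32000 Hz default simply mirrors the workaround above, and whether Safari's playback-time resampling of a 16 kHz buffer avoids the distortion is an assumption to verify on a device.

```javascript
// Hypothetical sketch, NOT vad-web API: play Float32Array clips through ONE shared AudioContext.
// makeContext: factory returning an AudioContext-like object, for example
//   (opts) => new (window.AudioContext || window.webkitAudioContext)(opts)
// contextRate: 32000 mirrors the workaround above (assumption); clipRate: VAD output rate.
function createPlayer(makeContext, { contextRate = 32000, clipRate = 16000 } = {}) {
  let ctx = null;

  return async function play(samples) {
    // Create the context on first use and reuse it for every later clip.
    if (ctx === null) ctx = makeContext({ sampleRate: contextRate });
    // Contexts can start suspended until a user gesture; try to resume.
    if (ctx.state === "suspended") await ctx.resume();

    // Mono buffer tagged with the clip's true rate; the source node resamples on playback.
    const buffer = ctx.createBuffer(1, samples.length, clipRate);
    buffer.getChannelData(0).set(samples);

    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start();
    return source;
  };
}
```

In a page this could be wired up as const play = createPlayer((opts) => new (window.AudioContext || window.webkitAudioContext)(opts)) and then play(audio) inside onSpeechEnd, with no encodeWAV call needed.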