mreinstein closed this issue 7 years ago
I posted my own question on the DSP stack exchange to help get a better handle on this. It includes a nice graph of the filter array :)
http://dsp.stackexchange.com/questions/37450/please-help-me-understand-this-audio-downsampling-code
Nice! So you're planning to support downsampling to 8kHz too? Is this to minimize bandwidth sent to Watson? I'm wondering if going below 16kHz actually affects recognition quality?
Well, the service supports 8kHz "narrowband" models and 16kHz "broadband" models. Right now, when you select a narrowband model, the audio is downsampled to 16kHz here, and then I think it gets downsampled again to 8kHz at the service. So, doing it once here should both save bandwidth and increase quality.
Ah, right, I forgot about the narrowband model. So really this is about bandwidth savings when opting into the non-broadband model. Thanks for the clarification.
@nfriedly very nice!
Question about this line:
Input audio is typically 48kHz, this downsamples it to 16kHz.
I've seen several browsers provide different sample rates. For example, Chrome right now produces 44.1kHz samples. I'm wondering if this filtering logic is going to slightly distort that input?
Oh, let me adjust that. It will reduce the cutoff point of the low-pass filter slightly, but not enough to affect human speech.
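To illustrate why the input rate matters, here's a minimal sketch of rate-aware decimation that handles non-integer ratios like 44100/16000 ≈ 2.756. This is not the SDK's implementation (the function name and the use of linear interpolation instead of a FIR filter are my own simplifications); it just shows why the actual `AudioContext` sample rate has to be read rather than assumed to be 48kHz:

```javascript
// Sketch: downsample Float32 PCM from an arbitrary input rate
// (e.g. 48000 or 44100 Hz) to a fixed target rate.
// Uses linear interpolation between neighboring samples so that
// non-integer ratios (44100 -> 16000) don't drift or distort timing.
function downsample(input, inputRate, targetRate) {
  if (inputRate === targetRate) return input;
  const ratio = inputRate / targetRate; // e.g. 44100/16000 ≈ 2.756 (non-integer)
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;           // fractional position in the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Linearly interpolate between the two nearest input samples.
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

With a 48kHz input the ratio is exactly 3, so this reduces to picking every third sample; with 44.1kHz it interpolates between samples instead.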
There is some filtering logic happening here: https://github.com/watson-developer-cloud/speech-javascript-sdk/blob/master/speech-to-text/webaudio-l16-stream.js#L84
@nfriedly mentioned this had something to do with antialiasing during downsampling, but neither of us could come up with clear wording on why that code is present or what exactly it's doing.
This is one of the few areas in the code that is very hard to understand by simply reading it; the rest of the module is pretty self-documenting and easy to follow. :)
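For what it's worth, the general shape of that kind of code is a FIR low-pass (antialiasing) filter applied while decimating: each output sample is a weighted average of nearby input samples, which attenuates frequencies above the new Nyquist limit (8kHz for a 16kHz output) before they can alias into the result. A hedged sketch of the idea follows; the filter taps here are a simple normalized triangular window for illustration, not the SDK's actual coefficients:

```javascript
// Sketch of FIR-filtered decimation: low-pass the signal while downsampling
// so frequencies above targetRate / 2 don't alias into the output.
// The taps are a normalized 7-tap triangular window (illustrative only).
const FILTER = [1, 2, 3, 4, 3, 2, 1].map(
  (v, _, arr) => v / arr.reduce((a, b) => a + b, 0)
);

function filteredDownsample(input, inputRate, targetRate) {
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  const half = Math.floor(FILTER.length / 2);
  for (let i = 0; i < outLength; i++) {
    const center = Math.round(i * ratio);
    let sum = 0;
    // Weighted average of the samples around the decimation point;
    // indices are clamped at the buffer edges.
    for (let t = 0; t < FILTER.length; t++) {
      const idx = Math.min(Math.max(center + t - half, 0), input.length - 1);
      sum += FILTER[t] * input[idx];
    }
    output[i] = sum;
  }
  return output;
}
```

Because the taps sum to 1, a constant (DC) signal passes through unchanged, while high-frequency content near the input's Nyquist limit is smoothed away; that smoothing is the "antialiasing" the filter array is there for.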