improve audio recording quality for speech recognition

andrewpbstout commented 4 years ago

Following up on #12, but different enough that I thought a new issue was called for:

I'm trying to improve the quality of the audio recorded by the microphones. I got a wakeword working by passing a gain parameter to the wakeword library. Now I'm trying to stream audio to DialogFlow for intent recognition, but the wakeword's gain parameter doesn't apply to that stream, and I think the signal-to-noise ratio is too low for DialogFlow's ASR.
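
One way to get a similar boost on the DialogFlow stream is to apply gain yourself to each captured chunk before sending it. Below is a minimal sketch, assuming 16-bit PCM chunks in native byte order, which is what pyaudio returns with `paInt16`. `apply_gain` is a hypothetical helper and is not part of the wakeword library or DialogFlow's API. Note that it scales the noise by the same factor as the speech, so it raises the level but not the SNR.

```python
import array


def apply_gain(chunk: bytes, gain: float) -> bytes:
    """Scale 16-bit signed PCM samples (native byte order) by `gain`.

    Results are clipped to the int16 range so that loud samples saturate
    instead of wrapping around. Noise is amplified exactly as much as speech.
    """
    samples = array.array("h")
    samples.frombytes(chunk)
    scaled = array.array(
        "h",
        # round() returns an int; clamp it to [-32768, 32767] before storing.
        (max(-32768, min(32767, round(s * gain))) for s in samples),
    )
    return scaled.tobytes()
```

For example, you could call `apply_gain(stream.read(1024), 4.0)` on a pyaudio input stream. The chunk size and the gain of 4.0 are arbitrary here.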

When I use arecord as suggested in #12, I get a recording with a high level of noise (I suspect fan noise inside the robot's head; the microphones don't appear to be sealed), and my voice is faint but audible. (I was sitting about an arm's length away and speaking loudly.) I've tried tweaking some of the ReSpeaker parameters using the tuning script (from the link in #12), with little discernible effect. (I haven't adjusted AGCGAIN, because it seems to be a calculated parameter; it doesn't seem to have a stable value I could reset it to. I've been trying AGCDESIREDLEVEL and MIN_NS instead.)

Has anyone here gotten good audio quality out of the microphones for speech recognition of more than just a couple of keywords? I could use some help.

qtrobot commented 4 years ago

Hi @andrewpbstout, it is strange that your ASR doesn't work because of the microphone noise level. We have been using Snips for a while, and it worked perfectly.

Regarding the audio gain level: I am not an expert in ASR, but I don't think signal-processing algorithms care about the gain. A signal level between 0 and 1.0 is the same as one between 0 and 100. The gain only affects audibility.
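
The point that gain doesn't matter can be checked numerically. If speech and noise are multiplied by the same factor, their ratio stays the same, and so does the SNR in dB. Below is a minimal sketch that uses toy synthetic data. The tone, the noise pattern, and the gain of 100 are illustrative assumptions. The sketch models an ideal scale factor, so it ignores clipping and the quantization of very quiet signals.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 20 * log10(rms(signal) / rms(noise))."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


# Toy data: a 440 Hz "speech" tone at 16 kHz and a small deterministic "noise" pattern.
speech = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noise = [0.05 * ((n * 7919) % 13 - 6) / 6 for n in range(1600)]

gain = 100.0
before = snr_db(speech, noise)
after = snr_db([gain * s for s in speech], [gain * x for x in noise])
print(before, after)  # the two values agree up to floating-point rounding
```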

Regarding the SNR (signal-to-noise ratio): this does indeed matter a lot. The ReSpeaker mic array is one of the decent technologies on the market, and its background noise reduction works quite well. We have tested it with different background noises and music, and it filtered them out very well.

The ReSpeaker mic array has 5 channels of data: one channel per microphone for accessing the raw data, and one channel that mixes them with noise reduction applied, intended especially for ASR.
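
Getting the processed channel out of a multi-channel capture means de-interleaving the frames. Below is a minimal sketch, assuming 16-bit samples interleaved one per channel, which is how pyaudio returns them with `channels=N`. `extract_channel` is a hypothetical helper. Which index holds the noise-reduced ASR channel depends on the ReSpeaker firmware, so check the channel count and index against Seeed's documentation rather than relying on this sketch.

```python
import array


def extract_channel(frames: bytes, channels: int, index: int) -> bytes:
    """Return one channel from interleaved 16-bit PCM frames.

    Interleaved layout: ch0, ch1, ..., ch(N-1), ch0, ch1, ...
    """
    if not 0 <= index < channels:
        raise ValueError("channel index out of range")
    samples = array.array("h")
    samples.frombytes(frames)
    if len(samples) % channels:
        raise ValueError("buffer does not contain whole frames")
    # Take every `channels`-th sample, starting at `index`.
    return samples[index::channels].tobytes()
```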

I am just wondering: when using arecord, do you specify the proper channel?

In our demos with Snips ASR, we specified the mic like this:

mike = "ReSpeaker 4 Mic Array (UAC1.0): USB Audio (hw:1,0)"

andrewpbstout commented 4 years ago

My microphone capture for DialogFlow was a literal copy-and-paste from what I did for snowboy (which works)...but it turns out there was one place I forgot to pass along the audio device index. [hangs head in shame]

After recording the data I was streaming out to Google and discovering that it was silent, it didn't take that long to find why and fix it. Just got it working.
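
A cheap guard against streaming silence is to log the RMS level of each chunk before sending it. Below is a minimal sketch, assuming 16-bit PCM chunks in native byte order. `chunk_rms` and `is_silent` are hypothetical helpers. The threshold of 50 is an arbitrary placeholder that would need tuning against a recording of the room's noise floor.

```python
import array
import math


def chunk_rms(chunk: bytes) -> float:
    """RMS level of a buffer of 16-bit PCM samples (native byte order)."""
    samples = array.array("h")
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(chunk: bytes, threshold: float = 50.0) -> bool:
    """True if the chunk's RMS level is below `threshold` (placeholder default)."""
    return chunk_rms(chunk) < threshold
```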

(And for the record, yes, I do have to specify the channel/device, both in arecord and in pyaudio. R.I.P. Snips--that's what I was planning to use, since they provided the whole pipeline and y'all had already demoed it...they announced their shutdown just as I was starting. :-( )

(Also, I found that turning up gain did seem to help a little with the hotword detector, specifically for improving the distance from the robot at which the hotword got detected. Might be superstition, though.)

Thanks for the reply, sorry for asking what turned out to be a dumb question!

luxai-qtrobot/QA, issue #28