respeaker / seeed-voicecard

2 Mic Hat, 4 Mic Array, 6-Mic Circular Array Kit, and 4-Mic Linear Array Kit for Raspberry Pi
GNU General Public License v3.0

Respeaker micarray v2.0 not working properly #76

Closed phantom-j closed 6 years ago

phantom-j commented 6 years ago

Hi, I am using the ReSpeaker Mic Array v2.0 with a Raspberry Pi (Raspbian Stretch) and AVS, with the single-channel firmware. With Alexa, it gives the same performance as the ReSpeaker 2-Mic Pi HAT. I tried changing the parameters of the built-in algorithms, but I don't know which parameters to change to improve performance.

What do I have to do to improve the performance of the Mic Array v2.0?

KillingJacky commented 6 years ago

Hi @jay7583 Would you mind describing your issue in more detail? We can't tell what your issue is. Please point out what software library you're using, what demo you're running, and what your hardware setup is (in particular, please give the exact product SKU or name in your issue statement).

phantom-j commented 6 years ago

Hi @KillingJacky I am using Raspbian Stretch on a Raspberry Pi 3 Model B, with the C++ version of the AVS SDK installed. When I talk to Alexa, the Sensory wake word engine does not work properly: out of 10 times shouting "Alexa", it responds only 2 times. I have tried tuning the AGCGAIN and AGCMAXGAIN parameters, but the performance stayed almost the same.

SKU: ReSpeaker Mic Array v2.0 XMOS-XVF 3000

phantom-j commented 6 years ago

Hi @KillingJacky I ran the out-of-box demo described at http://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/. To compare the waveforms in Audacity, I played the very same song in the background ("carpenter sha la la") and got the result shown in the image. Channel 5 is completely silent, and because of that I believe noise cancellation is not happening, which in turn causes the AGC to amplify the noise as well.

I tried turning on STATNOISEONOFF, STATNOISEONOFF_SR, NONSTATNOISEONOFF and NONSTATNOISEONOFF_SR manually, but ended up with the same results.

Also, to make sure my previous tuning settings were not affecting my current results, I re-uploaded the 6-channel firmware as shown in the wiki tutorial.

[Image: screen shot 2018-05-26 at 3 06 25 pm]

xiongyihui commented 6 years ago

@jay7583 It seems you didn't use the USB 4-mic array as the output device, so channel 5, which carries the playback data, is silent. If you want to use the AEC of the USB 4-mic array, you need to use it as the output device.

phantom-j commented 6 years ago

Hi @xiongyihui I set the USB 4-mic array as the output device in Audacity and got this result: no change in channel 5. Where do I have to change the default settings to select that channel?

[Image: screenshot from 2018-05-29 09-57-06]

phantom-j commented 6 years ago

Hi @xiongyihui I changed the default device and subdevice settings in /usr/share/alsa/alsa.conf and checked again in Audacity, but it gives the same result.

[Image: screenshot from 2018-05-29 10-16-01]

xiongyihui commented 6 years ago

@jay7583 How do you play the song? In your screenshot, you selected usb 4 mic array as the output device, but you were recording, not playing any sound. Of course, playback was silent.

You can use aplay -v -D plughw:1 audio.wav to play an audio file and use Audacity to record at the same time (assuming the sound card number is 1).

phantom-j commented 6 years ago

Hi @xiongyihui Now we understand what you said. We had believed channel 5 would show environmental noise, but we now know it carries only the playback signal. Is there any way to reduce environmental noise to improve the accuracy of the Sensory wake word engine? We are using this mic array for an AVS application, and it does not recognize the wake word reliably (it works 2 out of 10 times).

Also, did you get a chance to test this mic with the AVS SDK? If yes, what configuration did you use?

Also, for playback through AVS, could you tell us how to change the default output device inside AVS?

Thanks

xiongyihui commented 6 years ago

I haven't tried Sensory's wake word engine, but I have tried Snowboy as the keyword detector. I used https://github.com/voice-engine/voice-engine/blob/master/examples/raw_vs_processed.py to compare raw audio and processed audio. The processed audio is much better at a distance of 3 meters.

phantom-j commented 6 years ago

Hi @xiongyihui I have tried Snowboy from both the above link and http://docs.kitt.ai/snowboy/. It worked only at less than 1 meter; beyond 1 meter it worked 1 out of 10 times.

Did you get a chance to test this mic with the AVS SDK? Could you provide results for that?

phantom-j commented 6 years ago

Hi @xiongyihui Is there any update on the above issue?

ash-lat commented 6 years ago

Hi everyone,

I am facing the same issue too. I tried to debug it by changing the tuning parameters, but no success yet.

Only hot word detection is not working properly. Otherwise, after triggering the hot word manually, AVS works even from a distance of 15 ft. (I think the Amazon guys have some solid sound processing in the cloud.)

xiongyihui commented 6 years ago

I'm afraid hot word detection depends heavily on the hot word model (its training data). Recording 3 samples of the hot word with the mic array and training a custom Snowboy hot word model may improve the detection rate.

ash-lat commented 6 years ago

@xiongyihui I agree. I tried training a custom model with Snowboy earlier, but it produces a lot of false detections.

I also have the 2-mic array, and hot word detection works better with it than with the new 4-mic v2.0. The problem there is range, as it works well only up to about 1.5-2 m.

(PS: I increased the "capture" volume in alsamixer for the 2-mic array to "82" for it to work; otherwise the detection rate is low at the default value. But it has a lot of noise in it.)

Could you help me find the parameters that would suppress the noise and increase the gain on the 4-mic v2.0?

I have attached WAV files from both the 2-mic array and the 4-mic v2.0 array for comparison: Recording Samples.zip

Thanks

xiongyihui commented 6 years ago

There is a python script tuning.py to adjust the AGC of the mic array.

By default, Automatic Gain Control (AGC) is on. We can get the instantaneous gain using tuning.py:

$ python tuning.py AGCGAIN 
AGCGAIN: 1.17733770423

To increase the gain, we can turn AGC off and set a fixed gain, for example:

$ python tuning.py AGCONOFF 0
AGCONOFF: 0
$ python tuning.py AGCGAIN 10
AGCGAIN: 10.0

To get the full list of tunable parameters, run:

python tuning.py -p

We may need new firmware with a higher gain for the raw audio. The gain is too low in the current firmware.
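To relate a gain factor such as the AGCGAIN values above to decibels, the conversion can be sketched as follows. This is a minimal sketch that assumes AGCGAIN is a linear amplitude factor (so 10 would correspond to 20 dB); that assumption is not stated in this thread, and the helper names are hypothetical.

```python
import math


def factor_to_db(factor: float) -> float:
    """Convert a linear amplitude gain factor to decibels."""
    if factor <= 0:
        raise ValueError("gain factor must be positive")
    return 20.0 * math.log10(factor)


def db_to_factor(db: float) -> float:
    """Convert a gain in decibels back to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


# Assumption: the values passed to tuning.py are amplitude factors.
for factor in (1.0, 10.0):
    print(f"factor {factor:g} -> {factor_to_db(factor):.1f} dB")
```

Under that assumption, each tenfold increase in the factor adds 20 dB.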

xiongyihui commented 6 years ago

@jay7583 I don't have much experience with the avs-device-sdk. We have a Python AVS SDK, https://github.com/respeaker/avs, which I use for testing.

ash-lat commented 6 years ago

@xiongyihui Thanks for the quick response, much appreciated

I tried

$ python tuning.py AGCONOFF 0
AGCONOFF: 0
$ python tuning.py AGCGAIN 10
AGCGAIN: 10.0

and it worked better with AGCGAIN around 25-27. I am getting a hit rate of around 6/10, but only with low ambient noise. If I turn on my television or play a song, I am back to 2/10. It looks like the noise is getting amplified (I checked the spectrogram in Audacity).

Also, I tried the Python AVS SDK, https://github.com/respeaker/avs. PocketSphinx has a better hit rate (6/10), but Snowboy is still the same.

Is there any way to increase the gain and keep the noise suppressed?

Thanks

xiongyihui commented 6 years ago

I'm afraid the USB mic array can't distinguish between a voice from the TV and a person speaking. To cancel a song that is playing, the USB mic array must be used as the output device so that the AEC algorithm can work.

ash-lat commented 6 years ago

@xiongyihui

I see. Are there any other ways to increase its performance? Could you release a firmware update that corrects the gain with AGC on? Maybe that would work better for hot word detection.

Thanks

xiongyihui commented 6 years ago

There is new firmware with a higher gain at https://github.com/respeaker/usb_4_mic_array. You can give it a try.

If the AGC is on, you can increase AGCDESIREDLEVEL to increase the gain.

python tuning.py AGCDESIREDLEVEL 0.01

ash-lat commented 6 years ago

@xiongyihui Thanks for the new firmware. The 6_channels_firmware_12.06dB.bin works better than the default and the 6 dB one.

I also tried changing AGCDESIREDLEVEL; it works better at the 1e-08 level:

sudo python tuning.py AGCDESIREDLEVEL 0.00000001

I get a 7/10 hit rate with low ambient noise, but with higher ambient noise I have to shout louder than the ambient noise for it to work, which is OK I guess (but could be better). As of now it works only up to a distance of 2-2.5 m.

I tried over 40 values, starting from 0.1 and finally reaching 1e-08, and the performance keeps improving as I get closer to 1e-08.

Currently, if I set a value lower than 1e-08, it reverts to the default value of 0.004998185148.

Could you help decrease AGCDESIREDLEVEL further? I believe it would work better at a lower value, maybe around 1e-12.

Also, how do I save this level? It resets to the default on restart.

Thanks

ash-lat commented 6 years ago

Also, could you share firmware with a gain higher than 12.06 dB?

Thanks

xiongyihui commented 6 years ago

Sorry, saving settings is not supported. If you set AGCDESIREDLEVEL to 1e-8, why not turn off the AGC and set a fixed gain? The volume is too low.

Further increasing the gain doesn't increase the SNR. Why do you want to increase the gain of the raw audio but decrease the gain afterward? What's your application?
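To see how low an AGCDESIREDLEVEL of 1e-8 is, the values can be put on a decibel scale. This is a minimal sketch that assumes AGCDESIREDLEVEL is a linear power level relative to full scale (1.0); the thread does not state the unit, and the function name is hypothetical.

```python
import math


def power_to_db(level: float) -> float:
    """Convert a linear power level (1.0 = full scale) to decibels."""
    if level <= 0:
        raise ValueError("level must be positive")
    return 10.0 * math.log10(level)


# Values quoted in this thread: the default, 0.01, and 1e-8.
for level in (0.004998185148, 0.01, 1e-8):
    print(f"{level:g} -> {power_to_db(level):.1f} dB")
```

Under that assumption, the default works out to about -23 dB and 1e-8 to -80 dB, which would be consistent with the output volume being very low at that setting.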

ash-lat commented 6 years ago

@xiongyihui

So far, by trial and error, using the 12.06 dB firmware and setting AGCDESIREDLEVEL to 1e-8 gives the best results in detecting the wake word. (Even I can't explain why increasing the gain of the raw audio and decreasing the gain afterwards works for both Snowboy and Sensory.)

I also observed that with a little ambient noise the performance is better than in a silent room (maybe because of the right amount of AGC).

Could you please help by elaborating on what each parameter in python tuning.py -p does? Currently I am either just following the instructions here or using trial and error to tune the settings. (After that I should be able to provide you with the ideal settings, which you may share with all users with similar applications.)

My application is to use AVS in a smart speaker, where Sensory needs to detect the hot word and AVS needs to receive the right audio for processing.

No worries, I will take care of saving the settings.
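Since the device does not store settings, one workaround is to re-apply them at every boot. The following is only a sketch of that idea: it shells out to tuning.py with the same command form used in this thread, and the script path, the SETTINGS dictionary, and both function names are assumptions.

```python
import subprocess

# Hypothetical settings to restore; the value is the one quoted in this thread.
SETTINGS = {"AGCDESIREDLEVEL": "0.00000001"}


def build_commands(settings, script="tuning.py"):
    """Return one argv list per setting: python tuning.py NAME VALUE."""
    return [["python", script, name, str(value)]
            for name, value in settings.items()]


def apply_settings(settings, script="tuning.py"):
    """Run each command in turn; raises CalledProcessError on failure."""
    for argv in build_commands(settings, script):
        subprocess.run(argv, check=True)
```

Calling apply_settings(SETTINGS) from a startup script would restore the level after a restart. Depending on USB permissions, the script may need to run as root, as with the sudo call above.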

xiongyihui commented 6 years ago

Did you use the first channel of the audio data to detect the keyword? As there are 6 channels, it would be confusing if the 6 channels were mixed into 1 channel.
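One way to feed only the first channel to a keyword detector is to de-interleave the capture before detection. This is a minimal sketch, assuming the capture is interleaved signed 16-bit PCM with 6 channels; the function name is hypothetical and not part of any library mentioned here.

```python
import array


def extract_channel(frames: bytes, channels: int = 6, index: int = 0) -> bytes:
    """Return a single channel from interleaved signed 16-bit PCM data."""
    if not 0 <= index < channels:
        raise ValueError("channel index out of range")
    samples = array.array("h")
    samples.frombytes(frames)  # raises ValueError on a partial sample
    if len(samples) % channels != 0:
        raise ValueError("data does not contain whole frames")
    # Every `channels`-th sample, starting at `index`, belongs to one channel.
    # Samples are only selected, never modified, so byte order is irrelevant.
    return samples[index::channels].tobytes()
```

For example, extract_channel(chunk) keeps channel 0 of each captured chunk, and extract_channel(chunk, index=1) keeps channel 1.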

ash-lat commented 6 years ago

@xiongyihui Thanks for pointing that out.

I checked the channels for Snowboy and found that it detects on channel 1 (raw audio) more often than on channel 0, and it detects only on channel 1 when the distance is increased (above 1.5 m). I did this test with AGCDESIREDLEVEL = 0.004998185148 (default) and had almost the same results at AGCDESIREDLEVEL = 0.01 and 0.1 (slightly worse, actually).

detected @ 1
detected @ 1
detected @ 0
detected @ 1
detected @ 0
detected @ 1
detected @ 1
detected @ 0
detected @ 1
detected @ 1
detected @ 1
detected @ 1
detected @ 1
detected @ 1
detected @ 1
detected @ 1

Used - https://github.com/voice-engine/voice-engine/blob/master/examples/raw_vs_processed.py

It stops detecting on channel 0 completely when I set AGCDESIREDLEVEL = 1e-08 (which is self-explanatory).

But even in the above test I get a hit rate of only 6/10. I believe increasing the gain of the raw data improved the performance a little, as with the default (old) firmware I am back to a hit rate of 2/10.

Is there any way we can increase the gain (both processed and raw) while keeping the noise low (that is, the SNR high)?

xiongyihui commented 6 years ago

It seems Snowboy has an energy threshold, and keywords below the threshold are ignored. Increasing the gain (without distortion) will improve the detection rate, but it also increases the risk of distortion. That's the reason for using AGC. But the built-in AGC is not perfect.
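To check whether a recording is simply too quiet, its level can be measured before it reaches the detector. This is a sketch only: it computes the RMS level of native-endian signed 16-bit mono PCM in dB relative to full scale. The function name is hypothetical, and whatever threshold Snowboy applies is not documented in this thread.

```python
import array
import math


def rms_dbfs(frames: bytes) -> float:
    """RMS level of signed 16-bit PCM in dB relative to full scale."""
    samples = array.array("h")
    samples.frombytes(frames)
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    # Full scale for 16-bit audio is taken as 32768.
    return 10.0 * math.log10(mean_square / (32768.0 ** 2))
```

Comparing this figure for recordings made at different gain settings gives a rough idea of how much louder the keyword actually becomes.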

ash-lat commented 6 years ago

Yes, it looks like that: low ambient noise might be raising the AGC gain and hence giving a better detection rate. Any plans to improve the AGC?

Also, is there any workaround you could suggest for now?

xiongyihui commented 6 years ago

Maybe disable the AGC and control the gain on the computer side.
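Controlling the gain on the computer side could look roughly like the sketch below, applied after turning the AGC off with python tuning.py AGCONOFF 0. It assumes native-endian signed 16-bit mono PCM and a fixed gain chosen by the user; the function name is hypothetical.

```python
import array

INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain(frames: bytes, gain: float) -> bytes:
    """Scale signed 16-bit PCM samples by `gain`, clipping to the int16 range."""
    samples = array.array("h")
    samples.frombytes(frames)
    scaled = array.array(
        "h",
        # Clamp instead of wrapping so overloads clip rather than fold over.
        (max(INT16_MIN, min(INT16_MAX, int(round(s * gain)))) for s in samples),
    )
    return scaled.tobytes()
```

Hard clipping still distorts loud input, so the gain has to be chosen with some headroom. This does not replace a proper AGC.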