introlab / odas

ODAS: Open embeddeD Audition System
MIT License

SSS, sources not very distinct in separated.raw. #236

Open ShawnPinchbeck opened 3 years ago

ShawnPinchbeck commented 3 years ago

I am testing ODAS for possible use in a computer interactive art piece to isolate participants and performers giving voice commands from the ambient noise/music of the active space.

In my testing, I have the ReSpeaker 4 mic array running on a Raspberry Pi 4, with two speakers 2 m away from the mic array on opposite sides. I have Odas_web running on my Mac. I play voice from one speaker and ambient sounds or music from the other speaker at equal volumes. The two sources are tracked well, and the tracked angles stay consistent with the speaker locations when I move the speakers around.

The issue is that when I listen to the separated.raw file, I hear very little separation of the two sources in the audio tracks, perhaps 2 dB of difference. It is not enough isolation to allow voice recognition to work. I have tried adjusting settings and parameters as mentioned here, but have not improved the isolation. How much separation is possible? Is my test flawed, and are the sounds I am playing or the way I have the speakers set up not optimal for ODAS? Is the ReSpeaker performing too poorly to function the way I imagine it should, and would a better mic array perform better? Should I be changing the angle of directivity? I'm not sure what the issue is. The SST and SSL are working perfectly.
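For anyone who wants to put a number on the "2 dB" impression rather than judge it by ear, here is a minimal sketch that computes the RMS level of each channel in a raw file. It assumes separated.raw is interleaved 16-bit signed little-endian PCM with 4 channels; the real format depends on the settings in your cfg file, and the function name `channel_rms_db` is made up for this example.

```python
import array
import math
import sys


def channel_rms_db(raw_bytes, n_channels=4):
    """Return the RMS level (dBFS) of each channel of interleaved 16-bit PCM.

    Assumption: samples are signed 16-bit little-endian and interleaved
    channel by channel. Adjust n_channels to match your configuration.
    """
    frame_bytes = 2 * n_channels
    usable = len(raw_bytes) - len(raw_bytes) % frame_bytes  # drop a partial frame
    samples = array.array("h")
    samples.frombytes(raw_bytes[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # the file is assumed to be little-endian

    levels = []
    for ch in range(n_channels):
        chan = samples[ch::n_channels]  # de-interleave one channel
        mean_sq = sum(s * s for s in chan) / len(chan) if len(chan) else 0.0
        if mean_sq > 0:
            levels.append(10.0 * math.log10(mean_sq / 32768.0 ** 2))
        else:
            levels.append(float("-inf"))  # silent or empty channel
    return levels


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for ch, level in enumerate(channel_rms_db(f.read())):
            print(f"channel {ch}: {level:.1f} dBFS")
```

Running this on a stretch of the recording where only one speaker is playing, and comparing the level in that source's channel against the other tracked channel, gives a rough leakage figure in dB. It is only a level comparison, not a proper separation metric.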

Any suggestions or insights would be appreciated. Odas should do exactly what we need, if I can figure out how to improve the dB of separation.

Thanks! Shawn

Quang-Kien commented 3 years ago

Have you tried postfiltered.raw? I found better separation in that file than in separated.raw. By the way, can we deliberately set the number of sources, and if so, how is it defined?

ShawnPinchbeck commented 3 years ago

Thanks for the reply! The postfiltered.raw has too many artifacts and isn't useful for voice recognition.

I'm not sure if you can change the number of sources. You can change the sensitivity of tracking and how long a source stays tracked while it is not making sound, and you can adjust the angle of sensitivity to block out erroneous noise from directions you don't want to detect.

I'm wondering if the separation issue is related to the ReSpeaker's quality and number of microphones? I'm not sure how much of a factor this is. I don't have another mic to test this.

FrancoisGrondin commented 3 years ago

Hi there,

This mainly depends on the room acoustics. In some cases the GSS module can do a decent job, but sometimes it gets more difficult. You are right: the post-filtered version should not be used with an ASR system, as it introduces some distortion.

We are currently working on PyODAS, which will be in Python, and will include some more recent DL-based methods to boost separation results. Stay tuned :)

ShawnPinchbeck commented 3 years ago

Hi Francois, I was wondering if room acoustics were a factor in the SSS. I'm testing in a pretty small area. I'll have to try it out in our studio to see what the difference is.

I'm definitely tuned in! :-) What is your timeline for PyODAS release?

Cheers!

FrancoisGrondin commented 3 years ago

Hopefully by the end of fall 2021 :)

Quang-Kien commented 3 years ago

Please share your test results. Indeed, the postfiltered output is not recognized well, even though playback shows clear separation. On the other hand, the separated version exhibits strong mixing and is not usable for speech recognition either.

Quang-Kien commented 3 years ago

Great to hear. Will it be a Python wrapper?

By the way, I found that the output raw file always contains 4 channels. In our case we just want to separate two sources. Is there any way to set the number of sources, at least for the output?
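As a workaround on the output side, the channels of interest can be pulled out of the 4-channel file after the fact. The sketch below does not change how many channels ODAS writes; it only post-processes the file. It assumes interleaved 16-bit little-endian PCM, and the sample rate is a placeholder that has to match whatever your cfg file sets for the separated output. The function name `split_raw_to_wav` and its parameters are made up for this example.

```python
import wave


def split_raw_to_wav(raw_path, out_prefix, n_channels=4,
                     sample_rate=16000, keep=(0, 1)):
    """Write selected channels of an interleaved 16-bit raw file as mono WAVs.

    Assumptions: signed 16-bit little-endian samples, n_channels interleaved
    channels, and sample_rate equal to the rate configured for the output.
    Returns the list of WAV paths that were written.
    """
    with open(raw_path, "rb") as f:
        data = f.read()

    frame_bytes = 2 * n_channels
    data = data[: len(data) - len(data) % frame_bytes]  # drop a partial frame
    n_frames = len(data) // frame_bytes

    paths = []
    for ch in keep:
        mono = bytearray(2 * n_frames)
        # Copy the low and high byte of every sample belonging to this channel.
        mono[0::2] = data[2 * ch::frame_bytes]
        mono[1::2] = data[2 * ch + 1::frame_bytes]

        path = f"{out_prefix}_ch{ch}.wav"
        with wave.open(path, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)           # 16-bit samples
            w.setframerate(sample_rate)
            w.writeframes(bytes(mono))
        paths.append(path)
    return paths
```

For example, `split_raw_to_wav("separated.raw", "separated", keep=(0, 1))` would write `separated_ch0.wav` and `separated_ch1.wav`. Which channel index holds which tracked source is not something this sketch can tell you; that has to be checked by listening or against the tracking output.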

And please share a more detailed explanation of the parameters in the cfg file with us, if you have one. Your help is appreciated.

BRs

fanman2014 commented 2 years ago

I'm still unclear: what does the SSS data do? I know it's related to beamforming, but does it provide the unit vector location of the sound source in 3D space?

StuartIanNaylor commented 2 years ago

PyODAS would be amazing. Have you got any updates, Francois?

atyshka commented 1 year ago

@FrancoisGrondin is PyODAS the same project as SpeechBrain? I discovered your name under that project and noticed it overlaps with some of the work on ODAS.