introlab / odas

ODAS: Open embeddeD Audition System

Beamforming Vector application #172

Closed lukeseed closed 5 years ago

lukeseed commented 5 years ago

Hi.

I think this issue is pretty simple, but I haven't found a clear definition of how ODAS relates vectors to areas (if it does at all).

I'm trying to use ODAS, with a ReSpeaker 4-mic array, to separate ambient sounds coming from four quadrants above the mic plane. I believe this can be done by using the fixed spatial filter (issue #19) to remove all sounds from below the mic plane, and then using four static targets under the sst module (issues #158 and #13). From these four targets I believe I should end up with four audio channels, one per region.
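For concreteness, here is roughly what I imagine the sst section would look like. I'm guessing at the static-target syntax from #158 and #13, so treat the `add = "static"` setting and the target list below as assumptions on my part, not verified config:

```
sst:
{
    # Mode is either "kalman" or "particle"
    mode = "kalman";

    # Assumption on my part: "static" instead of "dynamic", per #158/#13
    add = "static";

    # One unit vector aimed at the centre of each quadrant,
    # 45 degrees above the mic plane (values illustrative)
    target: (
        { x =  0.5; y =  0.5; z = 0.707 },
        { x = -0.5; y =  0.5; z = 0.707 },
        { x = -0.5; y = -0.5; z = 0.707 },
        { x =  0.5; y = -0.5; z = 0.707 }
    );
};
```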

What I can't figure out is how a "static target" vector would equate to a quarter of a hemisphere. Can anyone please help here?

Thanks, Luke

lukeseed commented 5 years ago

@taospartan any suggestions here based on your experiences with odas?

taospartan commented 5 years ago

It’s been a while... but the static targets you give as vectors are just that: single directions. They don't relate to the quadrants of the hemisphere.

Hope that helps


lukeseed commented 5 years ago

@taospartan thanks.

Any thoughts on this @FrancoisGrondin? I'm having a hard time understanding how to beamform with ODAS. Maybe it doesn't work the way I'm imagining.

FrancoisGrondin commented 5 years ago

Hi Luke,

So far a static source points in a specific direction, not at an entire quadrant. But instead of using a static source, you could let ODAS automatically detect a source in that region and then do beamforming. Is that possible for your application?
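As a rough sketch (not part of ODAS, and assuming the tracked output is sent to a socket as the usual JSON frames with a `src` array of `{id, tag, x, y, z, activity}` entries), the binning into quadrants could then happen on the receiving side:

```python
import json
import socket

# Assumptions: the "tracked" sink in the sst section of the ODAS config is
# a socket (type = "socket") pointing here, and ODAS emits its usual JSON
# frames: {"timeStamp": ..., "src": [{"id", "tag", "x", "y", "z", "activity"}]}.
HOST, PORT = "127.0.0.1", 9000

def quadrant(x, y):
    # Bin a DOA unit vector into one of the four quadrants above the
    # mic plane using the signs of x and y.
    if x >= 0.0:
        return "Q1 (+x,+y)" if y >= 0.0 else "Q4 (+x,-y)"
    return "Q2 (-x,+y)" if y >= 0.0 else "Q3 (-x,-y)"

decoder = json.JSONDecoder()
buf = ""
with socket.create_connection((HOST, PORT)) as sock:
    while chunk := sock.recv(4096):
        buf += chunk.decode("utf-8", errors="ignore")
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            try:
                frame, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # incomplete frame; wait for more bytes
            buf = buf[end:]
            for src in frame.get("src", []):
                if src["id"] != 0 and src["z"] > 0.0:  # active, above plane
                    print(src["id"], quadrant(src["x"], src["y"]))
```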

Cheers,

lukeseed commented 5 years ago

Hi @FrancoisGrondin , Thanks for the response! I'll try to explain what I'm trying to do specifically and then perhaps you could explain how to go about it or if ODAS is the best solution.

We are working on an audio detection system that would 1) reject ambient noise coming from behind a planar microphone array (which we know is all extraneous to us), 2) listen for and detect a specific set of transient, non-voice audio signals, and 3) hopefully tell which quadrant of the hemisphere these signals are coming from.

My thought was that if we set the spatial filter to reject everything behind the microphone array, we could then set four static vectors (one per quadrant) that would each continually output an audio stream, to which we would finally apply our signal detection algorithm.
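These would be the same four vectors as in my config sketch above; deriving them is just a spherical-to-Cartesian conversion, nothing ODAS-specific:

```python
import math

# Unit DOA vectors at the centre of each quadrant above the mic plane:
# azimuths 45/135/225/315 degrees, elevation 45 degrees.
elevation = math.radians(45.0)
for az_deg in (45.0, 135.0, 225.0, 315.0):
    az = math.radians(az_deg)
    x = math.cos(az) * math.cos(elevation)
    y = math.sin(az) * math.cos(elevation)
    z = math.sin(elevation)
    print(f"azimuth {az_deg:5.1f} deg: x={x:+.3f} y={y:+.3f} z={z:+.3f}")
```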

I suppose another solution would be to develop an alternative to webrtcvad that would do our signal detection, and then try to use dynamic targets in ODAS?

Thoughts here? Am I way off the mark? Thanks again for your replies.

Luke

FrancoisGrondin commented 5 years ago

Hi Luke,

So if I understand correctly, all the microphones lie on the same 2D plane? If so, you have a front-back ambiguity: without microphones in the third dimension, the system cannot tell whether a source comes from the front or the back, and thus cannot distinguish your target from noise.
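To see why, here is a quick numerical check (mic coordinates made up): for a far-field source, the inter-microphone delays depend only on the dot products between the mic positions and the DOA, so mirroring the DOA through a plane containing all the mics changes nothing.

```python
import numpy as np

# Far-field delay at mic m relative to the array centre is (p_m . d) / c.
# If every mic position has z = 0, flipping the z component of the DOA d
# leaves all dot products, and hence all delays, unchanged.
c = 343.0  # speed of sound, m/s
mics = np.array([[ 0.03,  0.03, 0.0],
                 [-0.03,  0.03, 0.0],
                 [-0.03, -0.03, 0.0],
                 [ 0.03, -0.03, 0.0]])

front = np.array([0.5, 0.5, 0.707])   # source above the plane
back  = front * np.array([1, 1, -1])  # mirror image below the plane

print(mics @ front / c)  # identical delays...
print(mics @ back / c)   # ...so the array cannot tell them apart
```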

What do you mean by a quadrant of the hemisphere?

So you want to trigger on non-stationary sounds, but exclude speech, correct? Is there a specific family of sounds you would like to target? Have you considered any ML-oriented approach?

As far as I know, webrtcvad will trigger on any non-stationary signal, including speech. The library is great for achieving low latency, but it will not give you the DOA and will not distinguish speech from non-speech (in my previous tests it produced many false alarms on non-speech signals).
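For reference, this is essentially all webrtcvad exposes; a minimal sketch (the capture file name is hypothetical):

```python
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness 0-3

SAMPLE_RATE = 16000
FRAME_MS = 30                                      # must be 10, 20, or 30 ms
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit mono PCM

with open("capture.raw", "rb") as f:  # hypothetical raw PCM capture
    while frame := f.read(FRAME_BYTES):
        if len(frame) < FRAME_BYTES:
            break  # drop trailing partial frame
        # A bare per-frame speech/non-speech decision: no DOA, and it
        # fires on many non-stationary non-speech sounds too.
        print(vad.is_speech(frame, SAMPLE_RATE))
```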

lukeseed commented 5 years ago

Francois,

So yes, I am currently using the ReSpeaker 4-mic array, and I am aware that a 3D array allows rejecting signals that would otherwise be front-back ambiguous (though I was hoping to avoid that more complicated hardware development path).

Yes, machine learning would be a part of our development of this solution. My question is where to cut this into ODAS, i.e., where would I implement the signal detection algorithm in the grand scheme of ODAS's dynamic signal tracking and DOA calculation abilities?

Thanks again. Luke

FrancoisGrondin commented 5 years ago

In this case, ODAS already does DOA calculation and signal tracking, as you know. My feeling would be to perform sound classification on the separated signal after beamforming. Keep in mind, however, that beamforming enhances the signal but does not achieve perfect separation. So when two different signals interfere with each other, the classification system could misclassify a source if there is some leakage from the competing source. Sound localization & classification is a challenging task, and I invite you to read my recent paper, to be presented at the DCASE workshop in October: "Sound Event Localization and Detection Using CRNN on Pairs of Microphones"

I believe it's not public yet so email me (fgrondin@mit.edu) if you want a copy of the paper.
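As for where a classifier would sit concretely: the separated output is just multichannel PCM, one channel per tracked source, so classification happens entirely downstream of ODAS. A sketch with several assumptions: the sample rate, channel count, and raw 16-bit interleaved format must match your sss sink configuration, the file name is hypothetical, and `classify_clip` stands in for whatever model you train.

```python
import numpy as np

FS = 16000        # must match fS of the separated sink in the config
N_CHANNELS = 4    # one channel per tracked source (assumption)
CLIP_SECONDS = 1.0

def classify_clip(mono: np.ndarray) -> str:
    # Placeholder for your trained classifier (hypothetical).
    return "target" if np.abs(mono).mean() > 500 else "background"

# Separated sink written as 16-bit interleaved raw PCM (assumption;
# matches the raw-format file sinks in the demo configs).
pcm = np.fromfile("separated.raw", dtype=np.int16)
pcm = pcm[: len(pcm) // N_CHANNELS * N_CHANNELS].reshape(-1, N_CHANNELS)

samples_per_clip = int(FS * CLIP_SECONDS)
for start in range(0, pcm.shape[0] - samples_per_clip, samples_per_clip):
    clip = pcm[start : start + samples_per_clip]
    for ch in range(N_CHANNELS):
        # Leakage caveat from above: a loud competing source can bleed
        # into this channel and confuse the classifier.
        print(f"t={start/FS:5.1f}s ch{ch}: {classify_clip(clip[:, ch])}")
```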

Cheers!

lukeseed commented 5 years ago

I will email you, Francois. Thank you for the help.

If you wouldn't mind pointing me in the direction of how to establish generic signal recognition, I would appreciate it. All the examples I have seen are based on human voice detection (mostly around VAD / webrtcvad).

Thanks

Luke