introlab / odas

ODAS: Open embeddeD Audition System
MIT License

Sound Source Localization support for "T" microphone array #27

Open chengl2 opened 6 years ago

chengl2 commented 6 years ago

Hello, I took one picture to show my microphone array: the_construction

I want to use 'Sound Source Localization' to estimate elevation and azimuth in my project.

We have 8 microphones. You can think of them as two linear microphone arrays with 4 microphones each.

From microphones 1, 2, 3 and 4 I get the azimuth (relative to the xoy plane).

From microphones 5, 6, 7 and 8 I get the elevation (relative to the xoz plane).

With the azimuth and elevation, I can control the camera to rotate toward the sound location.

Can you tell me how to use ODAS to make SSL work?

Thank you.

FrancoisGrondin commented 6 years ago

Hi,

Thank you for the info. In this case, assuming omnidirectional microphones, I would do something like this. The origin could be between microphones 6 and 7 (labeled "centre" on your photo). All microphone xyz-coordinates would then be referenced to that point.

mics = (

        # Microphone 1
        { 
            mu = ( <mic1-x>, <mic1-y>, <mic1-z> ); 
            sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
            direction = ( +0.000, +0.000, +1.000 );
            angle = ( 180.0, 180.0 );
        },

        # Microphone 2
        { 
            mu = ( <mic2-x>, <mic2-y>, <mic2-z> ); 
            sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
            direction = ( +0.000, +0.000, +1.000 );
            angle = ( 180.0, 180.0 );
        },        

        # Microphone 3
        { 
            mu = ( <mic3-x>, <mic3-y>, <mic3-z> ); 
            sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
            direction = ( +0.000, +0.000, +1.000 );
            angle = ( 180.0, 180.0 );
        },

        # Microphone 4
        { 
            mu = ( <mic4-x>, <mic4-y>, <mic4-z> ); 
            sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
            direction = ( +0.000, +0.000, +1.000 );
            angle = ( 180.0, 180.0 );
        },               

        # Microphone 5
        { 
            mu = ( <mic5-x>, <mic5-y>, <mic5-z> ); 
            sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
            direction = ( +0.000, +0.000, +1.000 );
            angle = ( 180.0, 180.0 );
        },

        # Microphone 6
        { 
            mu = ( <mic6-x>, <mic6-y>, <mic6-z> ); 
            sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
            direction = ( +0.000, +0.000, +1.000 );
            angle = ( 180.0, 180.0 );
        },        

        # Microphone 7
        { 
            mu = ( <mic7-x>, <mic7-y>, <mic7-z> ); 
            sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
            direction = ( +0.000, +0.000, +1.000 );
            angle = ( 180.0, 180.0 );
        },

        # Microphone 8
        { 
            mu = ( <mic8-x>, <mic8-y>, <mic8-z> ); 
            sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
            direction = ( +0.000, +0.000, +1.000 );
            angle = ( 180.0, 180.0 );
        }            
);

Moreover, to detect only sources in front of your cameras, I would aim the spatial filter in the direction of your cameras' field of view (if I understand your sketch, microphones 1-4 form a line on the z-axis, so the cameras would point in that direction). This should do it:

spatialfilter: {

    direction = ( +0.000, +0.000, +1.000 );
    angle = (80.0, 100.0);

};    

Let me know if you have any questions,

Cheers

chengl2 commented 6 years ago

Thank you for your help. I have one more question. There are 8 microphones on my board, and each microphone is one channel. In fact, the numbers I drew on the picture above are not the real channel numbers. How do the coordinates correspond to the channel numbers?

chengl2 commented 6 years ago

I recorded the audio of the 8 channels with Audacity and drew the microphone numbers on the picture. selection_002

FrancoisGrondin commented 6 years ago

Hi, you can use the mapping parameter to achieve this:

mapping:
{

    map: (2, 1, 4, 3, 7, 8, 5, 6);

}

Which maps mic 1 to channel 2, mic 2 to channel 1, mic 3 to channel 4, mic 4 to channel 3, mic 5 to channel 7, mic 6 to channel 8, mic 7 to channel 5 and mic 8 to channel 6.
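As a rough illustration of the mapping described above (this is not ODAS code), the following Python sketch reorders one frame of samples so that position i holds the sample for mic i+1. The function name `reorder_channels` and the sample values are hypothetical; the 1-based reading of `map` follows the explanation in this comment.

```python
# Illustrative only: mimics the 1-based mapping described above, not ODAS internals.
MAP = (2, 1, 4, 3, 7, 8, 5, 6)  # entry i (1-based) = recorded channel used for mic i


def reorder_channels(frame, mapping=MAP):
    """Return samples ordered by mic number, given samples ordered by channel."""
    if len(frame) != len(mapping):
        raise ValueError("frame and mapping must have the same length")
    # Channels are 1-based in the config, Python lists are 0-based.
    return [frame[channel - 1] for channel in mapping]


# Hypothetical frame: the value at each position is simply its channel number.
frame_by_channel = [1, 2, 3, 4, 5, 6, 7, 8]
frame_by_mic = reorder_channels(frame_by_channel)
print(frame_by_mic)  # [2, 1, 4, 3, 7, 8, 5, 6]
```

Checking a recording this way against a known sound (for example, tapping one microphone at a time) is one way to confirm that the numbers in `map` match the physical layout.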

chengl2 commented 6 years ago

Thank you. I got SSL data like the following:

{ "timeStamp": 9, "src": [ { "x": 1.000, "y": -0.000, "z": 0.000, "E": 0.284 }, { "x": -0.981, "y": 0.151, "z": 0.118, "E": 0.133 }, { "x": 0.988, "y": -0.156, "z": 0.000, "E": 0.100 }, { "x": 0.979, "y": -0.034, "z": 0.199, "E": 0.073 } ] }
{ "timeStamp": 10, "src": [ { "x": 1.000, "y": -0.000, "z": 0.000, "E": 0.309 }, { "x": 0.972, "y": -0.104, "z": 0.209, "E": 0.138 }, { "x": -0.981, "y": 0.151, "z": 0.118, "E": 0.091 }, { "x": 0.981, "y": 0.038, "z": 0.188, "E": 0.049 } ] }

I don't know what x, y, z and E mean.

FrancoisGrondin commented 6 years ago

These are the xyz-coordinates of the direction of arrival of sound, and E is the energy level (between 0 and 1). A value of 0 means no energy, and a value of 1 means high energy. A potential source with high energy will most likely trigger the tracking of this source by the tracking module. Right now you are printing the results of the localization module in the terminal, which can be quite noisy. If you want to look at the tracked sources, you should print the results of the tracking module instead.
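To make the meaning of E concrete, here is a minimal Python sketch that parses one localization frame (copied from the log posted above) and picks the potential source with the highest energy. The helper name `strongest_source` is hypothetical, and the sketch assumes each frame arrives as one complete JSON object.

```python
import json

# One frame copied from the localization output posted above.
FRAME = """{ "timeStamp": 9, "src": [
    { "x": 1.000, "y": -0.000, "z": 0.000, "E": 0.284 },
    { "x": -0.981, "y": 0.151, "z": 0.118, "E": 0.133 },
    { "x": 0.988, "y": -0.156, "z": 0.000, "E": 0.100 },
    { "x": 0.979, "y": -0.034, "z": 0.199, "E": 0.073 } ] }"""


def strongest_source(frame_text):
    """Return the potential source with the largest energy E in one frame."""
    frame = json.loads(frame_text)
    return max(frame["src"], key=lambda src: src["E"])


best = strongest_source(FRAME)
print(best["x"], best["y"], best["z"], best["E"])
```

Because the localization output is noisy, a single frame's strongest candidate is only a hint; the tracking module's output is the better signal to act on.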

chengl2 commented 6 years ago

Thanks. Do you mean I should look at the file tracks.txt? I don't know the meaning of that data either.

{ "timeStamp": 145847, "src": [ { "id": 1, "tag": "dynamic", "x": 1.000, "y": -0.008, "z": 0.007, "activity": 0.998 }, { "id": 411, "tag": "dynamic", "x": -0.996, "y": 0.084, "z": 0.007, "activity": 0.006 }, { "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 }, { "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 } ] }
{ "timeStamp": 145848, "src": [ { "id": 1, "tag": "dynamic", "x": 1.000, "y": -0.009, "z": 0.006, "activity": 0.998 }, { "id": 411, "tag": "dynamic", "x": -0.996, "y": 0.084, "z": 0.007, "activity": 0.000 }, { "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 }, { "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 } ] }

Is there a document that explains the meaning of id, tag and activity?

FrancoisGrondin commented 6 years ago

Documentation is currently being written. I know this info would help, which is why I'm speeding things up to get something out ASAP.

id is a unique id that is assigned to each newly tracked source.

tag identifies the type of tracked source: in this case "dynamic" means that the source "appeared" and was generated from the localization module, and was not set in advance by the user.

activity indicates, for the current frame, the probability that the source is active (between 0 and 1).

From the log you showed me, it seems the source located at approx. x = 1, y = 0 and z = 0 is active, while the one located at approx. x = -1, y = 0 and z = 0 is inactive.
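A small Python sketch of how the tracked output could be filtered by activity, using one frame copied from the log above. Two things here are assumptions rather than documented behavior: that entries with id 0 are empty slots (inferred from the log, where they carry an empty tag and zero values), and the 0.5 cut-off, which is a hypothetical threshold. The name `active_sources` is also hypothetical.

```python
import json

# One frame copied from the tracking output posted above.
FRAME = """{ "timeStamp": 145847, "src": [
    { "id": 1, "tag": "dynamic", "x": 1.000, "y": -0.008, "z": 0.007, "activity": 0.998 },
    { "id": 411, "tag": "dynamic", "x": -0.996, "y": 0.084, "z": 0.007, "activity": 0.006 },
    { "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 },
    { "id": 0, "tag": "", "x": 0.000, "y": 0.000, "z": 0.000, "activity": 0.000 } ] }"""

ACTIVITY_THRESHOLD = 0.5  # hypothetical cut-off, tune for your setup


def active_sources(frame_text, threshold=ACTIVITY_THRESHOLD):
    """Return tracked sources whose activity probability reaches the threshold."""
    frame = json.loads(frame_text)
    return [
        src for src in frame["src"]
        if src["id"] != 0 and src["activity"] >= threshold  # id 0 assumed to be an empty slot
    ]


for src in active_sources(FRAME):
    print(src["id"], src["x"], src["y"], src["z"], src["activity"])
```

With this frame only the source with id 1 passes the threshold, which matches the reading given above: the source near x = 1 is active and the one near x = -1 is not.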

chengl2 commented 6 years ago

I don't quite understand the items below in bold:

Microphone

{ mu = ( , , ); sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 ); direction = ( +0.000, +0.000, +1.000 ); angle = ( 180.0, 180.0 ); },

and

spatialfilter: { direction = ( +0.000, +0.000, +1.000 ); angle = (80.0, 100.0); };

Could you please explain these items with a picture for me?

chengl2 commented 6 years ago

It's a conference system. I want to detect the direction of a human voice, then steer the PTZ of two cameras toward the speaker (PTZ: up and down, left and right). See the picture below: _003

Is ODAS a good fit for this?

GodCed commented 6 years ago

Hi, I suggest you take a look at this discussion about the spatial filter. What it does is adjust the gain according to the angle of arrival. When specified in the mic configuration, the gain applies to the microphone. When specified in the spatial filter, it applies to the system output.

It is used to account for the directivity of the microphones and to limit the sound "search area" to a specified zone. With your setup, I would suggest you use

direction = ( +1.000, +0.000, +0.000 );
angle = (80.0, 100.0);

for both, as your microphones and your array are listening in the X axis direction.
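The idea of a gain that depends on the angle of arrival can be sketched in Python as follows. This assumes, which the thread does not state, that the two `angle` values are the angle below which the gain is 1 and the angle above which it is 0, and that the gain ramps linearly in between; the real ODAS transition shape may differ. The names `angle_between_deg` and `filter_gain` are hypothetical.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def filter_gain(doa, direction=(1.0, 0.0, 0.0), angle=(80.0, 100.0)):
    """Gain for a direction of arrival, assuming a linear ramp between the two angles."""
    all_pass, no_pass = angle
    theta = angle_between_deg(doa, direction)
    if theta <= all_pass:
        return 1.0
    if theta >= no_pass:
        return 0.0
    return (no_pass - theta) / (no_pass - all_pass)


print(filter_gain((1.0, 0.0, 0.0)))   # source straight ahead on +x
print(filter_gain((-1.0, 0.0, 0.0)))  # source directly behind
```

Under the same assumption, `angle = ( 180.0, 180.0 )` passes every direction, which is consistent with the omnidirectional microphones assumed earlier in the thread.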

GodCed commented 6 years ago

Considering your conference system, I personally used ODAS to add an overlay to a video stream to show audio sources and also to track an audio source with a PTZ camera.

With your setup, you should be able to use the tracking module to aim your camera by converting x, y and z, which represent a direction vector, to an azimuth and elevation. If you have an idea of the distance between the speaker and the camera, you may want to account for the offset between your array's origin and your cameras' origin to improve your aiming precision.
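The conversion from a direction vector to two angles can be sketched in Python like this. The angle convention is an assumption, not something the thread fixes: azimuth is measured in the x-y plane from the +x axis toward +y, and elevation is measured from the x-y plane toward +z. The function name `direction_to_angles` is hypothetical, and the camera offset mentioned above is not handled.

```python
import math


def direction_to_angles(x, y, z):
    """Convert a direction vector to (azimuth, elevation) in degrees.

    Assumed convention: azimuth from +x toward +y in the x-y plane,
    elevation from the x-y plane toward +z.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    azimuth = math.degrees(math.atan2(y, x))
    # Clamp to guard against rounding slightly outside [-1, 1].
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    return azimuth, elevation


# Tracked source taken from the log earlier in the thread.
azimuth, elevation = direction_to_angles(1.000, -0.008, 0.007)
print(f"azimuth={azimuth:.2f} deg, elevation={elevation:.2f} deg")
```

If the cameras use a different zero direction or sign convention for pan and tilt, the two angles would need an extra offset or sign flip before being sent to the PTZ controller.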

chengl2 commented 6 years ago

Thank you very much.

If I want to check whether ODAS gives me the right direction in real time, what should I do?

GodCed commented 6 years ago

I suggest you have a look at ODAS Studio. It's a desktop app built to display ODAS data in real time. You can see acoustic energy and tracked sources in real time, both in azimuth-elevation and unit-sphere x-y-z format.

chengl2 commented 6 years ago

I tried ODAS Studio; it's powerful.

Even when nobody is talking, it draws a lot of points near the x axis:

_004

I tried adjusting the "energy range", and it works, but people's voices seem to be filtered out a lot as well.

Standing 2 meters away, I have to speak very loudly to be detected by ODAS. How can I fine-tune this?

Is there a way to filter out noise while staying sensitive to human voice?

FrancoisGrondin commented 6 years ago

This is strange. It seems like there is always a noise source in front of your setup, which I doubt is true in reality. Can you provide us with your config file, and maybe raw recordings from the mic array?

GodCed commented 6 years ago

I would also suggest you stop ODAS, do a recording in Audacity and retry. Sometimes a weird glitch happens when opening the sound card through ODAS and the card output is corrupted. Opening it in another app seems to solve the issue.

chengl2 commented 6 years ago

Here are my cfg file and the audio data recorded with Audacity.

chengl.cfg.zip

audacity_project.zip

chengl2 commented 6 years ago

I did a recording in Audacity and retried, and I also tried the configuration below; the noise source is still there.

_007

chengl.cfg.zip

FrancoisGrondin commented 6 years ago

Thank you for this feedback. I have been quite busy lately, but I'll try to run your data in the coming days! I'll keep you updated!

chengl2 commented 6 years ago

Hello, I found that the DC offsets of my 8 audio channels are all below 0, and the offset values differ from one another.

dcoffset

How could I adjust the DC offset to 0 in ODAS? Could this DC offset be a cause of my SSL problem?

huotuichang1 commented 6 years ago

What tool did you use to calculate these values?

hritiksth764 commented 2 years ago

How are you getting these values?
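The thread does not say which tool produced the DC offset figures. As one possibility, assuming the DC offset of a channel is taken as the mean of its samples, a minimal Python sketch could compute and remove it before the audio is analyzed. This is preprocessing outside ODAS, not an ODAS setting; the function names and the sample values are hypothetical.

```python
def dc_offsets(channels):
    """Mean sample value of each channel, used here as its DC offset."""
    if any(len(channel) == 0 for channel in channels):
        raise ValueError("every channel needs at least one sample")
    return [sum(channel) / len(channel) for channel in channels]


def remove_dc(channels):
    """Return a copy of the channels with each channel's mean subtracted."""
    offsets = dc_offsets(channels)
    return [
        [sample - offset for sample in channel]
        for channel, offset in zip(channels, offsets)
    ]


# Hypothetical two-channel recording with negative offsets.
recording = [
    [-0.25, -0.75, -0.25, -0.75],
    [-1.0, 0.0, -1.0, 0.0],
]
print(dc_offsets(recording))             # [-0.5, -0.5]
print(dc_offsets(remove_dc(recording)))  # [0.0, 0.0]
```

For a long recording, a high-pass filter would be the more usual way to remove a slowly drifting offset; subtracting the mean is simply the shortest way to show the idea.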