sharathadavanne / seld-dcase2020

Baseline method for sound event localization task of DCASE 2020 challenge
http://dcase.community/challenge2020/task-sound-event-localization-and-detection

Question on label encoding #4

Closed fColangelo closed 4 years ago

fColangelo commented 4 years ago

Hi, and thank you for sharing the code for the baseline system. I am trying to understand how the labels are encoded for this problem, given that two instances of the same event class can be active at once. From the code, I gather that the network outputs a 14-dimensional "sed_out" vector, which should be the presence/absence label, and a 42-dimensional "doa_out" vector. I can see that two floats per class are needed for elevation and azimuth, so I would expect 28 elements. Could you please clarify how the track information is encoded and why the output needs 42 elements? Thank you!

polarch commented 4 years ago

Hi,

Each direction of arrival can be represented either in spherical coordinates (azimuth, elevation) or in Cartesian coordinates (x, y, z), and you can convert from one to the other. I think the current implementation outputs the x, y, z coordinates, since the loss is computed with Cartesian vectors, and leaves the final conversion to azimuth and elevation to the user, if desired. Three Cartesian coordinates for each of the 14 classes would account for the 42 elements.

By the way, the conversion is as follows:

doa = [x, y, z];
doa_norm = [x, y, z] / sqrt(x^2 + y^2 + z^2); % normalizes the vector output to unit length
doa_norm = [x_norm, y_norm, z_norm];
doa_azi = atan2(y_norm, x_norm);
doa_elev = atan2(z_norm, sqrt(x_norm^2 + y_norm^2));
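
The conversion above can be sketched in Python as follows. This is a minimal illustration, not code from the repository: the function names are hypothetical, the class count of 14 is taken from the "sed_out" size mentioned in the question, and the flat layout of "doa_out" (all x values, then all y values, then all z values) is an assumption that should be checked against the baseline code before use.

```python
import math

NB_CLASSES = 14  # from the 14-dimensional sed_out mentioned in the question


def cartesian_to_spherical(x, y, z):
    """Return (azimuth, elevation) in degrees for a Cartesian DOA vector."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("a zero-length vector has no direction")
    # Normalize the vector to unit length, as in the conversion above.
    x, y, z = x / norm, y / norm, z / norm
    azimuth = math.atan2(y, x)
    elevation = math.atan2(z, math.sqrt(x * x + y * y))
    return math.degrees(azimuth), math.degrees(elevation)


def split_doa_out(doa_out, nb_classes=NB_CLASSES):
    """Split a flat vector of 3 * nb_classes values into (x, y, z) tuples.

    ASSUMPTION: the layout is [x_0..x_13, y_0..y_13, z_0..z_13].
    This ordering is hypothetical and must be verified in the baseline.
    """
    if len(doa_out) != 3 * nb_classes:
        raise ValueError("expected %d values, got %d" % (3 * nb_classes, len(doa_out)))
    xs = doa_out[:nb_classes]
    ys = doa_out[nb_classes:2 * nb_classes]
    zs = doa_out[2 * nb_classes:]
    return list(zip(xs, ys, zs))


if __name__ == "__main__":
    # A vector pointing halfway between the x and y axes, in the horizontal plane.
    print(cartesian_to_spherical(1.0, 1.0, 0.0))
```

For the example vector (1, 1, 0), the azimuth comes out at roughly 45 degrees and the elevation at roughly 0 degrees. Only classes that "sed_out" marks as active have a meaningful direction, so the conversion would normally be applied to those entries alone.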

fColangelo commented 4 years ago

Hi and thank you for the swift response!

Can you please clarify how the multiple-track information is taken into account? For example, what happens if two different baby cry events are active at the same time in different directions?

Thank you!

polarch commented 4 years ago

It is not taken into account: the baseline SELDnet implementation cannot distinguish the individual instances in that case. The data, of course, can contain this realistic case occasionally.
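
A toy example may make this limitation concrete. The encoder below is hypothetical and is not the repository's label-generation code. It assumes one presence flag and one (x, y, z) slot per class, with the same assumed flat layout of all x values, then all y values, then all z values. Under that assumption, two simultaneous events of the same class compete for a single slot.

```python
NB_CLASSES = 14  # from the 14-dimensional sed_out mentioned in the question
BABY_CRY = 0     # hypothetical class index, chosen only for illustration


def encode_frame(events, nb_classes=NB_CLASSES):
    """Encode one frame of events given as (class_index, x, y, z) tuples.

    Returns (sed, doa): sed has one presence flag per class, and doa has
    one x, y, z slot per class. Hypothetical layout, not the baseline's code.
    """
    sed = [0.0] * nb_classes
    doa = [0.0] * (3 * nb_classes)
    for cls, x, y, z in events:
        sed[cls] = 1.0
        # A later event of the same class overwrites the earlier one.
        doa[cls] = x
        doa[nb_classes + cls] = y
        doa[2 * nb_classes + cls] = z
    return sed, doa


# Two baby cry events arriving from different directions in the same frame.
sed, doa = encode_frame([
    (BABY_CRY, 1.0, 0.0, 0.0),
    (BABY_CRY, 0.0, 1.0, 0.0),
])
```

In this sketch only the second direction survives in the class slot, and "sed" carries a single flag for both events. Representing both instances would require a track-wise output format, which the thread indicates the baseline does not have.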

fColangelo commented 4 years ago

Understood, thanks a lot for the information!