Closed fColangelo closed 4 years ago
Hi,
each direction-of-arrival can be represented either in spherical coordinates (azimuth, elevation) or in Cartesian coordinates (x, y, z), and you can convert from one to the other. I think the current implementation outputs the x, y, z coordinates, since the loss is computed on Cartesian vectors, and leaves the final conversion to azimuth-elevation to the user, if desired.
By the way, the conversion is as follows:
doa = [x, y, z];
doa_norm = [x, y, z] / sqrt(x^2 + y^2 + z^2); % normalizes the output vector to unit length
doa_norm = [x_norm, y_norm, z_norm];
doa_azi = atan2(y_norm, x_norm);
doa_elev = atan2(z_norm, sqrt(x_norm^2 + y_norm^2));
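For anyone who wants this in Python, here is a minimal NumPy sketch of the same conversion. The function name `cartesian_to_azi_ele` and the small epsilon guard against zero-length vectors are my own additions for illustration and are not taken from the baseline code. Angles are returned in radians, as with `atan2` above.

```python
import numpy as np


def cartesian_to_azi_ele(doa, eps=1e-12):
    """Convert Cartesian DOA vector(s) of shape (..., 3) to (azimuth, elevation) in radians.

    Hypothetical helper, not part of the baseline repository.
    """
    doa = np.asarray(doa, dtype=float)
    # Normalize to unit length; eps avoids division by zero for all-zero vectors.
    norm = np.linalg.norm(doa, axis=-1, keepdims=True)
    doa_norm = doa / np.maximum(norm, eps)
    x, y, z = doa_norm[..., 0], doa_norm[..., 1], doa_norm[..., 2]
    azi = np.arctan2(y, x)
    ele = np.arctan2(z, np.sqrt(x ** 2 + y ** 2))
    return azi, ele
```

Use `np.degrees` on the two outputs if you need degrees rather than radians.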
Hi and thank you for the swift response!
Can you please clarify how the multiple-track information is taken into account, for example when two different baby cry events are present at the same time in different directions?
Thank you!
It is not taken into account: the baseline SELDnet implementation cannot distinguish the individual instances in that case. The data, of course, can contain this realistic case occasionally.
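To make the limitation concrete: 42 = 14 x 3, i.e. one Cartesian (x, y, z) vector per class, so two simultaneous events of the same class have only a single slot to share. Below is a rough sketch of how one frame of "doa_out" could be split into per-class vectors. The helper name `split_doa_out` is hypothetical, and the layout (all x values first, then all y, then all z) is an assumption not confirmed in this thread, so please check it against the label-generation code before relying on it.

```python
import numpy as np

NB_CLASSES = 14  # size of "sed_out" mentioned in this thread


def split_doa_out(doa_out, nb_classes=NB_CLASSES):
    """Reshape doa_out of shape (..., 3 * nb_classes) into (..., nb_classes, 3).

    ASSUMPTION: the last axis is laid out as [x_0..x_13, y_0..y_13, z_0..z_13].
    """
    doa_out = np.asarray(doa_out, dtype=float)
    if doa_out.shape[-1] != 3 * nb_classes:
        raise ValueError("expected last dimension of size 3 * nb_classes")
    x = doa_out[..., :nb_classes]
    y = doa_out[..., nb_classes:2 * nb_classes]
    z = doa_out[..., 2 * nb_classes:]
    # One (x, y, z) row per class: there is no second row for a second
    # simultaneous instance of the same class.
    return np.stack((x, y, z), axis=-1)
```

Each row of the result is the single direction estimate for one class, which can then be converted to azimuth and elevation if needed.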
Understood, thanks a lot for the information!
Hi and thank you for sharing the code for the baseline system. I am trying to understand how the labels are encoded for this problem, given that two instances of the same event can be active at once. From the code, I gathered that the network outputs a 14-dimensional "sed_out" vector, which should be the presence/absence label, and a 42-dimensional "doa_out" vector. I can see that two floats per class are needed for elevation and azimuth, which accounts for 28 elements. Can you please clarify how the track information is encoded and why 42 elements are needed in the output? Thank you!