Question about data format

sharathadavanne / seld-dcase2022

Baseline method for sound event localization task of DCASE 2022 challenge


I have a question about the FOA format of the dataset. What is the difference between the FOA format you provide and the first-order spherical harmonic transform coefficients? Did you apply mode strength compensation (which I believe is similar to “Apply single channel regularized inversion” in your GitHub repo)? Also, did you apply linear combinations similar to those in formula 5.20 of “Jarrett, Daniel P., Emanuël AP Habets, and Patrick A. Naylor. Theory and applications of spherical microphone array processing. Vol. 9. New York: Springer, 2017”? I know you used measurement-based filters; however, I do not know whether these filters take into account the mode strength compensation and formula 5.20 mentioned above.

So in short, what is the relation between the FOA format data and the first-order spherical harmonic transform coefficients?

Hi,

Mode compensation and all that is included in the system inversion based on the measurement filters. You can find the encoding details in the publication referenced on the DCASE task webpage. There you can also see comparisons between measurement-based filters and the theoretical encoding matrices you describe (SH matrixing plus single-channel mode strength equalization). I didn't check the formula by Jarrett you mentioned, but it should be along those lines.

In short, FOA stands for first-order Ambisonics, which are the first-order spherical harmonic transform coefficients. In spherical harmonic jargon, those are the coefficients for Schmidt semi-normalized real spherical harmonics. If for any reason you are using complex spherical harmonics, you can easily convert from one to the other; the relations are available online, and there are also routines in https://github.com/polarch/Spherical-Harmonic-Transform.
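To make the "Schmidt semi-normalized real spherical harmonics" statement concrete, here is a minimal sketch of the first-order real SH values for a source direction. It assumes the common ACN channel order (W, Y, Z, X), azimuth measured counter-clockwise from the front, and elevation measured from the horizontal plane; the thread does not state these conventions, so check them against the dataset description. The function names are made up for illustration and are not part of the repository.

```python
import numpy as np

def foa_sn3d_gains(azimuth, elevation):
    """First-order real SH values with SN3D (Schmidt semi-normalized) scaling.

    Angles are in radians. Channel order is assumed to be ACN: [W, Y, Z, X].
    """
    azi, ele = np.broadcast_arrays(np.asarray(azimuth, dtype=float),
                                   np.asarray(elevation, dtype=float))
    w = np.ones_like(azi)              # order 0: omnidirectional
    y = np.sin(azi) * np.cos(ele)      # order 1: left-right dipole
    z = np.sin(ele)                    # order 1: up-down dipole
    x = np.cos(azi) * np.cos(ele)      # order 1: front-back dipole
    return np.stack([w, y, z, x], axis=-1)

def sn3d_to_n3d(gains):
    """Rescale SN3D channels to N3D: multiply order-n channels by sqrt(2n + 1)."""
    order = np.array([0, 1, 1, 1])     # SH order of each ACN channel
    return np.asarray(gains) * np.sqrt(2 * order + 1)

# A source straight ahead excites only W and X.
front = foa_sn3d_gains(0.0, 0.0)
print(front)
print(sn3d_to_n3d(front))
```

With SN3D the three dipole channels peak at 1, the same as W. Other normalizations (N3D, orthonormal) differ only by a fixed per-order scale factor, which is why converting between conventions is straightforward.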

Hope that helps.


Question about data format #7