HOA decoder based on HRIR convolution

PyrApple commented 8 years ago

Dear Archontis,

The present pull request concerns (mainly) the WebAudio HOA binaural decoder.

If I'm not mistaken (please do tell me if I am), the current HOA decoder is more of an "Ambisonic room reverb" convolver connected to a stereo decoder (see WebAudio_HOA.js l.189). The updated version is based on the virtual speaker approach, applied via a convolution with what one could call an "Ambisonic Binaural Impulse Response". I also added Matlab scripts to generate those.

Feel free to test the updated version (the difference is actually clearer with an anechoic file, but I kept the original drum beat for a fairer comparison).

polarch commented 8 years ago

Hi David,

Thanks for your interest in the library (and your work)! Unfortunately I am in the middle of moving countries for work until the middle of next week, so I'll have very little time to review the code in the meantime. After that I can check it immediately.

About the binaural decoder: no, it is an exact ambisonic-to-binaural decoder. The virtual loudspeaker approach is completely unnecessary from an ambisonic perspective. The natural operation is to have both your sound field (the HOA signals) and the HRTFs encoded into SHs; your output is then given directly by the dot product of the two sets of coefficients. The virtual loudspeaker approach has the extra step of getting back to the space domain (decoding) and then convolving and summing the results. Both can be parameterized into (N+1)^2 filters if you assume left-right symmetric HRTFs, otherwise 2*(N+1)^2 filters for general HRTFs that differ between the left and right ear.
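As a minimal sketch of the direct approach described above (not the library's actual code): each HOA channel is convolved with the matching SH-domain HRTF filter and the results are summed, which is the time-domain form of the dot product of the two coefficient sets. Plain arrays stand in for audio buffers, and the names `numHoaChannels`, `convolve` and `decodeEar` are hypothetical. It assumes all channels share one length and all filters share one length.

```javascript
// Sketch only: hypothetical helper names, plain arrays instead of AudioBuffers.

// Number of HOA channels (and of per-ear filters) for ambisonic order N.
function numHoaChannels(order) {
  return (order + 1) * (order + 1);
}

// Direct-form FIR convolution of a signal with an impulse response.
function convolve(signal, ir) {
  const out = new Array(signal.length + ir.length - 1).fill(0);
  for (let i = 0; i < signal.length; i++) {
    for (let j = 0; j < ir.length; j++) {
      out[i + j] += signal[i] * ir[j];
    }
  }
  return out;
}

// One ear: filter every HOA channel with its SH-domain HRTF filter, then sum.
// Assumes equal-length signals and equal-length filters.
function decodeEar(hoaSignals, earFilters) {
  if (hoaSignals.length !== earFilters.length) {
    throw new Error("need exactly one filter per HOA channel");
  }
  let sum = null;
  hoaSignals.forEach((signal, k) => {
    const filtered = convolve(signal, earFilters[k]);
    if (sum === null) {
      sum = filtered;
    } else {
      for (let t = 0; t < sum.length; t++) sum[t] += filtered[t];
    }
  });
  return sum;
}
```

With general HRTFs, `decodeEar` would be called twice with two different filter sets, which is where the 2*(N+1)^2 count comes from.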

What seems like a stereo decoder, is a trick that exploits this symmetry in the end of the convolutions, I can explain that in more detail soon.

However, the virtual decoder approach is useful if you need to "listen" to the effects of a loudspeaker decoder; I have some Matlab functions for this. The FuMa convention will become completely obsolete (even according to Richard Furse himself), as it is order-limited and not very nice from a math perspective. But it does no harm to keep it in the code for backwards compatibility with material encoded that way! I have some objects for SN3D conventions that I will also push soon, which may become the de facto normalization.

Sorry for the hurried style, I'll get back to you on this soon!

Many thanks, Archontis

PyrApple commented 8 years ago

I rushed to conclusions, my apologies.

1-0, time to strike back :).

My confusion came from the final quality of the rendering, which somehow seemed far less accurate in terms of sound source localization than I expected. Someone told me once that this could come from using the SH-encoded HRTF approach, where you need damn high HOA orders to keep high-frequency cues accurate (though quoting "someone" is no reference, and a short scholar search seems to indicate otherwise, e.g. here).

I’ll let you listen to the proposed approach, if you perceive any differences we’ll see if it’s worth modifying the code then. If so, the

"The virtual loudspeaker approach has the extra step of getting back to the space domain (decoding) and then convolving and summing the results."

won’t bother us that much, since I merged the virtual speaker decoding and the HRIR convolution into a single set of IRs. It still requires twice as many IRs as there are Ambisonic channels, but it’s nearly transparent code-wise (and works great for ambisonic room reverb because of HF conservation).
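To illustrate how decoding and HRIR convolution can be folded into one set of IRs, here is a minimal sketch that relies only on linearity: the IR for HOA channel k and a given ear is the sum over virtual speakers of the decoding gain times that speaker's HRIR. The function name, the `decoderGains[s][k]` layout and the `{ left, right }` HRIR objects are assumptions for illustration, not the format used by the Matlab scripts in this pull request.

```javascript
// Sketch only: hypothetical data layout.
// decoderGains[s][k]: gain from HOA channel k to virtual speaker s.
// hrirs[s]: { left: [...], right: [...] } measured for the direction of speaker s.
function combineDecoderAndHrirs(decoderGains, hrirs) {
  const numSpeakers = decoderGains.length;
  const numChannels = decoderGains[0].length;
  const irLength = hrirs[0].left.length;
  const left = [];
  const right = [];
  for (let k = 0; k < numChannels; k++) {
    const irLeft = new Array(irLength).fill(0);
    const irRight = new Array(irLength).fill(0);
    for (let s = 0; s < numSpeakers; s++) {
      const gain = decoderGains[s][k];
      for (let t = 0; t < irLength; t++) {
        irLeft[t] += gain * hrirs[s].left[t];
        irRight[t] += gain * hrirs[s].right[t];
      }
    }
    left.push(irLeft);
    right.push(irRight);
  }
  // One IR per HOA channel and per ear: 2 * numChannels IRs in total.
  return { left, right };
}
```

Convolving each HOA channel with these combined IRs and summing per ear gives the same output as decoding to the virtual speakers first and convolving each speaker feed with its HRIR.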

Regarding

"The Fuma convention will become obsolete completely (even by Richard Furse himself)"

I can’t wait, and will be glad to dump this ugly piece of converter then.

Regards, David

polarch commented 8 years ago

Hi David,

I uploaded some scripts that show how the filters are obtained. The functions can actually be used to get beamforming filters for any target function in the SHD (including HRTFs).

Don't get me wrong, I do see some benefits of the virtual decoding approach too, if not on localization then mainly on coloration. Also, if you have sparse, incomplete HRTF measurements, you can still apply virtual decoding to them and get something reasonable.

The direct approach requires samples that surround the listener completely (not necessarily uniformly). The high-frequency loss can be compensated on average by diffuse-field correction, such as proposed in https://depositonce.tu-berlin.de/bitstream/11303/171/1/13.pdf

How about we have both approaches inside, with a flag

HOA_binDecoder(audioCtx, order, DIRECT_OR_VIRTUAL)

and modify the internal connections accordingly?
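A rough sketch of what such a flag could select, based only on the filter counts mentioned so far in this thread: (N+1)^2 filters for the direct decoder with the symmetry trick, and twice as many for the virtual speaker IRs. `DecoderMode` and `planBinauralDecoder` are hypothetical names used for illustration, not part of the library.

```javascript
// Sketch only: hypothetical names, not the library API.
const DecoderMode = Object.freeze({ DIRECT: "direct", VIRTUAL: "virtual" });

// Returns how many HOA channels and convolution filters each mode would need.
function planBinauralDecoder(order, mode) {
  const numChannels = (order + 1) * (order + 1);
  switch (mode) {
    case DecoderMode.DIRECT:
      // One filter per channel; the right ear is derived from left-right symmetry.
      return { mode, numChannels, numFilters: numChannels };
    case DecoderMode.VIRTUAL:
      // One filter per channel and per ear.
      return { mode, numChannels, numFilters: 2 * numChannels };
    default:
      throw new Error("unknown decoder mode: " + mode);
  }
}
```

For example, `planBinauralDecoder(3, DecoderMode.VIRTUAL)` describes a decoder with 16 HOA channels and 32 filters.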

You can include your Matlab code too, maybe following the naming of the other functions to differentiate, e.g. getHOA2binauralFilters_virtual(...)? Nice that you pull the LISTEN HRTFs directly!

Best, Archontis

PyrApple commented 8 years ago

Hello Archontis,

I'm all for the DIRECT_OR_VIRTUAL flag, for both adaptability and didactic purposes. As the implemented virtual speaker approach differs from the "direct" one only in the Matlab processing of the IRs, the integration will be easy. The only thing I need to fix is the HOA loading routine, which you designed for N channels & 1 ear (relying on your "trick that exploits this symmetry"), whereas I need a loader for N channels & 2 ears.

I found some more discussions on the impact of the direct SH-encoded HRTF technique here and there. I'm not finished reading, but they all agree on the need for high (high) orders to preserve high-frequency cues. From my understanding, without smart corrections, we'd need orders above 15 to preserve HF cues (above 10 kHz), e.g. for elevation perception.
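For a rough sense of where a figure like "above 15" can come from: a common rule of thumb, which is an assumption here and not taken from the discussions linked above, is that an order-N expansion stays accurate up to about kr = N, where k = 2*pi*f/c is the wavenumber and r is the head radius. The sketch below evaluates kr with an assumed head radius of 8.75 cm and an assumed speed of sound of 343 m/s; `krAtFrequency` is a hypothetical helper.

```javascript
// Sketch only: rule-of-thumb estimate, N is roughly kr = 2 * pi * f * r / c.
// radiusM (head radius) and speedOfSound are assumed default values.
function krAtFrequency(freqHz, radiusM = 0.0875, speedOfSound = 343) {
  return (2 * Math.PI * freqHz * radiusM) / speedOfSound;
}

// Estimated order needed to stay accurate up to 10 kHz under these assumptions.
const orderAt10kHz = Math.ceil(krAtFrequency(10000));
```

Under these assumptions kr at 10 kHz is roughly 16, which is in line with the "orders above 15" estimate.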

I'll pull your modifications, unify my Matlab naming conventions, integrate the flag-based virtual speaker approach and submit a new pull request. Once it's all neat we can talk about the whole restructuring of the project I have here :), packaged as a Node JS library (bundled for npm install, ES6 compliant, eventually getting rid of the B_format part of the code).

polarch commented 8 years ago

Well, thinking a bit more about it:

HOA_binDecoder() does not need any change even in the virtual decoding case, and the filters can be reduced to (N+1)^2 (which is nice for performance).

This is possible IF your loudspeaker setup is left-right symmetric and IF you assume left-right symmetric HRTFs.

The left-right symmetry of HRTFs makes sense in practice for general applications, unless you are studying asymmetry. Forcing symmetry averages out errors that most likely come from measurements rather than anatomy (and asymmetry doesn't make sense for non-individualized HRTFs anyway).

Then, if your decoding directions are also left-right symmetric, you can compute the (N+1)^2 filters for the left ear only, and use HOA_binDecoder() without any changes. You can check that, due to these symmetries, the final left-ear and right-ear filters for, let's say, the Y(-1,1) channel or the Y(-2,2) channel are just sign-inverted versions of each other.

That's the reason for the combining stage at the end: you sum up all contributions to get the left ear signal, and you sum all (m>=0) contributions and subtract the (m<0) ones to get the right ear signal. SH magic :-).
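The combining stage can be sketched as follows, assuming ACN channel ordering (index k = n^2 + n + m) and that each HOA channel has already been convolved with its left-ear filter. The names `acnToDegree` and `combineEars` are hypothetical, and plain arrays stand in for audio buffers.

```javascript
// Sketch only: assumes ACN ordering, where channel index k = n*n + n + m.
// Returns the degree m of ACN channel k.
function acnToDegree(k) {
  const n = Math.floor(Math.sqrt(k));
  return k - n * n - n;
}

// filtered[k]: HOA channel k after convolution with its left-ear filter.
// Left ear: plain sum. Right ear: same sum with the m < 0 channels sign-inverted.
function combineEars(filtered) {
  const length = filtered[0].length;
  const left = new Array(length).fill(0);
  const right = new Array(length).fill(0);
  filtered.forEach((signal, k) => {
    const sign = acnToDegree(k) < 0 ? -1 : 1;
    for (let t = 0; t < length; t++) {
      left[t] += signal[t];
      right[t] += sign * signal[t];
    }
  });
  return { left, right };
}
```

Only one set of (N+1)^2 convolutions is run; the second ear costs just a sign flip and a sum.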

So the effort is moved to computation of the virtual decoding filters, while the audio processing remains efficient.

(I'm planning to upload a short report/paper describing and comparing these two approaches somewhere in the coming weeks.)

You think it's possible to modify your Matlab scripts that way?

Thanks, Archontis

PyrApple commented 8 years ago

Agreed!

polarch commented 8 years ago

Yes, B-format is unnecessary, all processing can be done internally with the HOA library for 1st order. Some proper NodeJS packaging sounds good too.

I'll be unavailable for a couple of days unfortunately, let's catch up after that.

PyrApple commented 8 years ago

Will close this pull request as soon as the new pull request is resolved.

PyrApple commented 8 years ago

Aborted; this resulted in the Virtual Speaker and NodeJS pull requests.

polarch / JSAmbisonics

HOA decoder based on HRIR convolution #1