musikinformatik / SuperDirt

Tidal Audio Engine
GNU General Public License v2.0

Support for Ambisonics #141

Open micah-frank-studio opened 5 years ago

micah-frank-studio commented 5 years ago

It would be amazing if Tidal could use the Ambisonic Toolkit to control a sound's azimuth and altitude, and to encode/decode B-format for ambisonics performance.

telephon commented 5 years ago

With some knowledge of ambisonics you could easily write a panner variant that uses it. The panning function is live hackable …

micah-frank-studio commented 4 years ago

I'll be spending some time in Dec working with 7th-order HOA and WFS systems at a research center. We are going to work with live coding (Tidal and Hydra) and ambisonics. My live coding setup currently routes audio from SC into virtual channels in Live, where I'm using the Envelop ambisonic tools. It is a very cumbersome and inefficient system, and it lacks the dynamics of a pure live coding paradigm where I could address azimuth and altitude directly. Nothing against Envelop, as they are great tools, but I must run three extra applications to use them.

Just curious: might there be a possibility to realize a Tidal function that could directly access SC's ambisonic add-on library by Dec?

With some knowledge of ambisonics you could easily write a panner variant that uses it. The panning function is live hackable …

Curious what you mean. I personally don’t have any experience with the math behind ambisonics and building a decent encoder/decoder is definitely not in my skill set.

yaxu commented 4 years ago

Hi @chronopolis5k, didn't you link to an already existing encoder and decoder above?

micah-frank-studio commented 4 years ago

Hi @chronopolis5k, didn't you link to an already existing encoder and decoder above?

Yes, sorry, my mistake. I was extrapolating beyond what @telephon suggested, which is, from what I understand, modifying the panning function? That could address azimuth, I assume? But there are other params such as altitude and radius.

Admittedly, I haven't wrapped my head around how the pieces would fit together between Tidal, SC, and the Ambisonic Toolkit. I have some time to look into it this week and will come back when I have some more clarity there.

yaxu commented 4 years ago

In a way Tidal doesn't work in the sound domain at all; it is just for patterning the parameters that are sent to SuperDirt. Adding a parameter to Tidal is just a matter of running e.g. altitude = pF "altitude". So I think it's mostly plumbing in SuperDirt to use ATK for panning and route the extra parameters to it. I believe the existing panning stuff already receives extra parameters for multichannel panning, such as splay.

telephon commented 4 years ago

I was extrapolating beyond what @telephon suggested, which is, from what I understand, modifying the panning function? That could address azimuth, I assume? But there are other params such as altitude and radius.

You can do it yourself, it's all in the open.

For a start, you could just write one SynthDef that uses the Ambisonic Toolkit and run it from Tidal. The total number of channels for SuperDirt needs to be correct for that.

Once this works, you can roll your own panner:

DirtPan.defaultPanningFunction = { |signal, numChannels, pan, mul|
    var altitude = \altitude.ir(0); // get extra arguments like this for event-based stuff (set param once per event)
    // var altitude = \altitude.kr(0); // ... or like this for global effects (change param while running)
    // etc.
}
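
For example, a minimal complete variant could look like this (an untested sketch; it uses the BFEncode1/BFDecode1 UGens from sc3-plugins that appear later in this thread, and the altitude scaling is our own choice, not anything SuperDirt defines):

DirtPan.defaultPanningFunction = { |signal, numChannels, pan, mul|
    var altitude = \altitude.ir(0); // our convention: 0..1 from Tidal, mapped to 0..pi/2 elevation below
    var w, x, y, z;
    #w, x, y, z = BFEncode1.ar(signal.sum * mul, pan * pi, altitude * 0.5pi, 1);
    // decode to stereo with two virtual speakers at +/- 45 degrees
    BFDecode1.ar(w, x, y, z, [-0.25pi, 0.25pi], 0);
};
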
micah-frank-studio commented 4 years ago

I see. Thanks for all the info. That helps clarify things a bit, as I haven't done much research into how things are configured. Whatever I come up with, I'll share it back here. Cheers.

micah-frank-studio commented 4 years ago

This is what I have so far; I haven't tested it yet. For some reason my params.hs is a binary file? It's actually a .hi, so I cannot edit it. Any suggestions?

(
~dirt.addModule('amb-panner',
    { |dirtEvent|
        dirtEvent.sendSynth("amb-panner" ++ ~numChannels, // name must match the SynthDef below
            [
                azim: ~azim,
                alt: ~alt,
                radius: ~radius,
                out: ~out
            ])
    }, { ~azim.notNil or: { ~alt.notNil } });
)

(
{
    var numChannels = ~dirt.numChannels;

    SynthDef("amb-panner" ++ numChannels, { |out, azim = 0, alt = 0.2pi, radius = 1|
        var w, x, y, z; // declared for the destructuring assignment below
        var signal = In.ar(out, numChannels);
        #w, x, y, z = BFEncode1.ar(signal.sum, azim, alt, radius); // sum to mono before encoding
        // decode for 2 channels, binaural
        signal = BFDecode1.ar(w, x, y, z, [-0.25pi, 0.25pi], 0);
        ReplaceOut.ar(out, signal)
    }, [\ir, \ir, \ir, \ir]).add;
}.value;
)

yaxu commented 4 years ago

Hi @micah-frank-studio, did you get further with this?

During lockdown I'm interested in being able to do streamed binaural performances, to have more presence for people listening on headphones.

yaxu commented 4 years ago

I'm currently trying this:

    ~dirt = SuperDirt(2, s); // two output channels, increase if you want to pan across more channels

    DirtPan.defaultPanningFunction = #{ | signals, numChannels, pan, mul |
        var w, x, y, z; // needed for the destructuring assignment
        #w, x, y, z = BFEncode1.ar(signals.sum, \azim.ir(0), \alt.ir(0), \radius.ir(0));
        BFDecode1.ar(w, x, y, z, [-0.25pi, 0.25pi], 0);
    };

    ~dirt.start(57120, [0, 0, 0, 0, 0, 0, 0, 0, 0]);

with this:

d1 $ sound "bd*16"
  # pF "azim" saw
  # pF "alt" (slow 2 saw)
  # pF "radius" (slow 3 saw)

... but I can't hear any panning.

yaxu commented 4 years ago

Here's what I'm trying to do, binaural panning:

FoaTransform.ar(FoaEncode.ar(signals.sum*mul, ~encoder), 'push', pi/4, pan)

With this at startup:

~encoder = FoaEncoderMatrix.newOmni
~decoder = FoaDecoderKernel.newCIPIC

I can't get it working with SuperDirt though, even if I directly replace the call to DirtPanBalance2 by editing SuperDirtUGens.sc.

telephon commented 4 years ago

Have you rebuilt the synthdefs?

Perhaps try ~dirt.loadSynthDefs

yaxu commented 4 years ago

Thanks! That gets me a bit further. Here's what I'm trying:

s.options.memSize = 8192 * 64 * 2; // increase this if you get "alloc failed" messages
s.boot

~dirt = SuperDirt(2, s); // two output channels, increase if you want to pan across more channels
~encoder = FoaEncoderMatrix.newOmni
~decoder = FoaDecoderKernel.newCIPIC

DirtPan.defaultPanningFunction = #{ | signals, numChannels, pan, mul |
    FoaTransform.ar(FoaEncode.ar(signals.sum*mul, ~encoder), 'push', pi/4, pan)
};
~dirt.loadSoundFiles;

~dirt.start(57120, [0, 0, 0, 0, 0, 0, 0, 0, 0]);

~dirt.loadSynthDefs

The final line gives an error... It seems signals is nil somehow?

FoaRotate input 0 is not audio rate:  nil nil
 ARGS:
   in: nil Nil
   angle: an OutputProxy OutputProxy
   mul: an OutputProxy OutputProxy
   add: an OutputProxy OutputProxy
   4: an UnaryOpUGen UnaryOpUGen
SynthDef dirt_sample_1_2 build failed
ERROR: FoaRotate input 0 is not audio rate:  nil nil
micah-frank-studio commented 4 years ago

Hi @micah-frank-studio, did you get further with this?

During lockdown I'm interested in being able to do streamed binaural performances, to have more presence for people listening on headphones.

I tried for several days and got it working, but the radians were off. It was a steep learning curve for me, never having learned SuperCollider before. I was under the gun and had to move on, so I ended up developing a 5th-order ambisonic system in Csound: https://vimeo.com/367455399

Would love to help out with this if I can free up some time...

telephon commented 4 years ago

@micah-frank-studio sorry, I didn't know you still needed help!

@yaxu

The last ~dirt.loadSynthDefs is not needed anymore once you have already set your defaultPanningFunction.

yaxu commented 4 years ago

I did get the ATK stuff working, but it was a bit underwhelming. Perhaps I was still doing something wrong, or maybe the headphones I used weren't good enough.

micah-frank-studio commented 4 years ago

I had a similar experience, but I thought some of my radians were off, and thus the phase relationships. Even with mediocre headphones you should hear something "spatial". So maybe it's the toolkit.


telephon commented 4 years ago

The toolkit is made by people who are real experts, as far as I can tell. Probably one needs to know a bit more in order to use it properly? One could ask them for advice.

yaxu commented 4 years ago

I tried with slightly better headphones (an over-ear headset designed for gaming, rather than some earbuds that probably came free with an Android phone), and the results seem much better.

msp commented 4 years ago

Do you folks know about the Sursound mailing list?

https://www.ambisonic.net/sursound.html

Might be a good place to get advice.

yaxu commented 3 years ago

I am trying this from @munshkr:

(
e = FoaEncoderMatrix.newOmni;
d = FoaDecoderKernel.newCIPIC(12);
)

(
DirtPan.defaultPanningFunction = { |signals, numChannels, pan, mul|
  var sig = FoaEncode.ar(signals.sum * mul, e);
  // angle=pi/2 -> push to plane wave
  sig = FoaPush.ar(sig, angle: pi/2, theta: pan * 2*pi);
  FoaDecode.ar(sig, d);
};

~dirt.loadSynthDefs;
)

The ~dirt.loadSynthDefs is needed; otherwise it's the usual linear pan from left to right, even if I run this between ~dirt = SuperDirt(2, s) and ~dirt.start. I can then hear some spatialisation; however, it sounds a bit funny, heavy on the right ear. I wonder if things are somehow getting stacked up so that both panning functions are applied?

yaxu commented 3 years ago

Hm, maybe I'm imagining it; these sound OK:

d1 $ sound "sd*16"
   # pan (slow 8 saw)
d1 $ weave 16 (pan saw) [sound "sd:15(5,16)",
                         sound "clap:4(3,8)",
                         sound "~ sd:3",
                         sound "jvbass(3,8,2)"
                        ]

Still, it feels like a kind of ellipse through my head (up-down and left-right, rather than forward-back), rather than sounds going around me :)

telephon commented 3 years ago

Maybe the scaling of the pan parameter is not right? The parameter receives a range of [-1..1] (which corresponds to Tidal's [0..1]), and the expected input for theta in FoaPush is "Azimuth, in radians", i.e. [-pi..pi]. With pan * 2pi the angle wraps around the circle twice.

So perhaps it should say: sig = FoaPush.ar(sig, angle: pi/2, theta: pan * pi); ?

yaxu commented 3 years ago

Yes, that seems much better, thanks!

munshkr commented 3 years ago

Maybe the scaling of the pan parameter is not right? The parameter receives a range of [-1..1] (which corresponds to Tidal's [0..1]), and the expected input for theta in FoaPush is "Azimuth, in radians", i.e. [-pi..pi].

So perhaps it should say: sig = FoaPush.ar(sig, angle: pi/2, theta: pan * pi); ?

Right. I figured the azimuth went from 0 to 2pi, but after reading some examples in the documentation I found out that it ranged from -pi to pi. I thought that the pan argument ranged from 0-1 too. Thanks!

telephon commented 3 years ago

Excellent. For a more complete implementation, it would be good to differentiate between different numbers of input channels, as the defaultPanningFunction does. How to map many channels to many is not an easy question, but it is good to have one possible implementation there that can be hacked for different purposes.

totalgee commented 3 years ago

Thanks @munshkr (and @yaxu) for pointing me to this approach with the defaultPanningFunction. It works well with the omni encoder, which is a matrix encoder (followed by a push). But if you use one of the ATK kernel encoders (e.g. frequency diffusion or spreader), you will hear some audible glitches on short notes, which I believe is because the panning function and/or the note gets cut off before it's done, or something similar. I assume the kernel encoders work in the frequency domain (FFT), so they process 512-, 1024- or 2048-frame chunks and mustn't get cut off early. (At least, I suspect that's the reason for the subtle but audible glitches; I'm not familiar enough with the whole signal chain of SuperDirt, orbits, etc.)

It would be nice to have the SuperDirt panning function only do encoding to B-format (ambisonic), then just have a single decoder at the end (whether per-orbit or on the whole SC output). (It would also be better for performance, if you have lots of fast notes!) This is the approach I've been taking when experimenting with this recently -- slapping a FoaDecode at the end of all SC output (after the main audio Group) and assuming the SC output channels (0-3) are encoded in B-format. You can see this and some other of my recent experiments here: https://github.com/totalgee/binaural-livecoding
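
A minimal sketch of that architecture (assuming SuperDirt is started with four output channels, e.g. ~dirt = SuperDirt(4, s), so that channels 0-3 carry the B-format; the SynthDef name \foaMasterDecode and the variable names are ours, not from the repo):

(
~encoder = FoaEncoderMatrix.newOmni;
~decoder = FoaDecoderKernel.newCIPIC;

// per event: only encode to B-format, no decoding here
DirtPan.defaultPanningFunction = { |signals, numChannels, pan, mul|
    var sig = FoaEncode.ar(signals.sum * mul, ~encoder);
    FoaPush.ar(sig, angle: pi/2, theta: pan * pi);
};

// one decoder for everything: read the B-format from channels 0-3
// and replace it with the decoded binaural stereo
SynthDef(\foaMasterDecode, {
    var bformat = In.ar(0, 4);
    ReplaceOut.ar(0, FoaDecode.ar(bformat, ~decoder));
}).add;
)

// after ~dirt.start, put the decoder at the very end of the node tree:
// x = Synth(\foaMasterDecode, addAction: \addToTail);
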

yaxu commented 3 years ago

Thanks @totalgee !

I have been hitting CPU problems hard with high polyphony, I guess because the panning is calculated separately for each individual sound event. Your approach should be a big improvement, as you suggest, since the decoding is then only done once. Great! Thanks for sharing your code, I'll have a look.

I am due to make a composition for a multichannel system, but don't know too much about this. I guess it's possible to compose something in four-channel surround and save the result in this 'B-format' somehow, for sending to the multichannel lab? Due to covid19 restrictions I won't be able to go there myself for a while.

totalgee commented 3 years ago

Yes, if you record (e.g. from SC) the output of the four (in the case of first-order ambisonics) B-format channels, you can then use that as a "golden master" from which it would be possible to render/decode for stereo, 5.1, 7.1, binaural stereo, arbitrary channel/speaker layouts, etc.
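
For instance, a minimal sketch (ours, not from the thread) of capturing that master, assuming the four B-format channels sit on output busses 0-3:

s.record(numChannels: 4); // writes channels 0-3, i.e. the raw B-format, to disk
// ... perform ...
s.stopRecording;
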

In case you don't know: with higher-order ambisonics (e.g. 2nd to 5th order) you use more audio channels and processing power, but you gain spatial accuracy/precision. I've also done experiments with the SC-HOA Quark in SuperCollider (not shown in the repo above). It was a bit more complicated to get working compared to ATK, but I did find the results more "precise". I found 3rd order a good compromise: not too heavy CPU usage (it requires 16 channels vs 4 for first-order "B-format") but more precise spatial localization. In that case, I was using the SC-HOA Quark for encoding and transformation, but a VST plugin (from IEM) for the higher-order decoding, because it took less CPU and gave similar or nicer results (to my ears) than the solution included with SC-HOA. I've mostly used this stuff for VR purposes; with head tracking it really comes into its own! Also, I know Joseph and the ATK gang have been working for a very long time on a higher-order version of ATK... but it's still not available (as a release for SC, at any rate).

telephon commented 3 years ago

This sounds very nice.

Just in case you don't know anyhow: if you want to add a constant processing stage at the end of scsynth or superdirt, there are these options: setting a function on s.tree, which is re-evaluated whenever the node tree is initialised (after boot and after cmd-period), or adding the decoder as a SuperDirt global effect (per orbit).
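
A minimal sketch of the s.tree option (our code; it assumes a decoder SynthDef like the hypothetical \foaMasterDecode from the sketch above):

s.tree = { |server|
    // re-created after boot and after cmd-period, at the tail of the node tree
    Synth(\foaMasterDecode, target: server, addAction: \addToTail);
};
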

totalgee commented 3 years ago

Thanks @telephon, that's good to know about s.tree... Very useful (and undocumented). I knew there was a way to do it (similar to SkipJack for Cmd-period), but hadn't set it up before. I'll give it a try.

Regarding the other solution with a global effect (for decoding), I'll try to add an example to my repo when I get a chance; it would be nice to have that "ready to go" for others who want to experiment with binaural audio.

Thanks again!

telephon commented 3 years ago

Thank you – this is a great addition. If you like, maybe you could add a folder to the repository?

yaxu commented 3 years ago

So to produce B-format for recording, I guess all I have to do is set SuperDirt to pan across four channels, then just leave out the decode step, i.e.:

DirtPan.defaultPanningFunction = { |signals, numChannels, pan, mul|
    var sig = FoaEncode.ar(signals.sum * mul, e);
    FoaPush.ar(sig, angle: 0.45pi, theta: pan * -pi, phi: 0);
};

This seems to work. I only see three of the channels used, but I guess that's because I'm not using altitude?

The signals.sum in there... I guess that means stereo samples will be treated as mono. Hmm.

bgold-cosmos commented 3 years ago

It might be interesting to switch on numChannels, use a different encoder for stereo (like FoaEncoderMatrix.newStereo(angle: pi/4)), and then push the result around based on pan...
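
Something like that might look as follows (an untested sketch; the encoder variable names are ours, and we branch on signals.size at SynthDef build time, since the numChannels argument is the output channel count):

(
~monoEncoder = FoaEncoderMatrix.newOmni;
~stereoEncoder = FoaEncoderMatrix.newStereo(pi/4);

DirtPan.defaultPanningFunction = { |signals, numChannels, pan, mul|
    var sig = if(signals.size == 2) {
        FoaEncode.ar(signals * mul, ~stereoEncoder)   // keep the stereo image
    } {
        FoaEncode.ar(signals.sum * mul, ~monoEncoder) // mix everything else down
    };
    FoaPush.ar(sig, angle: pi/2, theta: pan * pi);
};
)
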

totalgee commented 3 years ago

This seems to work. I only see three of the channels used, but I guess that's because I'm not using altitude?

Correct. The fourth channel corresponds to the Z/vertical direction. If you're not positioning anything outside the horizontal plane, then it will stay at zero (the information will come from the first (omnidirectional) channel).

totalgee commented 3 years ago

It might be interesting to switch on numChannels, use a different encoder for stereo (like FoaEncoderMatrix.newStereo(angle: pi/4)), and then push the result around based on pan...

Exactly. Or, equivalently, you could push the left and right mono channels wherever you want them in space -- FoaEncoderMatrix.newStereo conveniently creates a matrix encoding that does that for you, with a given (fixed) angle of separation between the two "virtual speakers". There is also a "super stereo" encoding option, which is a kernel encoder (to be honest, though, I'm not sure you'll hear a big difference ;-).

Example here:

https://github.com/totalgee/binaural-livecoding/blob/49588e2c38f4f1d7fac73feff2bb6fa9a1baf143/examples/supercollider-atk.scd#L114-L123

You don't need to do the "zoom" (or push, which does something different but similar -- the ATK documentation has diagrams that explain these transforms well) if you're happy with a fixed spread of the stereo sources (45°, 90° or whatever). If you push or zoom the stereo-encoded signal all the way (by 0.5pi), then you're back to a unidirectional source. But, when desired, they provide a way of changing the "width" of the encoded stereo field on the fly (e.g. it could be exposed as a parameter from Tidal).
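
For example, the width could be exposed like this (our sketch, stereo input assumed; \width is a hypothetical parameter name, declared on the Tidal side as width = pF "width", and ~stereoEncoder/~decoder are the encoder and decoder from the sketches above):

DirtPan.defaultPanningFunction = { |signals, numChannels, pan, mul|
    var width = \width.ir(0.5); // 0 = fully pushed point source, 1 = full stereo spread
    var sig = FoaEncode.ar(signals * mul, ~stereoEncoder);
    // pushing all the way (0.5pi) collapses the field to a single direction
    sig = FoaPush.ar(sig, angle: (1 - width) * 0.5pi, theta: pan * pi);
    FoaDecode.ar(sig, ~decoder);
};
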

yaxu commented 3 years ago

Does anyone have hints on how to record in B-format while monitoring on four-channel sound (one speaker in each corner, i.e. in a ring)?

yaxu commented 3 years ago

Hmm, I thought I could do this by outputting B-format and then using this IEM plugin to turn it into quadraphonic:

(screenshot of the IEM plugin configuration)

It's kind of working: I can see four channels going in, with three of them active (due to the lack of altitude), and four coming out. But if I pan in a circle, speakers 1 and 3 stay at the same level, and speakers 2 and 4 go up and down in opposition to each other. So it feels like it's panning across and not around. Hm!

yaxu commented 3 years ago

In the end I gave up on that. Instead I used the keystroke recording in the feedforward editor with one configuration, played it back using SuperDirt with a different multichannel configuration, and recorded that as a video with multichannel audio using ffmpeg. This turned out to be a nice approach: getting the video recorded nicely was technically very fiddly, and it was good to only worry about that after I'd done the live coding itself.

totalgee commented 3 years ago

@yaxu you could probably do something like I show in https://github.com/totalgee/binaural-livecoding/blob/main/examples/setup-superdirt-conv.scd except that instead of doing convolution, you do ATK encoding (as shown in some of the other examples in that repo) of your four panned "virtual speaker" channels, then (all in the same Ndef) record the four B-format channels using a DiskOut.ar UGen. You also need to ensure the Ndef passes its input on unchanged, so you still hear the original panned four channels on the outputs. You could simplify things by not using an Ndef as I do (I use one because it's convenient for experimenting) and just use Synths and Groups directly. It might seem a bit complicated to set everything up, but it would be one way to do what you want. I don't have time to make an example right now, though... maybe later if you still need it.

Sounds like you found a solution that worked for you, though... ;-)
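
A rough sketch of that approach (our code, untested; ~quadEncoder stands for a four-direction encoder, e.g. one made with FoaEncoderMatrix.newDirections, and the buffer handling is simplified):

(
// a buffer for streaming the 4 B-format channels to disk
b = Buffer.alloc(s, 65536, 4);
b.write("~/bformat-master.wav".standardizePath, "wav", "int24", leaveOpen: true);

SynthDef(\bformatRecorder, {
    var quad = In.ar(0, 4); // the four panned "virtual speaker" channels
    var bformat = FoaEncode.ar(quad, ~quadEncoder);
    DiskOut.ar(b, bformat); // record B-format; the quad monitoring passes through untouched
}).add;
)

// x = Synth(\bformatRecorder, addAction: \addToTail);
// when done: x.free; b.close; b.free;
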

micah-frank-studio commented 3 years ago

Jumping back in here after about a year of not using ambisonics, as all my work in this area was cancelled for 2020.

I'm just trying to figure out what the status is. From what I've been able to glean from the discussion, we can monitor/code in binaural? https://github.com/totalgee/binaural-livecoding

Does that include altitude, or just horizontal?

And can we also print to B-format? That combo would be great, as it's portable across all systems: you could do your work in binaural and render B-format, then bring it right to the HOA system.