Setting HRTF on panner node will freeze for 800 milliseconds

ggadwa commented 1 year ago

Trying out 0.34, there's another oddity with the Panner Nodes (you'll be getting a lot of in a game testing from me!) Again, thank you for this lib, and I'll try my best to shake out bugs for you.

If you do this:

panner.set_panning_model(PanningModelType::HRTF);

You get a pause (on my AMD/windows home, haven't tested it on my mac yet) or about 800 milliseconds. My physics thread was complaining a lot about this :)

If you use ::EqualPower, it works fine. BTW, no crashing on panner nodes anymore, so that issue has been fixed.

orottier commented 1 year ago

Thanks for the report!

A small audio glitch is expected when the panning model is changed for an active panner, because HRTF for example might delay the sound signal for the left ear if the sound is coming from the right. PanningModel equalpower assumes both ears are at the same physical locations so no delays are present.

However, 800ms is an eternity and this is not expected.

When I run cargo run --release --example spatial and press <Enter> a few times, I notice a 300ms delay on the control thread. Not the render thread by the way - audio playback is fine.

300 ms is quite a lot of time to load the HRTF data, but since this happens on the control thread it should not really be an issue either. I will have a look if is any speedup possible, but perhaps you should review your application and make sure that any interaction with our library is not happening on a thread with realtime constraints.

ggadwa commented 1 year ago

So this one I might not be explaining properly. I went back to chrome (I'll use this as reference form now on, as I should have) and did this:

console.info('start='+performance.now()); let panner = ctx.createPanner(); panner.panningModel='HRTF'; panner.distanceModel='linear'; panner.refDistance=100.0; panner.maxDistance=1100.0; panner.rolloffFactor=1.0; panner.coneInnerAngle=360.0; panner.coneOuterAngle=0.0; panner.coneOuterGain=0.0; panner.positionX.value=0.0; panner.positionY.value=0.0; panner.positionZ.value=0.0; panner.orientationX.value=1.0; panner.orientationY.value=0.0; panner.orientationZ.value=0.0; console.info('end='+performance.now());

In the console:

start=1658.9000000357628 test.html:52 end=1659.199999988079

Note this is on the main javascript thread (man that is one of the worst parts about javascript!), next to no time has passed to start a panner node, or to set the panning model. If there is a delay in the actual sound being played from your threads, that's fine, but in my case, setting HRTF freezes the thread creating the panner -- it doesn't return for 800 ms.

I suspect in most games people are going to be spawning panner nodes all over the places as sounds play in their game and any thread is going to be time sensitive. Nodes are going to have to be created as fast as possible.

b-ma commented 1 year ago

Hey @ggadwa,

I just recreated your example logging some time elapsed here and there and, while this indeed is a bit long, it's two order of magnitude below what your are reporting, that's really strange

use std::time::{Instant};
use web_audio_api::context::{AudioContext, BaseAudioContext};
use web_audio_api::node::{PanningModelType, DistanceModelType};

fn main() {
  let context = AudioContext::default();

  let start = Instant::now();

  let mut panner = context.create_panner();
  panner.set_panning_model(PanningModelType::HRTF);
  panner.set_distance_model(DistanceModelType::Linear);
  panner.set_ref_distance(100.);
  panner.set_max_distance(1100.);
  panner.set_rolloff_factor(1.);
  panner.set_cone_inner_angle(360.);
  panner.set_cone_outer_angle(0.);
  panner.set_cone_outer_gain(0.);
  panner.position_x().set_value(0.);
  panner.position_y().set_value(0.);
  panner.position_z().set_value(0.);
  panner.orientation_x().set_value(1.);
  panner.orientation_y().set_value(0.);
  panner.orientation_z().set_value(0.);

  let duration = start.elapsed();
  println!("Time elapsed: {:?}", duration);
}

I put some log in the set_panner_mode (with > prefix) method as well, and what I got is:

> include bytes 24ns
> create hrir sphere: 2.750661ms
> create hrir state: 5.737993ms
> send msg: 1.472µs
Time elapsed: 8.577523ms

@otto I tested in the spatial.rs example too, and I found around 7ms too (no noticeable glitch in audio neither), how did you got this 300ms value?

ggadwa commented 1 year ago

If this will help, here is my testing code. Note 2 things (1) I wrapped everything inside a window/event loop as that is how my game runs and (2) I'm in debug mode. In this example, on my machine, the elapsed time is > 5 seconds (!!!)

This code is an absolute mess as I'm testing a bunch of stuff at once but it does show the problem, at least for me. This is an AMD windows machine, BTW.

use winit::{event::*, event_loop::{ControlFlow, EventLoop}, window::{Window, WindowBuilder}}; use web_audio_api::{context::{BaseAudioContext, AudioContext}, node::{AudioNode, AudioScheduledSourceNode, PanningModelType, DistanceModelType}}; use std::f32::consts::PI; use std::time::Instant;

pub async fn run() { env_logger::init();

let event_loop = EventLoop::new();
let window =
    WindowBuilder::new()
        .with_title("atom")
        .with_inner_size(winit::dpi::LogicalSize::new(1024.0, 768.0))
        .with_resizable(false)
        .build(&event_loop).unwrap();

// the audio context
let ctx = AudioContext::default();

// run the window loop
event_loop.run(move |event, _, control_flow| {
    match event {
        Event::WindowEvent {
            ref event,
            window_id,
        } if window_id == window.id() => {
            match event {
                WindowEvent::MouseInput {
                    button,
                    state,
                    ..
                } => {
                    if *state==ElementState::Pressed {
                        println!("sound");

                        ctx.listener().position_x().set_value(0.0);
                        ctx.listener().position_y().set_value(0.0);
                        ctx.listener().position_z().set_value(0.0);

                        ctx.listener().forward_x().set_value(0.0);
                        ctx.listener().forward_y().set_value(0.0);
                        ctx.listener().forward_z().set_value(1.0);

                        ctx.listener().up_x().set_value(0.0);
                        ctx.listener().up_y().set_value(1.0);
                        ctx.listener().up_z().set_value(0.0);

                        let length = ctx.sample_rate() as usize;
                        let sample_rate = ctx.sample_rate();
                        let mut src = ctx.create_buffer_source();
                        let flat_wave = vec![1.0; length];
                        let mut buffer = ctx.create_buffer(1, length, sample_rate);
                        buffer.copy_to_channel(&flat_wave, 0);

                        src.set_buffer(buffer.clone());
                        src.set_loop(true);

                        let mut analyser = ctx.create_analyser();
                        let mut bins = vec![0.; analyser.frequency_bin_count()];

                        let now = Instant::now();
                        let mut panner = ctx.create_panner();
                        panner.set_panning_model(PanningModelType::HRTF);
                        panner.set_distance_model(DistanceModelType::Linear);
                        panner.set_ref_distance(100.0);
                        panner.set_max_distance(1100.0);
                        panner.set_rolloff_factor(1.0);
                        panner.set_cone_inner_angle(360.0);
                        panner.set_cone_outer_angle(0.0);
                        panner.set_cone_outer_gain(0.0);
                        panner.position_x().set_value(0.0);
                        panner.position_y().set_value(0.0);
                        panner.position_z().set_value(0.0);
                        panner.orientation_x().set_value(1.0);
                        panner.orientation_y().set_value(0.0);
                        panner.orientation_z().set_value(0.0);
                        println!("panner ms={:?}", now.elapsed());

                        src.connect(&panner);
                        panner.connect(&analyser);
                        src.start();

                        for i in 0..30 {
                            let d = i * 50;
                            panner.position_z().set_value(d as f32);

                            std::thread::sleep(std::time::Duration::from_millis(100));
                            analyser.get_float_time_domain_data(&mut bins);
                            println!("{}>{}", d, bins[0]);
                        }

                    }
                },
                WindowEvent::CloseRequested => *control_flow = ControlFlow::Exit,
                _ => {}
            }
        }
        Event::RedrawRequested(window_id) if window_id == window.id() => {
        }
        Event::RedrawEventsCleared => {
            window.request_redraw();
        }
        _ => {}
    }
});

}

One other thing to point out is I did this in chrome on the same machine:

            console.info('start='+performance.now());
            let panner = ctx.createPanner();
            panner.panningModel='HRTF';
            panner.distanceModel='linear';
            panner.refDistance=100.0;
            panner.maxDistance=1100.0;
            panner.rolloffFactor=1.0;
            panner.coneInnerAngle=360.0;
            panner.coneOuterAngle=0.0;
            panner.coneOuterGain=0.0;
            panner.positionX.value=0.0;
            panner.positionY.value=0.0;
            panner.positionZ.value=0.0;
            panner.orientationX.value=1.0;
            panner.orientationY.value=0.0;
            panner.orientationZ.value=0.0;
            console.info('end='+performance.now());

And that happened in < 1 millisecond.

ggadwa commented 1 year ago

Tried a release build; must faster but still slow at a little under 1 second.

orottier commented 1 year ago

@otto I tested in the spatial.rs example too, and I found around 7ms too (no noticeable glitch in audio neither), how did you got this 300ms value?

Thanks for chiming in. I will have to check again (later today)

The big question is what do we think is acceptable on the control thread? I agree with the sentiment of @ggadwa that large delays (a few ms+ ?) will be disruptive for any kind of application

orottier commented 1 year ago

My measurements are also in --release mode. I did the following:

diff --git a/examples/spatial.rs b/examples/spatial.rs
index 317c027..03e60c4 100644
--- a/examples/spatial.rs
+++ b/examples/spatial.rs
@@ -16,7 +16,7 @@ use web_audio_api::node::{
 //
 // `WEB_AUDIO_LATENCY=playback cargo run --release --example spatial`
 fn main() {
-    env_logger::init();
+    env_logger::Builder::from_default_env().format_timestamp_millis().init();

     let latency_hint = match std::env::var("WEB_AUDIO_LATENCY").as_deref() {
         Ok("playback") => AudioContextLatencyCategory::Playback,
@@ -91,7 +91,8 @@ fn main() {
         } else {
             PanningModelType::EqualPower
         };
+        log::info!("press");
         panner.set_panning_model(p);
-        println!("PanningMode: {:?}", panner.panning_model());
+        log::info!("PanningMode: {:?}", panner.panning_model());
     });
 }
diff --git a/src/node/panner.rs b/src/node/panner.rs
index da12398..30c2f37 100644
--- a/src/node/panner.rs
+++ b/src/node/panner.rs
@@ -551,12 +551,15 @@ impl PannerNode {
             PanningModelType::HRTF => {
                 let resource = include_bytes!("../../resources/IRC_1003_C.bin");
                 let sample_rate = self.context().sample_rate() as u32;
+                log::info!("load sphere");
                 let hrir_sphere = HrirSphere::new(&resource[..], sample_rate).unwrap();
+                log::info!("setup state");
                 Some(HrtfState::new(hrir_sphere))
             }
         };

         self.panning_model = value;
+        log::info!("send msg");
         self.registration
             .post_message(ControlMessage::PanningModel(Box::new(hrtf_option)));
     }

and then measurements are like

RUST_LOG=info cargo run --release --example spatial
....
HRTF enabled, press <Enter> to toggle

[2023-10-16T12:30:42.553Z INFO  spatial] press
[2023-10-16T12:30:42.553Z INFO  web_audio_api::node::panner] send msg
[2023-10-16T12:30:42.553Z INFO  spatial] PanningMode: EqualPower

[2023-10-16T12:30:43.710Z INFO  spatial] press
[2023-10-16T12:30:43.710Z INFO  web_audio_api::node::panner] load sphere
[2023-10-16T12:30:44.015Z INFO  web_audio_api::node::panner] setup state
[2023-10-16T12:30:44.020Z INFO  web_audio_api::node::panner] send msg
[2023-10-16T12:30:44.020Z INFO  spatial] PanningMode: HRTF

[2023-10-16T12:30:45.360Z INFO  spatial] press
[2023-10-16T12:30:45.360Z INFO  web_audio_api::node::panner] send msg
[2023-10-16T12:30:45.361Z INFO  spatial] PanningMode: EqualPower

[2023-10-16T12:30:46.236Z INFO  spatial] press
[2023-10-16T12:30:46.236Z INFO  web_audio_api::node::panner] load sphere
[2023-10-16T12:30:46.543Z INFO  web_audio_api::node::panner] setup state
[2023-10-16T12:30:46.548Z INFO  web_audio_api::node::panner] send msg
[2023-10-16T12:30:46.548Z INFO  spatial] PanningMode: HRTF

so 310 ms to load the sphere and 5ms to setup the state. Interesting...

orottier commented 1 year ago

Looking at the source code of our HRTF library, it may be the case that the entire sphere needs to be resampled before it can be used. My audio context runs at SampleRate(48000), what's yours @b-ma ? We may need to include pre-resampled versions of the database..

orottier commented 1 year ago

Yeah, it's the resampling that takes a lot of time. I get down to 14ms when the sample rate matches the included HRTF-db (44100 Hertz). I will try to see if we can resample only on the first load, and then reuse that for subsequent panner node creations.

orottier commented 1 year ago

https://github.com/orottier/web-audio-api-rs/pull/375 will alleviate some of your pain @ggadwa but I'm keeping this issue open to explore further improvements

ggadwa commented 1 year ago

@otto I tested in the spatial.rs example too, and I found around 7ms too (no noticeable glitch in audio neither), how did you got this 300ms value?

Thanks for chiming in. I will have to check again (later today)

The big question is what do we think is acceptable on the control thread? I agree with the sentiment of @ggadwa that large delays (a few ms+ ?) will be disruptive for any kind of application

I can only talk for myself, but I've made a few game engines (if you look at my github you can see wsjs which is a complete in javascript 3d game engine, there's a "shooter" demo for it too) and that uses webaudio, and the pattern is when a sound is made, the nodes are created at that moment and if it's a non-looping sound, just forgotten and cleaned up by the GC. I'd assume any other game would have the same pattern, so anything over a couple ms would be very disruptive, for sure.

ggadwa commented 1 year ago

Yeah, it's the resampling that takes a lot of time. I get down to 14ms when the sample rate matches the included HRTF-db (44100 Hertz). I will try to see if we can resample only on the first load, and then reuse that for subsequent panner node creations.

So I'm taking chrome as my gold standard and I can tell from testing that there is never a pause (not even on the first panner creation.) I have to wonder if it's not caching for some standard sample rates before any panners are called, possibly at audio context creation, but that's just guessing. It's super efficient, for sure.

If it's a game, I think the first time having a pause is still going to be disruptive as I doubt you'd use a panner outside of the actual game loop, but I'm also not against just spawning one when the game boots if that's a decent workaround.

b-ma commented 1 year ago

My audio context runs at SampleRate(48000), what's yours @b-ma ?

Yup right, it's 44100 by default on my side.

We may need to include pre-resampled versions of the database..

Probably having 44100 and 48000 could be nice yes

Also, I see in the hrtf lib that both HrirSphere and HrtfProcessor implement Clone, maybe using these instead of creating brand new instances would be more performant ? i.e. create the one instance of these per AudioContext and just retrieve clones to the panner nodes?

I get down to 14ms when the sample rate matches the included HRTF-db (44100 Hertz).

This is still a lot compared to my 2-3ms, really strange... particularly knowing that your machine is generally twice faster than mine...

I wonder if it doesn't comes from the usage of log::info! rather than Instant (i.e. when does the time tag is retrieved?) On my side I did this:

PanningModelType::HRTF => {
    use std::time::{Instant};

    let start = Instant::now();
    let resource = include_bytes!("../../resources/IRC_1003_C.bin");
    let duration = start.elapsed();
    println!("include_bytes!: {:?}", duration);

    let sample_rate = self.context().sample_rate() as u32;

    let start = Instant::now();
    let hrir_sphere = HrirSphere::new(&resource[..], sample_rate).unwrap();
    let duration = start.elapsed();
    println!("HrirSphere::new: {:?}", duration);

    let start = Instant::now();
    let sphere = Some(HrtfState::new(hrir_sphere));
    let duration = start.elapsed();
    println!("HrtfState::new: {:?}", duration);

    sphere
}

With the following result:

sampleRate 44100.0
include_bytes!: 23ns
HrirSphere::new: 2.608541ms
HrtfState::new: 4.920023ms
Time elapsed: 7.581619ms

Here, the HrtfState::new is the biggest issue.

I'd assume any other game would have the same pattern, so anything over a couple ms would be very disruptive, for sure.

Yup for sure, at 60fps you have ~16.6ms of budget, but generally lots of physics and rendering to do...

b-ma commented 1 year ago

Sorry if my last post was maybe a bit harsh (or "rusty" :) But I really don't understand how we can go from 7ms (which is nonetheless and factually too much) to few hundreds of milliseconds, there is something really weird going on here...

ggadwa commented 1 year ago

Sorry if my last post was maybe a bit harsh (or "rusty" :) But I really don't understand how we can go from 7ms (which is nonetheless and factually too much) to few hundreds of milliseconds, there is something really weird going on here...

I have a piece of code (it's above somewhere) that always takes about 5 seconds in debug; nobody else is seeing something that bad, so yeah, there's certainly something weird going on, but I think if you guys take a crack at what you are seeing, I can certainly retest a patch (I am new to rust so how to pull from a local repository is something I'll have to figure out, or maybe a test branch.) I can give you a test on both my PC and M2 mac, both on a simple test app and the game itself.

I also have no problem running through one with a ton of println! is you want to do some more timing.

FYI: My sample rate is 48000.

b-ma commented 1 year ago

I am new to rust so how to pull from a local repository is something I'll have to figure out, or maybe a test branch.

Importing a lib locally is quite simple actually, you just have to change your Cargo.toml to give a relative path to the lib, e.g.:

# web-audio-api = "0.34"
web-audio-api = { path = "../web-audio-api-rs" }

orottier commented 1 year ago

I have further improved the caching in #375 to also cache the HRTF-processor. Creation time is now 1ms for subsequent panner nodes. I hope this suffices for now, but I will have a quick look in the upstream hrtf library if we can improve clone times by optimizing the memory layout.

Now for the initial load (which is thus 300ms on my machine), we have a few options:

take the hit on the first panner creation - not ideal because this is very unexpected for the application. This is the current implementation
take the hit on audio context creation. This will increase the AudioContext creation time from 100ms to 300ms and will increase memory consumption a bit for applications that don't need HRTF panning
include more HRIR spheres for common sample rates (44100, 48000, 96000?). This reduces the initial load time to 15ms on my machine, but it does bloat the binary with a few MBs. This provides no adequate solution for applications running in exotic sample rates

Tangentially, I will have a look in the upstream library if loading can be made more efficient.

@b-ma I'm not sure why our machines have unexpected performances, but I think the general picture still stands

b-ma commented 1 year ago

I would say that in term of usage solution 2. is the best one, as the AudioContext is generally created at application startup

https://github.com/mrDIMAS/hrtf/pull/8

Nice catch!

ggadwa commented 1 year ago

I would say that in term of usage solution 2. is the best one, as the AudioContext is generally created at application startup

I'd second that. My audio context creation is spun up at the same time I'm spinning up my web gpu context, that's all expected time loss at the beginning of any application. 300 ms once really isn't that bad; it's just bad when you have to do it 10,000 times and only have a couple ms to spare every time.

And, again: Thanks for all the work! I picked rust to learn specifically because of the wgpu and this library; I want to be as cross platform as possible and I want my interfaces to things like lower level sound and graphics to be as long lived as possible and being a web standard gives me that option.

orottier commented 1 year ago

Alright thanks for the feedback. I will update #375 with

preloading the HRTF data at the first online AudioContext creation
update the upstream hrtf lib with improved performance
a benchmark to track performance over time

orottier commented 1 year ago

I have released v0.35.0 with the improvements

orottier / web-audio-api-rs

Setting HRTF on panner node will freeze for 800 milliseconds #371