W3C Workshop on Web and Media production
https://www.w3.org/2021/03/media-production-workshop/

Latency measurement and low-latency #31

Open · tidoust opened this issue 2 years ago

tidoust commented 2 years ago

Inconsistent latency across browsers/OS, both in general and when the computer is running on battery power. And latency is not exposed everywhere.

Features in specs but not implemented across all browsers:

* Input latency info (`MediaStreamTrackSetting`)
* Output latency info (`WebAudio`)

Possible spec gaps:

* Specs on input and output latency - full paths?
* `MediaStreamSourceNode` - adds latency?
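
For concreteness, here is a minimal sketch of reading the two latency surfaces listed above, the `latency` member of the track's settings and `AudioContext.outputLatency`; the guards reflect that neither is implemented in all browsers, and the snippet assumes it runs in a module or async function:

```js
// Minimal sketch: read the latency figures the specs define today.
// Neither value is available in every browser, hence the checks.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();

// Input side: MediaTrackSettings.latency (Media Capture and Streams), seconds.
const inputLatency = track.getSettings().latency; // may be undefined

// Output side: AudioContext.outputLatency (Web Audio API), seconds.
const ctx = new AudioContext();
const outputLatency = ctx.outputLatency; // may be undefined; ctx.baseLatency is a lower bound

console.log({ inputLatency, outputLatency });
```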

Ongoing work in the Audio Working Group to create a new API that allows apps to select the audio output device for an audio context. Theoretically, this will guarantee the code path that minimizes the output latency.

Raised in:

petersalomonsen commented 2 years ago

I also wonder if any browsers plan to support e.g. ASIO on Windows and JACK or ALSA directly on Linux, which are technologies that work well for providing low latency in desktop apps.

Is this something that could even be considered for web API specs? That is, the possibility for a web app to choose to use ASIO, JACK, ALSA, etc.?

padenot commented 2 years ago

> Inconsistent latency across browsers/OS, both in general and when the computer is running on battery power. And latency is not exposed everywhere.
>
> Features in specs but not implemented across all browsers:
>
> * Input latency info (`MediaStreamTrackSetting`)
> * Output latency info (`WebAudio`)
>
> Possible spec gaps:
>
> * Specs on input and output latency - full paths?
> * `MediaStreamSourceNode` - adds latency?

This one is important. For example, in Firefox, MediaStreamAudio{Source,Destination}Node don't add any latency except when crossing clock domains, with a twist: the first input and output devices are always reclocked implicitly (e.g. using aggregate devices on macOS), while subsequent devices in the same document will cross a clock domain, so latency will be added. But if (e.g.) the first input device is closed while another device is in use in the same document, that device is then implicitly reclocked and the latency disappears.

This scheme optimizes for the (overwhelmingly common) case where there is a single audio input device and a single audio output device, and allows having super low latency and high performance with no chance of drifting.

Also, when AudioContexts running at different sample-rates are connected via MediaStreams, a high-quality resampler is inserted in the audio path, with its own latency, in addition to the buffering necessary to accommodate the different buffer sizes of the rendering graphs (if e.g. multiple audio output devices are in use, which isn't in any spec right now).
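
As a sketch of that scenario, the following bridges two AudioContexts at different sample-rates through MediaStream nodes; this is the path on which an implementation has to insert a resampler (and its latency):

```js
// Sketch: two AudioContexts at different sample-rates, bridged with
// MediaStreamAudio{Destination,Source}Node. The UA resamples (with added
// latency) where the stream crosses from ctxA's rate into ctxB's.
const ctxA = new AudioContext({ sampleRate: 48000 });
const ctxB = new AudioContext({ sampleRate: 44100 });

const osc = new OscillatorNode(ctxA);
const bridgeOut = new MediaStreamAudioDestinationNode(ctxA);
osc.connect(bridgeOut);
osc.start();

const bridgeIn = new MediaStreamAudioSourceNode(ctxB, {
  mediaStream: bridgeOut.stream,
});
bridgeIn.connect(ctxB.destination);
```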

Finally, when an AudioContext is running and the default audio output device changes, the latency will change, both because the new device might have lower or higher latency (the most common case is switching between a Bluetooth device and a wired device), and because the AudioContext might run at a sample-rate that is not native for the new device, in which case a high-quality resampler will again be inserted in the audio path. More importantly, on some OSes the global latency of a system-level audio stream can be overridden by another program running on the machine, so that's another possible source of change.

All in all, this means that the latency of a MediaStreamTrack or an AudioContext can change during its lifetime, and therefore the need for an onlatencychange event (or something) should be considered.

In Firefox, which implements AudioContext.outputLatency, the new latency figures become available on device change, but there is no way to learn about the change short of polling. For this reason, querying this attribute has been made essentially free, so polling regularly is a viable (but cumbersome) option.
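
A sketch of that polling workaround; the spec only gives us the attribute to read, so the change detection and whatever reaction follows are the app's job:

```js
// Sketch: no "latencychange" event exists, so re-read
// AudioContext.outputLatency periodically (the read is cheap in Firefox)
// and react when it moves.
const ctx = new AudioContext();
let lastLatency = ctx.outputLatency;

setInterval(() => {
  const latency = ctx.outputLatency;
  if (latency !== lastLatency) {
    console.log(`Output latency changed: ${lastLatency}s -> ${latency}s`);
    lastLatency = latency;
    // e.g. re-align scheduled audio/video here
  }
}, 250);
```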

> Ongoing work in the Audio Working Group to create a new API that allows apps to select the audio output device for an audio context. Theoretically, this will guarantee the code path that minimizes the output latency.

Indeed, provided the device id is passed in at the creation of the AudioContext, so that it can run at the right sample-rate from the start.
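
Something like the following is the shape being discussed; the `sinkId` constructor option is the proposal's surface, not interoperable at the time of this thread, and device ids from enumerateDevices() are only meaningful once permission has been granted:

```js
// Sketch, assuming the proposed AudioContextOptions.sinkId: pick the output
// device at creation so the context can open it at its native sample-rate.
const devices = await navigator.mediaDevices.enumerateDevices();
const output = devices.find((d) => d.kind === "audiooutput");

const ctx = new AudioContext({
  sinkId: output.deviceId,      // proposed: device chosen up front
  latencyHint: "interactive",   // ask for the low-latency path
});
```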

padenot commented 2 years ago

> I also wonder if any browsers plan to support e.g. ASIO on Windows and JACK or ALSA directly on Linux, which are technologies that work well for providing low latency in desktop apps.

The ASIO SDK is proprietary, so open-source browsers can't easily ship anything built with it (last I checked). JACK and ALSA can be used in Firefox today, but are not as well maintained as the Pulse backend; they do, however, receive community support, and we review and ship fixes to those audio backends. I hear the JACK backend should work on other OSes, which could be a bridge to ASIO on Windows, but I haven't tried this myself.

> Is this something that could even be considered for web API specs? That is, the possibility for a web app to choose to use ASIO, JACK, ALSA, etc.?

This seems a bit advanced and specific. An implementation could try to find a JACK server or an ASIO driver, and decide to use it if, e.g., an AudioContext has been created with a latencyHint of "interactive".

petersalomonsen commented 2 years ago

> > I also wonder if any browsers plan to support e.g. ASIO on Windows and JACK or ALSA directly on Linux, which are technologies that work well for providing low latency in desktop apps.
>
> The ASIO SDK is proprietary, so open-source browsers can't easily ship anything built with it (last I checked). JACK and ALSA can be used in Firefox today, but are not as well maintained as the Pulse backend; they do, however, receive community support, and we review and ship fixes to those audio backends. I hear the JACK backend should work on other OSes, which could be a bridge to ASIO on Windows, but I haven't tried this myself.
>
> > Is this something that could even be considered for web API specs? That is, the possibility for a web app to choose to use ASIO, JACK, ALSA, etc.?
>
> This seems a bit advanced and specific. An implementation could try to find a JACK server or an ASIO driver, and decide to use it if, e.g., an AudioContext has been created with a latencyHint of "interactive".

I think it would be good to be able to list available audio devices/drivers in the API and so make it possible to choose one from a web app. For pro music / media production you may want to control which driver/device to use, rather than have it decided automatically from a latency hint. E.g. JavaSound offers this possibility (I worked on a DAW implemented in Java earlier), and just as the Web MIDI API lets you choose a MIDI device, it would be great to be able to choose the audio device/driver.
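
For reference, here is roughly what device listing and selection look like with what's shipped today; note that only devices are exposed, never the driver/backend (ASIO, JACK, ...) behind them, which is exactly the gap being pointed out. Taking the first input stands in for whatever the app's UI would let the user pick:

```js
// Sketch: enumerate audio devices so the app can present an explicit choice.
// Labels are only populated after the user has granted a media permission.
const devices = await navigator.mediaDevices.enumerateDevices();
const inputs = devices.filter((d) => d.kind === "audioinput");
const outputs = devices.filter((d) => d.kind === "audiooutput");
console.table([...inputs, ...outputs]);

// Open a specific input via constraints (first one as a placeholder choice).
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { deviceId: { exact: inputs[0].deviceId } },
});
```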

ulph commented 2 years ago

@tidoust

> Ongoing work in the Audio Working Group to create a new API that allows apps to select the audio output device for an audio context. Theoretically, this will guarantee the code path that minimizes the output latency.

This is of course excellent, but what about the input device and input latency? I do notice https://github.com/w3c/media-production-workshop/issues/32.

The roundtrip (input+output) latency is important for audio recording tasks.

@padenot Very good points about the dynamic nature of the audio paths, and your suggestion of an event (or events) makes a lot of sense. Events in the plural here, since the input and output paths live separate lives spec-wise as it stands.

Some further thoughts on input latency (from the WebAudio perspective): MediaStreamAudio{Source,Destination}Node not adding latency in Firefox (for the usual use case) is brilliant, but it can't really be specced.

How about if we could query the node for, or subscribe to, latency changes up until that point (in WebAudio's clock domain)?

padenot commented 2 years ago

> Very good points about the dynamic nature of the audio paths, and your suggestion of an event (or events) makes a lot of sense. Events in the plural here, since the input and output paths live separate lives spec-wise as it stands.

Input is slightly different, in the sense that if a device becomes unavailable, the MediaStreamTrack is stopped, and it is necessary to explicitly re-open another device. This is of course for privacy reasons: we can't start using another audio input without the user's explicit consent. If the track is stopped because its device went away, a "devicechange" event will also have been fired at navigator.mediaDevices.
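
A short sketch of that flow with shipped API surface: the track's "ended" event signals the device going away, "devicechange" on navigator.mediaDevices signals that the device set changed, and re-opening still has to go through getUserMedia (i.e. with consent):

```js
// Sketch: react to the capture device disappearing.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();

track.addEventListener("ended", () => {
  console.log("Input device gone; ask the user to pick another one.");
});

navigator.mediaDevices.addEventListener("devicechange", async () => {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const inputs = devices.filter((d) => d.kind === "audioinput");
  console.log("Device set changed; available inputs:", inputs);
});
```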

> How about if we could query the node for, or subscribe to, latency changes up until that point (in WebAudio's clock domain)?

There's been some talk before about having a latency parameter on AudioNodes. The latency is measurable by firing an impulse through a pair of MediaStreamAudio{Source,Destination}Nodes and timing its arrival, but it really is awkward. It's also bound to change, e.g. if the implementation is compensating for clock skew.
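
To make the awkwardness concrete, here is a rough sketch of that impulse technique within a single AudioContext, measuring the latency added by a MediaStreamAudio{Destination,Source}Node bridge. The worklet, its threshold, and the scheduling margin are illustrative assumptions; a real measurement also has to worry about clock domains and a running (user-activated) context:

```js
// Rough sketch: time a single-sample impulse through a
// MediaStreamAudioDestinationNode -> MediaStreamAudioSourceNode loop.
const ctx = new AudioContext();

// Detector worklet: report the frame at which the impulse arrives.
const src = `registerProcessor("impulse-detector", class extends AudioWorkletProcessor {
  process(inputs) {
    const ch = inputs[0][0];
    if (ch) {
      for (let i = 0; i < ch.length; i++) {
        if (Math.abs(ch[i]) > 0.5) {        // illustrative threshold
          this.port.postMessage(currentFrame + i);
          return false;                     // done measuring
        }
      }
    }
    return true;
  }
});`;
await ctx.audioWorklet.addModule(
  URL.createObjectURL(new Blob([src], { type: "text/javascript" }))
);

// The bridge whose added latency we want to measure.
const bridgeOut = new MediaStreamAudioDestinationNode(ctx);
const bridgeIn = new MediaStreamAudioSourceNode(ctx, { mediaStream: bridgeOut.stream });
const detector = new AudioWorkletNode(ctx, "impulse-detector");
bridgeIn.connect(detector).connect(ctx.destination); // keep the detector pulled (it outputs silence)

// Schedule a one-sample click at a known frame, slightly in the future.
const buf = new AudioBuffer({ length: 1, sampleRate: ctx.sampleRate });
buf.getChannelData(0)[0] = 1;
const click = new AudioBufferSourceNode(ctx, { buffer: buf });
click.connect(bridgeOut);
const sentFrame = Math.round(ctx.currentTime * ctx.sampleRate) + 4096;
click.start(sentFrame / ctx.sampleRate);

detector.port.onmessage = ({ data: arrivalFrame }) => {
  console.log(`Bridge added ~${(arrivalFrame - sentFrame) / ctx.sampleRate}s`);
};
```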

In any case, something needs to happen for sure; we need to find the most useful way to expose those numbers.

ulph commented 2 years ago

As briefly discussed in W3C session 1: what about the detectability of the accuracy of these latency figures? (a mouthful)

padenot commented 2 years ago

I haven't been able to programmatically detect that this is the case in the general case, short of feeding an output device back into an input device in a controlled environment; I'd be happy to learn that this is feasible.

On some of my machines it's very clear, just by ear, that the numbers reported by the OS are wrong. Most often the case for input, iirc.

ulph commented 2 years ago

If the OS/drivers are lying, or lack the APIs to probe, that's of course a problem. Thinking freely here, but is there any way to shift the industry towards implementing these kinds of APIs correctly? I acknowledge it's very hard...

Other than that, just checking in to see if there's been any progress towards implementing the specs?

padenot commented 2 years ago

I know that Chromium has made a lot of progress in output latency reporting; I think the APIs are implemented now and are shipping in 102.