Early TAG review request for Playout Statistics API for WebAudio

Hernqvist commented 3 months ago

Hello TAG!

I'm requesting a TAG review of Playout Statistics API for WebAudio.

There is currently no way to detect whether WebAudio playout has glitches (gaps in the played audio, which typically happen due to underperformance in the audio pipeline). There is an existing way to measure the instantaneous playout latency using AudioContext.outputLatency, but no simple way to measure average/minimum/maximum latency over time. With this API, we propose a way to measure both the playout delay and the glitchiness of the audio.
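To illustrate the gap, a page today would have to poll AudioContext.outputLatency itself and aggregate the samples. The following is only a sketch of that workaround: the helper name summarizeLatency, the LatencySummary shape, and the 100 ms polling interval are assumptions made for illustration and are not part of any API.

```typescript
// Hypothetical helper: aggregates manually collected outputLatency samples.
// Samples are in seconds, the unit AudioContext.outputLatency reports.
interface LatencySummary {
  min: number;
  max: number;
  average: number;
}

function summarizeLatency(samples: number[]): LatencySummary | null {
  if (samples.length === 0) return null; // nothing collected yet
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  for (const s of samples) {
    if (s < min) min = s;
    if (s > max) max = s;
    sum += s;
  }
  return { min, max, average: sum / samples.length };
}

// Browser-only usage, assuming an existing AudioContext named `ctx`:
//   const samples: number[] = [];
//   setInterval(() => samples.push(ctx.outputLatency), 100); // interval is arbitrary
//   // ...later:
//   console.log(summarizeLatency(samples));
```

Polling like this only sees the latency at each sampling instant, so any change between two samples goes unnoticed.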

Explainer (minimally containing user needs and example code): https://github.com/WICG/web_audio_playout
User research: none
Security and Privacy self-review: https://docs.google.com/document/d/1wGv_mr7Lgg2w-6PuKDrcScoa8IvYAOW3PMTFW85O3Gw/edit
GitHub repo (if you prefer feedback filed there):
Primary contacts (and their relationship to the specification):
- Fredrik Hernqvist (Hernqvist), Google
- Palak Agarwal (palak8669), Google
- Guido Urdaneta (guidou), Google
- Olga Sharonova (o1ka), Google
Organization/project driving the design: Google
External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/5172818344148992

Further details:

[X] I have reviewed the TAG's Web Platform Design Principles
The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
The group where standardization of this work is intended to be done ("unknown" if not known): W3C Audio Working Group
Existing major pieces of multi-stakeholder review or discussion of this design:
Major unresolved issues with or opposition to this design:
This work is being funded by: Google

We'd prefer the TAG provide feedback as (please delete all but the desired option):

💬 leave review feedback as a comment in this issue and @-notify Hernqvist, palak8669

martinthomson commented 3 months ago

Hi @Hernqvist, we're looking at #843 (Audio Render Capacity) and are trying to understand how this approach fits with that. In particular, I'm interested in how this approach deals with the compute side channel leaks. At face value, the information that this API proposes to expose is higher fidelity, if post-hoc, which carries some amount of risk.

Hernqvist commented 3 months ago

Hi @martinthomson! We believe that this proposed API gives less detailed information about CPU load than #843 (Audio Render Capacity). Since audio should always run on realtime threads, general high CPU load should not impact audio. When CPU load is high enough to impact realtime threads to the point of causing glitches, the Render Capacity API should already show underruns and very high load.

As for the risk of side channeling, we have investigated this risk more closely since filing the issue. We have found that the API we are proposing is not necessary to set up an audio glitch-based side channel. One way to set up this side channel using only currently implemented APIs is the following:

1. Website 1 creates multiple computationally expensive AudioContexts. I tested this with 3 AudioContexts that each contained one OscillatorNode and 4000 BiquadFilterNodes.
2. Website 2 creates a single AudioContext with a ScriptProcessorNode that measures the time between consecutive calls to process.
3. Website 2 will observe longer maximum intervals between AudioProcessingEvents while Website 1 is active. Website 1 can now send messages to Website 2 by creating and stopping AudioContexts.
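The measurement that Website 2 performs in the steps above can be sketched as a plain function over callback timestamps. This is a minimal sketch, not the code used in the test described here: the names maxCallbackInterval and looksOverloaded, the expected interval, and the factor of 2 are all assumptions chosen for illustration.

```typescript
// Largest gap between consecutive callback timestamps (same unit as the input).
function maxCallbackInterval(timestamps: number[]): number {
  let maxGap = 0;
  let prev: number | undefined;
  for (const t of timestamps) {
    if (prev !== undefined && t - prev > maxGap) maxGap = t - prev;
    prev = t;
  }
  return maxGap;
}

// Hypothetical decision rule: treat the window as "overloaded" when the
// largest gap exceeds the expected callback interval by an assumed factor.
function looksOverloaded(
  timestamps: number[],
  expectedIntervalMs: number,
  factor: number = 2
): boolean {
  return maxCallbackInterval(timestamps) > expectedIntervalMs * factor;
}

// Browser-only usage, assuming an existing AudioContext named `ctx`:
//   const stamps: number[] = [];
//   const node = ctx.createScriptProcessor(1024, 1, 1);
//   node.onaudioprocess = () => stamps.push(performance.now());
//   node.connect(ctx.destination);
```

Nothing in this sketch relies on the proposed API; it uses only timing of existing audio callbacks.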

I have tested this scheme (on Mac) on Chrome, Safari and Firefox and it works on all of them (though you might have to change the number of AudioContexts or Nodes a bit depending on browser). So this side channel already exists, and we do not believe it is made worse by the proposed API.

matatk commented 2 months ago

We're trying to understand the difference between this and #843 - this seems to be geared towards statistics, whereas RenderCapacity seems more about adaptivity, but they both seem to be giving similar information - have you explored whether they could be combined?

We'd like to see some further documentation around the abuse cases relating to the side channel that you describe.

Do you have any information on the position of other implementers on this proposal?

Hernqvist commented 1 month ago

Hi @matatk! The RenderCapacity API (#843) and the Playout Statistics API provide different information for different purposes. As you say, the RenderCapacity API is about seeing the trend of CPU usage and adapting the playout graph as needed. It also exposes one kind of audio glitch through underrunRatio, which happens when the WebAudio graph is too complex for the CPU to deliver audio on time.
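The adaptive use case can be sketched as a small decision function. Only the name underrunRatio comes from this thread; the CapacitySample shape, the averageLoad field, the thresholds, and the function name chooseGraphAction are assumptions for illustration and are not taken from the #843 proposal text.

```typescript
// Assumed shape of one capacity report; only underrunRatio is named in this thread.
interface CapacitySample {
  averageLoad: number;   // assumed: fraction of the render budget used, 0..1
  underrunRatio: number; // fraction of render periods that underran, 0..1
}

type GraphAction = "simplify" | "keep" | "enrich";

// Hypothetical policy: back off on any underrun or high load,
// add processing only when there is clear headroom.
function chooseGraphAction(sample: CapacitySample): GraphAction {
  if (sample.underrunRatio > 0 || sample.averageLoad > 0.8) return "simplify";
  if (sample.averageLoad < 0.4) return "enrich";
  return "keep";
}
```

A policy like this reacts to the trend in load before glitches become audible, which is a different question from how the audio actually sounded to the user.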

The Playout Statistics API serves information about the end-user audio experience, which includes glitches from the whole audio pipeline and end-to-end latency. These general audio glitches can happen for many different reasons including:

- High CPU usage to the point where it affects realtime threads
- Platform audio layer/driver glitches
- Web applications failing to deliver audio on time (due to slow WebAudio graphs, for example)

This means that the Playout Statistics API gives much less information about CPU usage than the RenderCapacity API. RenderCapacity reports gradual levels of CPU usage, while the Playout Statistics API only reports whether actual audio glitches occurred, and critical CPU usage is just one possible cause of those. Such critical CPU usage is already detectable in other ways, such as delayed audio callbacks.
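A consumer of playout statistics would typically ask only how much of the played audio was glitchy over some window. Here is a minimal sketch, assuming the statistics are cumulative counters that can be read twice and subtracted. The names PlayoutSnapshot, glitchDurationMs, totalDurationMs, and glitchRatio are hypothetical and are not the attribute names from the explainer.

```typescript
// Hypothetical snapshot of cumulative playout counters (not the explainer's names).
interface PlayoutSnapshot {
  glitchDurationMs: number; // assumed: total duration of glitches so far
  totalDurationMs: number;  // assumed: total duration of audio played so far
}

// Fraction of audio played between two snapshots that was glitchy.
function glitchRatio(before: PlayoutSnapshot, after: PlayoutSnapshot): number {
  const played = after.totalDurationMs - before.totalDurationMs;
  if (played <= 0) return 0; // no audio played in the window
  return (after.glitchDurationMs - before.glitchDurationMs) / played;
}
```

The result says nothing about why a glitch happened, which matches the point that glitches are only a coarse, ambiguous signal of CPU load.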

If we combined the APIs, then the Playout Statistics API would also have to give RenderCapacity information. If it turns out that the RenderCapacity API is too invasive, or that we want to guard the RenderCapacity API behind some additional permission, then it's better that the APIs are separate.

To clarify the side channel: the preexisting side channel we mentioned only allows cooperating sites to communicate with each other with low precision and high latency, and only if they are open at the same time on the same device. The idea is that if one site creates critical amounts of CPU pressure on realtime threads, this can be observed by other sites in a variety of ways, including delayed audio callbacks and audio glitches. This method does not, however, allow anyone to gain information about other processes that are not actively cooperating. CPU overload can be detected using glitches, but the glitches could also mean other things, such as platform layer glitches or a slow WebAudio graph. We therefore believe that the existing side channel doesn't create any real user risk, and that introducing this API would not add any new risk.

We have not talked to other implementers about this proposal yet; we would like to pass early TAG review first.

matatk commented 1 week ago

Thank you for all the info in your last comment, @Hernqvist, and sorry for my slow reply. We are discussing the spec as a whole still, but for now I have a pointer regarding the side channel that may be relevant.

The Compute Pressure API had a similar potential side channel, and some mitigations were advised, following a PING review; here are the details: w3c/compute-pressure#197

martinthomson commented 1 day ago

@Hernqvist, that piece you have here about side channels probably belongs in the explainer. Having it here is helpful, but this issue will be closed and therefore hard to find. (The same is possibly true for some of the other notes here.)

w3ctag / design-reviews

Early TAG review request for Playout Statistics API for WebAudio #939