w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

[Track Stats API] track.getFrameStats() allocates memory, adding to the GC pile #98

Closed · jan-ivar closed this 11 months ago

jan-ivar commented 1 year ago

track.getFrameStats() (getStats after #97) creates a new JS object with properties to be read, adding to the GC pile every time it's called, which may be dozens of times per second, per track.

Feedback from @padenot is that real-time JS apps try to avoid garbage collection whenever possible, and that allocating an object just to read values seems inherently unnecessary. Other ways to read real-time values exist in the Media WG, which we may wish to consider, e.g. audioContext.outputLatency.
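For illustration, the contrast between the two shapes (using the getFrameStats() shape under discussion and a real AudioContext attribute):

```js
// Each call allocates a fresh result object for the GC to collect later:
const {deliveredFrames, discardedFrames} = await track.getFrameStats();

// A synchronous attribute read returns a plain double and allocates nothing:
const latency = audioContext.outputLatency;
```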

Why it allocates

The existing async API queues a single task when the application requests data, creating a garbage-collectable dictionary to hold the results. In contrast, a synchronous getter, we thought, would require the UA to queue tasks continuously to update its main-thread internal slot, just in case the JS app decides to call it, even if it never does.

Ways to avoid it

The Media WG shows there are other ways. outputLatency is synchronous without an internal slot. In Firefox it uses a lock, but it can be implemented without one. The Media WG decided to violate § 5.2. Preserve run-to-completion semantics here, which I don't like, but that seems trivial to remedy by caching reads.

To further reduce overhead, we could put getters on a dedicated interface instead of on the track itself, and have a lazy getter for that (sub) interface attribute. TL;DR:

```js
const {deliveredFrames, discardedFrames, totalFrames} = track.videoStats;

const {
  totalFrameDuration,
  droppedFrameDuration,
  droppedFrameEvents,
  framesCaptured,
  totalCaptureDelay
} = track.audioStats; // TypeError if wrong kind
```

An implementation could do a read-one-read-all approach to maintain consistency between values. WDYT?
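A rough sketch of how a UA could combine cached reads with read-one-read-all; readSharedStats() is a hypothetical hook into the UA's internal stats, used here purely for illustration:

```js
class VideoStats {
  #cache = null;
  #snapshot() {
    if (!this.#cache) {
      // Read-one-read-all: grab every value at once, so multiple getter
      // reads within the same task see one coherent snapshot.
      this.#cache = readSharedStats(); // hypothetical UA-internal hook
      queueMicrotask(() => { this.#cache = null; }); // drop after this job
    }
    return this.#cache;
  }
  get deliveredFrames() { return this.#snapshot().deliveredFrames; }
  get discardedFrames() { return this.#snapshot().discardedFrames; }
  get totalFrames()     { return this.#snapshot().totalFrames; }
}
```

Caching also remedies the run-to-completion concern: within one run of synchronous code, repeated reads cannot change value.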

padenot commented 1 year ago

> 1: We want neither to PostTask on every update nor to have to use a mutex. Because we want to grab multiple metrics at the same time, I don't think there is a third option, so in practice we would be forced to implement this with a mutex and a cache, which seems undesirable. Agree/disagree?

Disagree: it's trivial in this day and age to communicate an arbitrary number of items in a coherent manner, lock-free, when the data is in the same address space, as sketched below. It's not very hard to do this across processes either.
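One such lock-free technique, purely as an illustration (a sequence lock, with SharedArrayBuffer and Atomics standing in for memory shared between threads or processes):

```js
const sab = new SharedArrayBuffer(32);
const seq = new Int32Array(sab, 0, 1);       // sequence number
const data = new Float64Array(sab, 8, 3);    // e.g. delivered, discarded, total

function publish(delivered, discarded, total) {  // writer side
  Atomics.add(seq, 0, 1);                        // odd: write in progress
  data[0] = delivered; data[1] = discarded; data[2] = total;
  Atomics.add(seq, 0, 1);                        // even: snapshot published
}

function readAll() {                             // reader side; never blocks the writer
  for (;;) {
    const before = Atomics.load(seq, 0);
    if (before & 1) continue;                    // writer mid-update, retry
    const snapshot = [data[0], data[1], data[2]];
    if (Atomics.load(seq, 0) === before) return snapshot; // coherent set
  }
}
```

(A production version would need care around the memory model for the non-atomic float reads; the sketch only conveys the idea.)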

> 2: The "overhead from number of updates per second" and "we may be forced to do IPC" concerns that I have stem from wanting to allow implementer flexibility and the fact that I think it is very likely that more metrics will be added in the future. This could include having to process metrics of more tracks (e.g. 54 remote tracks, leading to thousands of updates per second) or new metrics not possible to collect without IPC. Not forcing sync is a way to not paint ourselves into a corner in the future. Thoughts?

There's already, by definition, communication associated with the data of those tracks. Adding metadata to this data is ~free; the data is either:

> During the interim I heard support for the async API (Youenn, Harald) but only Mozilla in favor of the sync API (Jan-Ivar). I also presented a compromise, the sync API but with a throttled metrics update frequency, but I didn't hear any takers for that.

Adding IPC traffic for something that is already free to send to the relevant processes, or in fact is already being sent to them, is not a good solution.

That solution becomes expensive when this data is being requested, and is free otherwise.

The alternative proposal (a sync API) is always free.

padenot commented 1 year ago

> Regarding "this is already available", I think there are cases like AudioContext where the piggybacking only happens if you've opted in to it by using AudioContext, and then there are cases like Media Playout Quality where you can't opt out but these counters also don't necessarily need to run at hundreds or thousands of updates per second. Nobody would complain if their playout counters "only" refreshed at the requestAnimationFrame frequency, clearly.

You also need latency values when running e.g. an echo canceller: it needs to know the real latency, including any buffering at the application level. You need output latency when playing any video content that has sound, for A/V sync purposes. You need to know about video frame drops so this can be communicated to JS, which can observe it and potentially serve different media. It's not all about giving potentially late and infrequent numbers to content.
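For instance, outputLatency is exactly what a player needs to line video frames up with audio that has already been queued; a simplified sketch:

```js
const ctx = new AudioContext();

// Audio scheduled at time t on ctx.currentTime's clock is heard roughly
// ctx.outputLatency seconds later; present the matching video frame then.
function presentationTimeFor(audioTime) {
  return audioTime + ctx.outputLatency;
}
```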

Looking at the links to real products and the workshop talk I mentioned above, it's clear that it's important for application developers to get precise and reliable information for their apps to work.

As you say, it's possible to update those counters rarely, but why bother? The information is there, it's free, you can update them at the correct rate, and it solves real problems.

padenot commented 1 year ago

> Browsers may make different tradeoffs between performance and accuracy. If we go with sync, we would probably need to provide clear guidance on refresh rate or we might end up with web compat issues.

There is no "refresh rate"; this is about real-time updates of metadata related to data used by content processes. There's no risk of web compat issues, because the numbers represent what's actually happening.

> And again, this information is not available for free. For instance, Safari rendering processes cannot currently compute some of the metrics you mention, for instance audio drops. We can of course send this additional information to the rendering process, but I do not think we would like to do this preemptively, nor post tasks for no other reason than updating those counters. This would require introducing a new stats object which would trigger active gathering of the stats.

Firefox will set some of the proposed numbers to 0, always, because its architecture prevents some of the problems those metrics exist to measure, and that's fine.

This is not about "preemptively sending data"; it's about adding metadata to data that already needs to be in content processes, so it can be exposed.

> I think we agreed perf is not the main issue here. From a web dev point of view, the async API seems easier to understand than introducing a new object with sync getters. From a browser implementor point of view, an async API is also probably easier to implement.

Async is always more complex than sync, and by definition the data is already old when the content process receives it in the case of an async method call. From a browser point of view, I can believe that, depending on the architecture, it's easier to implement the async method: just add a couple of cross-process method calls and that's it. But this is not the case for all implementations.

The important point here is that the priority is in this precise order:

and that it's not particularly hard to send some metadata with the data anyway.

padenot commented 1 year ago

>> Here we want to synchronize audio input and audio output, and potentially visuals, so we need all numbers frequently.

> Audio input / audio output will be done in web audio presumably, so in a worklet where there is no MediaStreamTrack. The proposed API does not seem well suited for this use case AFAIUI.

If you're processing microphone data, there is a MediaStreamTrack.

> I understand the desire to have an API to get that info and that this info would be updated every time there is a new audio chunk to process. Should it be a separate API (that could include input and output latency estimates)?

MediaStreamTrack is the object that lets you interact with input devices (microphones, cameras, etc.). There's no other location where this info could live, is there?

padenot commented 1 year ago

> In this case, should extra IPC be added for the sake of the sync API, or would the sync API just need to return frequently updated information that may be less frequent than each audio chunk?

The sync API returns the latest available metadata, which it has received alongside the actual data.

padenot commented 1 year ago

> Does the lockless ring buffer block the producer if the consumer has not read from it yet? If we're talking about something like this and its performance remarks

A simpler design for what we're looking at here is triple buffering (https://searchfox.org/mozilla-central/source/media/libcubeb/src/cubeb_triple_buffer.h); this is how we implement it (and plan to implement it, since we haven't implemented all of this yet) in Firefox. Additionally, we'll be adding the metadata to the data we already communicate across processes. Triple buffering then allows fanning the data out to other threads once it's been received via IPC, as sketched below.
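A JavaScript rendition of the triple-buffer idea, with SharedArrayBuffer and Atomics standing in for the shared memory (cubeb_triple_buffer itself is C++; this sketch only illustrates the algorithm). Single producer, single consumer, neither side ever blocks, and the consumer always sees the most recently published, internally consistent snapshot:

```js
const FIELDS = 3;                 // e.g. delivered, discarded, total
const DIRTY = 0b100;              // "a fresh snapshot sits in the middle slot"
const sab = new SharedArrayBuffer(8 + FIELDS * 8 * 3);
const ctrl = new Int32Array(sab, 0, 1);  // bits 0-1: middle slot index
const slot = [0, 1, 2].map(i => new Float64Array(sab, 8 + i * FIELDS * 8, FIELDS));

let writeIdx = 1, readIdx = 2;    // slot 0 starts out as the shared "middle"

function produce(values) {        // producer thread: fill a private slot,
  slot[writeIdx].set(values);     // then swap it into the middle atomically
  writeIdx = Atomics.exchange(ctrl, 0, writeIdx | DIRTY) & 0b11;
}

function consume() {              // consumer thread
  if (Atomics.load(ctrl, 0) & DIRTY) {
    // Swap the fresh middle slot with our read slot; this also clears DIRTY.
    readIdx = Atomics.exchange(ctrl, 0, readIdx) & 0b11;
  }
  return slot[readIdx];           // latest coherent snapshot
}
```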

henbos commented 1 year ago

OK. Well then I'm convinced it can be implemented efficiently and that this is just a matter of API shape versus implementation effort.

youennf commented 1 year ago

> MediaStreamTrack is the object that lets you interact with input devices (microphones, cameras, etc.). There's no other location where this info could live, is there?

MediaStreamTrack is just an opaque object that connects sinks and sources; it is not the object that does processing. Processing happens at sinks/sources: AudioContext, HTMLMediaElement, MediaStreamTrackProcessor... Exposing the best information in those places makes sense, since that is where processing actually happens.

Exposing a sync API at the MediaStreamTrack level would further push web developers towards doing realtime data processing on the main thread, to the detriment of the user. Exposing an async API at the MediaStreamTrack level does not have those issues. It also covers the envisioned use cases while allowing us to support, later on, realtime processing use cases that we might discover in the future.

I also still disagree that a sync API is free, since some of this data is currently not sent to WebProcesses. In the future, optimisations or features like isolated streams might move us further away from sending such data to WebProcesses.

henbos commented 12 months ago

To be discussed at TPAC, but having gathered implementation experience about this API and thinking about it some more, I am no longer concerned about the performance issues.

That is, if people see value in having this API be synchronous, I see no reason not to do that. I filed #105 and created a PR.

henbos commented 11 months ago

The TPAC resolution was to merge #106 making this synchronous, plus filing follow-up issues such as #108 and #107.

I may file a separate issue to revisit the "dictionary vs interface" question with regards to GC to make sure this is what we really want, but as things stand, the PR that will merge is using a [SameObject] interface, and I have code in Chromium to update our WebIDL and implementation to match this PR.
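(For context on the GC angle: a [SameObject] attribute returns the same wrapper object on every read, so polling it allocates nothing, unlike a method returning a fresh dictionary.)

```js
// [SameObject]: repeated reads yield the identical object, no new allocation.
console.assert(track.videoStats === track.videoStats);
```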

Are there any other issues from this thread that need further discussion, such as audio metrics definitions? @padenot @jan-ivar Can we file separate issues if more audio-specific discussions are needed? It's hard to keep track of all the discussions in this issue, so I'd like to break it down into pieces and close this issue as resolved when #106 merges.

dontcallmedom-bot commented 11 months ago

This issue was mentioned in WEBRTCWG-2023-09-12 (Page 55)

jan-ivar commented 11 months ago

Closed by #106.