Allow web devs to synchronize branches with tee()?

whatwg / streams

Streams Standard

https://streams.spec.whatwg.org/

Other

1.35k stars 161 forks source link

Allow web devs to synchronize branches with tee()? #1157

Open jan-ivar opened 3 years ago

jan-ivar commented 3 years ago

MediaStreamTrackProcessor exposes a real-time ReadableStream of VideoFrames, e.g. from a camera. It'll drop frames on consumers who don't keep up, to keep data current, and also because it may have a finite pool of GPU-backed frames to vend.

This seems to work well, provided consumers leave highWaterMark at 1.

But with tee(), if one or both branches lag they'll drift from each other, causing one branch to stop being real-time. This causes buffer bloat buildup, eating up a ton of VideoFrames.

JS may work around it by counting frames and trying to synchronize the branches by making them take the same amount of time (by waiting for the slower branch), but this can be tricky to get right (spot the bug).

A tee({synchronized: true}) option that did this for you might be better.

This would help ensure both branches stay real-time, at the cost of both dropping frames if one of them lags.

This cost might seem undesirable, so let's look at two potential use cases for tee():

Encode a high-res and a low-res version of the same VideoFrame in parallel (using e.g. a WebCodecs transform polyfill)
Show the same processed video in self-view (at e.g. 60 fps), as is encoded + sent over WebTransport (at e.g. 15 fps)

{synchronized: true} should solve the first use case out of the box, since the ultimate sink expects both encodings.

The second use case proves trickier, since we don't want to drop frames in the self-view. To do this without buffer-buildup would probably necessitate a frame-dropper transform on the branch destined for WebTransport, to get it down to 15 fps. This would be a transform that consumes higher fps, resolving promises prematurely to do so. There may also be better ways to solve the second use case without tee(), like going through a second MediaStreamTrack.

But we can't prevent someone from using tee() to try to solve the second use case, so we need to consider it. But importantly, a frame-dropper transform seems like it should be able to coexist well with a tee({synchronized: true}) option.

Thoughts?

MattiasBuelens commented 3 years ago

This would help ensure both branches stay real-time, at the cost of both dropping frames if one of them lags.

It's starting to sound like readable streams may not be a good fit for this use case... In the streams model, it is not acceptable to drop chunks in the middle of the stream. From the FAQ:

Readable streams fit best in situations where:

Consumers care about the logical concatenation of all chunks of the stream, such that every value produced is important.

If I understand the draft spec correctly, the MediaStreamTrackProcessor is allowed to drop frames if nobody is actively reading from the stream. But as soon as a frame is enqueued onto the stream, it becomes part of this "logical concatenation" and mustn't be dropped by a branch from tee(), no matter how fast/slow the branch is being consumed. That is, this should always be true:

const [branch1, branch2] = readable.tee();
let chunks1 = [];
let chunks2 = [];
for await (let chunk of branch1) chunks1.push(chunk);
for await (let chunk of branch2) chunks2.push(chunk);
chunks1.forEach((chunk, i) => chunk === chunks2[i]);

Looks like you're running into much of the same issues as WebCodecs did. 😕

jan-ivar commented 3 years ago

But as soon as a frame is enqueued onto the stream it becomes part of this "logical concatenation" and mustn't be dropped by a branch from tee(), no matter how fast/slow the branch is being consumed. That is, this should always be true:
const [branch1, branch2] = readable.tee();
let chunks1 = [];
let chunks2 = [];
for await (let chunk of branch1) chunks1.push(chunk);
for await (let chunk of branch2) chunks2.push(chunk);
chunks1.forEach((chunk, i) => chunk === chunks2[i]);

Agreed, and this would remain true, because I'm not suggesting tee({synchronized: true}) drop chunks, merely that it keep the branches in lockstep, which is desirable for expensive chunks (bikeshed: maybe tee({lockstep: true}) is better?)

I don't believe there's any requirement on transforms to produce the same number of chunks they consume. E.g.

Apart from tee, the streams spec doesn't seem to rule out real-time streams (even defaulting highWaterMarks to 1).

Looks like you're running into much of the same issues as WebCodecs did. 😕

Where does that document mention this issue?

MattiasBuelens commented 3 years ago

Ah, I think I understand. Correct me if I'm wrong though!

Currently, tee() will pull a chunk from the original stream as soon as one of the two branches needs a new chunk. Once that chunk becomes available, we will enqueue it to both branches. If the other branch does not need a new chunk right now, that chunk will end up in its queue and potentially cause a build up.

With tee({ synchronized: true }), we will only pull a chunk from the original stream when both branches need a new chunk. Then, once that chunk becomes available, we enqueue it to both branches. (I'll leave the bikeshedding for later. 😛)

For example:

const [branch1, branch2] = readable.tee({ synchronized: true });
const bufferedBranch1 = branch1.pipeThrough(new TransformStream({}, { highWaterMark: 3 }));
const bufferedBranch2 = branch2.pipeThrough(new TransformStream({}, { highWaterMark: 5 }));

With a regular tee(), we would pull 5 chunks from readable to fill bufferedBranch2 up to its HWM. With the proposed tee({ synchronized: true }), we would only pull 3 chunks from readable, since after that bufferedBranch1 will stop pulling from branch1. In other words: a regular tee pulls as many chunks as the branch with the largest total queue size can hold (and "overfilling" the other branch), whereas this "synchronized tee" only pulls as many as the branch with the smallest total queue size can hold (and "underfilling" the other branch).

This sounds pretty feasible to do in author code. I'll see if I can whip up something to experiment with, and see if that solves your problem. 👨‍🔬

Looks like you're running into much of the same issues as WebCodecs did. 😕

Where does that document mention this issue?

That was assuming that you wanted to drop frames mid-stream, which was a misunderstanding on my part.

MattiasBuelens commented 3 years ago

All right, here's my attempt: https://jsfiddle.net/MattiasBuelens/9gr3zq7b/7/

It's basically a re-implementation of ReadableStreamDefaultTee, but with a modified pull algorithm that waits until both branches have called pull() before proceeding with the actual reader.read().

Here's how that would work in your initial example: https://jsfiddle.net/MattiasBuelens/y439ahnp/2/

dogben commented 3 years ago

I think we want { synchronized: true } to be the default for a ReadableStream from MediaStreamTrackProcessor.

@alvestrand

youennf commented 3 years ago

{ synchronized: true} would slow down the fastest consumer which is undesirable in a realtime video pipeline. The slow consumer also probably wants to process the freshest frame consumed by the fast consumer.

If we look at what happens in existing native pipelines like 'render local + encode local', frames are being dropped as needed by both renderer and encoder.

That was assuming that you wanted to drop frames mid-stream, which was a misunderstanding on my part.

I believe this is what we actually want in our use case.

a frame-dropper transform seems like it should be able to coexist well with a tee({synchronized: true}) option.

I do not see how that would work without custom resynchronisation between the two consumers. We cannot expect one consumer to always be the slow one. Also, as a real example and getting back to 'render local + encode local', rendering may stop pulling frames in case user switches to another tab, but encode is of course expected to continue.

jan-ivar commented 3 years ago

{ synchronized: true} would slow down the fastest consumer which is undesirable in a realtime video pipeline.

It's desirable in the first use case I mentioned in the OP, and shouldn't get in the way of dropping frames downstream.

If we look at what happens in existing native pipelines like 'render local + encode local', frames are being dropped as needed by both renderer and encoder.

Exactly. Dropping frames downstream ("renderer and encoder") is a common strategy. A sink resolves promises at its discretion, so the streams spec isn't in the way there.

What we're talking about here is how things degrade when that fails, and giving applications tools to control this. Drift ({synchronized: false}) seems a lot harder to reason about, manage and recover from. But if an app thinks it can — e.g. it has context (reasons or strategy) to expect its slow branch to only occasionally stutter and always recover quickly — then go for it.

The slow consumer also probably wants to process the freshest frame consumed by the fast consumer.

I don't follow this characterization. If a consumer isn't ready for a frame, then the "freshest" frame is the one it has to wait for, since the one available immediately is no longer fresh.

As a baseline, in the sunny case, where both branches are within the time budget of the source frame rate, both branches wait for the next (fresh) frame (i.e. in sync), and this is entirely normal and not "slow".

youennf commented 3 years ago

It's desirable in the first use case I mentioned in the OP, and shouldn't get in the way of dropping frames downstream.

As I understand it, the first use case is this one:

Encode a high-res and a low-res version of the same VideoFrame in parallel (using e.g. a WebCodecs transform polyfill)

If it is something equivalent to salsify, the idea is to get the output of both versions and pick one. In that case, the application will synchronise the results of the two operations, synchronized=false and synchronized=true will provide the same results. If this is something like simulcast, the frame rate is probably not selected as the min of both, but as a parameter to optimise the allocated bandwidth (low res => lower frame rate typically).

The idea behind tee is that we have different consumers processing samples at their own pace. Buffering is streams solution when consumers have different pace. synchronized=false introduces consumer stalling as the alternative solution. My understanding is we in fact want frame dropping as the solution in that specific problem space.

Exactly. Dropping frames downstream ("renderer and encoder") is a common strategy. A sink resolves promises at its discretion, so the streams spec isn't in the way there.

In both synchronized=true and synchronized=false, the solution to get to the optimal is for the 'slow' consumer to read at the 'fast' consumer pace, and drop frames as needed. The question is then why we should introduce a new option if the actual solution is the same no matter whether option is set to true or false.

Also this solution introduces coupling between the consumers which is something we do not want. This also prevents any backpressure signal from the 'slow' consumer, although the processing pipe might sometimes benefit those signals.

As a baseline, in the sunny case, where both branches are within the time budget of the source frame rate, both branches wait for the next (fresh) frame (i.e. in sync), and this is entirely normal and not "slow".

In that case, synchronized=true or synchronized=false have the same result. In practice though, the baseline is probably more something like 'render + encode', with different processing pace.

Say we use a renderer and an encoder and are using tee as these are two different consumers. Renderer processes at 60 fps, encoder processes at 30 fps. Encoder is instructed to read as fast as possible and drop frames as needed. Suddenly, renderer stops rendering as the local view is no longer visible say. Application will tell the encoder to read frames at its own pace and ask renderer (which is not running) to read frames as fast as possible even though it is not using any of those frames. Or application will have to recreate its own pipeline whenever local view is switched on/off. To remove coupling, streams would need to deal with frame dropping themselves, which I think we are hesitant to do.

Please also look at the downsides of introducing a new option in terms of complexity/learning curve/debuggability.

MattiasBuelens commented 3 years ago

I think we want { synchronized: true } to be the default for a ReadableStream from MediaStreamTrackProcessor.

I would find it quite surprising if the behavior or readable.tee() depended on how the stream was constructed. Currently, consumers can call tee() pretty freely: you might use a bit more memory (if a slow branch builds up a large buffer), but your read()s will still complete just as fast (i.e. backpressure works the same). If some streams were to do a "synchronized" tee() instead, then a consumer may see its read()s getting stuck because the stream was tee'd somewhere else, potentially outside of their own code.

It would also be awkward to implement. We'd have to add an internal flag on the ReadableStream, perhaps even turn this into a third type of ReadableStream (next to "default readable streams" and "readable byte streams"). I don't think this is a good idea.

To remove coupling, streams would need to deal with frame dropping themselves, which I think we are hesitant to do.

Indeed. At that point, I'd say that Streams is not the right API for your use case. 😕

Perhaps you should consider building your own API to process/transform a stream of video frames, with support for synchronization and frame dropping? You could still add APIs that interoperate with Streams "on the border", e.g. a method that converts the output of a MediaStreamTrackProcessor into a ReadableStream of VideoFrames. But you wouldn't have a readable property acting as your "primary output" to feed into a processing pipeline, you would have a different dedicated API for that.

alvestrand commented 3 years ago

One possible solution is to fail usage of raw tee() with these streams.

I see from the discussion of introducing tee() that it was pretty controversial when it was introduced, precisely because it makes it very easy to make patterns of buffering that are hard to reason cleanly about, and some of those patterns are footguns. (In particular, when I experimented with it, I found that tee() does not respect desiredSize on the destination stream; it enqueues stuff no matter what desiredSize is. (Unless I messed up my code, of course.)

It's entirely feasible to write your own tee-like function that does exactly the queueing strategy that's desired, including dropping when appropriate - the standard tee operator + realtime streams + random jank seems like a Really Bad Combination.

alvestrand commented 3 years ago

Streams that drop frames would, of course, be TransformStreams.

jan-ivar commented 3 years ago

Streams before tee() was introduced: fits our use case. Streams after tee() was introduced: no longer fits?

Maybe the problem is tee() and not streams?

youennf commented 3 years ago

Maybe the problem is tee() and not streams?

This would break point 3 of where readable stream fits best, according https://github.com/whatwg/streams/blob/main/FAQ.md:

Readable streams fit best in situations where:
1. You are trying to represent some underlying I/O source of data, especially one for which backpressure is meaningful.
2. Consumers care about the logical concatenation of all chunks of the stream, such that every value produced is important.
3. Your usual use case is a single consumer, with allowances for multi-consumer via teeing.
4. Errors are either final and non-recoverable (socket prematurely closed), or just a specialized type of data (404).

Points 2 and 3 are not met in the context of a realtime video processing pipeline. Point 4 is also debatable: errors are not a type of data and errors are generally recoverable in a video processing pipeline.

alvestrand commented 3 years ago

I have added a proposed new section to the mediacapture-transform spec (a Google proposal):

https://pr-preview.s3.amazonaws.com/w3c/mediacapture-transform/pull/57.html#use-with-multiple-consumers

jan-ivar commented 3 years ago

according https://github.com/whatwg/streams/blob/main/FAQ.md:

The FAQ is trying to be helpful and reflect the (current) spec, but is not authoritative I don't think. If we improve the spec's "fit" for real-time streams (through issues like this one, https://github.com/whatwg/streams/issues/1156 and https://github.com/whatwg/streams/issues/1158), we should update the FAQ.

FWIW the spec's Introduction says "the Streams Standard enables use cases like: • Video effects: piping a readable video stream through a transform stream that applies effects in real time."

Consumers care about the logical concatenation of all chunks of the stream, such that every value produced is important.

As a criteria of exclusion, this seems inaccurate, since consumers are free to drop data:

numbers
  .pipeThrough(new TransformStream({transform: (chunk, ctrl) => chunk % 2 || ctrl.enqueue(chunk)}))
  .pipeTo(new WritableStream({write: c => console.log(c)})); // 0, 2, 4, 6, 8, 10

I also see no prohibition on sources (whether a camera or a WebTransport datagram receiver) dropping data as a solution to receiving back pressure.

Your usual use case is a single consumer, with allowances for multi-consumer via teeing.

I think this fits, modulo this issue. Most use cases will be single consumer; a few might be multi-consumer, like this cropping example with both self-view and video "sending" that works in Chrome (with "Experimental Web Platform features" enabled).

Point 4 is also debatable: errors are not a type of data and errors are generally recoverable in a video processing pipeline.

I doubt camera source errors are recoverable. Do you mean WebCodecs errors? While writing the above fiddle, the error I ran into most was "frame closed", due to VideoFrame's lifetime issues. I solved those with a custom version of tee() that both synchronizes and clones, the way this issue and https://github.com/whatwg/streams/issues/1156 propose.

jan-ivar commented 3 years ago

Say we use a renderer and an encoder and are using tee as these are two different consumers. Renderer processes at 60 fps, encoder processes at 30 fps.

I have this working in Chrome now. Self-view is 30 fps for me which is all my camera can muster, and encoder is 15 fps from induced delay, but the self-view stays smooth, even when the encoder struggles.

Dropping frames in a TransformStream turned out to be trivial:

  function FrameDropper() {
    return new TransformStream({
      transform(frame, controller) {
        if (controller.desiredSize < 2) {
          frame.close();
        } else {
          controller.enqueue(frame);
        }
      }
    }, {highWaterMark: 1}, {highWaterMark: 2});
  }

I couldn't use {highWaterMark: 0}, {highWaterMark: 1} because that would stall due to #1158 so the encoder is a frame behind. Depending on how #1158 is resolved, perhaps we could get to {highWaterMark: 0}, {highWaterMark: 0} even?

I also use @MattiasBuelens's tee({synchronized: true}) polyfill from above, which I think provides a predictable baseline for video work. I modified it to be tee({synchronized: true, structuredClone: true}) because #1156, or Chrome would error about closed frames.