w3c / mediacapture-image

Mediastream Image Capture
https://w3c.github.io/mediacapture-image/
Other
74 stars 40 forks source link

Clarify grabFrame timing #212

Open dsanders11 opened 5 years ago

dsanders11 commented 5 years ago

The current implementation of ImageCapture.grabFrame in Chromium will always wait for the next frame from the video stream, which was unexpected. The spec doesn't say anything about timing, but I think it should be updated to clarify. If the intention is indeed to wait for the next frame, then personally I think ImageCapture.grabNextFrame would more accurately convey the intention.

In particular, without clarification in the spec, one implementation may implement it as next frame, one may implement it as current frame, and code written against the spec would act quite different between the two implementations, especially at low frame rates. At 2 FPS the first implementation would always take 500ms to resolve, while the second implementation may only take 5ms to resolve.

I think it would make the most sense to add language to the spec to say that ImageCapture.grabFrame should not wait for the next frame, unless this can be done without undue latency.

Then I think #210 would offer a nice alternative for implementing the current Chromium behavior.

alvestrand commented 5 years ago

Agree that it should be clarified. The problem with grabbing the previous frame is that it's possible that the previous frame is no longer easily accessible (might have gone to display buffers), and may even not exist (in the case of before-first-frame). Is there any implementation that grabs the previous frame? If not, I suggest we clarify the behavior to mean "next frame". Function name changes are breaking to the deployed base, so shouldn't be done lightly.

dsanders11 commented 5 years ago

@alvestrand

The problem with grabbing the previous frame is that it's possible that the previous frame is no longer easily accessible (might have gone to display buffers), and may even not exist (in the case of before-first-frame).

That's not overly complicated, though. I rewrote the Chromium implementation for grabFrame as part of exploring #210, and really all you need to do is keep around a reference to the last frame until the next one comes in. This incurs a small memory usage penalty, but for 1080p it's just under 8 MB per frame, assuming RGBA, and only 3 MB if it's held as I420, like Chromium does it. 4K is a bit more, at 32 MB and 12 MB respectively. If that's too much to stomach, options could be added to the ImageCapture constructor to customize this behavior.

Is there any implementation that grabs the previous frame? If not, I suggest we clarify the behavior to mean "next frame".

Isn't Chromium the only implementation, currently? Isn't one of the benefits of implementing early to learn this sort of stuff? I understand not wanting to change a widely implemented, widely used spec, but in this case I think it's reasonable to be more agile here since it's still new. This spec has only been published as a W3C Working Draft, it's not set in stone. Isn't that the point of the draft terminology, that it may change?

Function name changes are breaking to the deployed base, so shouldn't be done lightly.

It wouldn't need to be a breaking change. Currently the timing of grabFrame is not defined in the spec, so a change to that is 'fair game'. So you could change the behavior of 'grabFrame' to be 'current frame' and add 'grabNextFrame' if that's desired functionality.

may even not exist (in the case of before-first-frame).

This problem exists even when waiting for next frame, because there may not be a next frame, or the next frame may come after an arbitrary long amount of time. In fact, I'd consider it worse, since a stream can only have a 'before-first-frame' once (and it's easy to know this if you're tracking the last seen frame), where as a stream could have a 'no-next-frame' happen at any time. For no first frame you could easily reject until there is a first frame - you can't reject if you're waiting for a next frame which never comes, without making an arbitrary decision.

This is especially true with remote video being sent by WebRTC, or creating a stream from a canvas, or from a video element.

With canvas the stream only sends out a new frame when the content changes, which means if you 'miss the boat' on calling grabFrame, you end up with a promise which will never resolve, or could be a large amount of time later with a frame entirely different than what you expected to get.

With video, if paused, there will be no new frames on the stream, so again, you may end up with a promise that never resolves or takes a long time.

In both of those cases getting the next frame may not be the desired result, and would be confusing behavior to developers. A promise pending on grabbing a frame from a paused video will come out to be an unexpected frame if the video jumps ahead and resumes playing.

This isn't just speculation, there's actually a developer bug for Chromium (which I just noticed today) which looks like it's running into this exact issue, and they're using HTMLMediaElement.captureStream. I've modified their reproduce case to have a 2 second timeout, which the resolved promise will cancel, and if the timeout is never canceled it will declare the promise unresolved. Doing this shows that several promises get left unresolved. Here is the code that developer was trying to use which led to a never resolved promise (trimmed for readability):

new ReadableStream({
  async start(controller) {
    console.log("reader start");
  },
  async pull(controller) {
    if (e.track.enabled === false && e.track.readyState === "ended") {
      controller.close();
    } else {
      const frame = await imageCapture.grabFrame();
      controller.enqueue(frame);
    }
  }
})

There's two potential issues in that code since grabFrame may never resolve. First, according to the ReadableStream spec, it will wait on pull if it returns a promise, before calling pull again. Since they used await, the grabFrame promise never resolving means that pull will never resolve either. The second issue (which is breaking their code) is that when grabFrame is left unresolved there's never a call to controller.enqueue and so controller.close never gets called, which is why that developer opened the Chromium bug.

I consider the wait for next frame behavior dangerous for reasons like this. Promises that may never resolve or reject are bad form and dangerous for developers. Implementations could detect things like the stream muting, or ending, and reject any unresolved promise, but the spec doesn't mention this, obviously, and the Chromium implementation doesn't do it. That behavior also has its own can of worms, as promises that seem flaky and reject unexpectedly will lead to bad code to deal with that.

jontonsoup commented 4 years ago

So does this have a workaround? I'm getting issues where sometimes grabFrame never returns.