WebCodecs (again!) - Githubissues

chcunningham commented 3 years ago

Ya ya yawm TAG!

I'm requesting a TAG review of WebCodecs.

An early review was conducted in #433. The API has changed a lot since then and we now have a draft specification. Please see discussion of deadlines (relatively soon) and and "you should also know that" below (upcoming work, open questions).

Thank you for reviewing!

Explainer: https://github.com/WICG/web-codecs/blob/master/explainer.md
Specification URL: https://wicg.github.io/web-codecs/
Tests: https://github.com/web-platform-tests/wpt/tree/master/webcodecs
Security and Privacy self-review²: https://github.com/WICG/web-codecs/blob/master/security-privacy-questionnaire.md
GitHub repo (if you prefer feedback filed there): https://github.com/WICG/web-codecs/issues
Primary contacts (and their relationship to the specification):
- Co-editor: Chris Cunningham (@chcunningham), Google
- Co-editor: Paul Adenot (@padenot), Mozilla
- Media WG co-chair: Mounir Lamouri (@mounirlamouri), Google
- Media WG co-chair: Jer Noble (@jernoble), Apple
Organization(s)/project(s) driving the specification: Google, Mozilla.
Key pieces of existing multi-stakeholder review or discussion of this specification: Co-editors meet regularly one-on-one and biweekly with with representatives from Intel and Microsoft and with members of WebRTC WG (all supportive). Artifacts of these meetings are mostly issues filed in our Github.
External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): https://www.chromestatus.com/feature/5669293909868544

Further details:

[x] I have reviewed the TAG's Web Platform Design Principles
Relevant time constraints or deadlines: Chrome would like to ship this API version 91, which is anticipated to release May 25, 2021. The code freeze for this release will occur around April 14, 2021.
The group where the work on this specification is currently being done: WICG
The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Migrating to Media WG
Major unresolved issues with or opposition to this specification: There are no major objections. Our Github tracks a number of open design questions. The most pressing are tagged with milestones.
This work is being funded by: Google/Mozilla (as employers of it's editors/implementers).

You should also know that...

Author feedback from the Chrome origin trial has been very positive and these users are eager to see it ship. This API has generated considerable user excitement (twitter, discourse)
The API is no longer coupled to WhatWG Streams. Rationale here.
We are particularly interested in input on this issue concerning transfer of BufferSource inputs: https://github.com/WICG/web-codecs/issues/104
The previous tag review mentioned ImageDecoding. I will send a PR shortly to add this to the spec: https://github.com/dalecurtis/image-decoder-api

We'd prefer the TAG provide feedback as (please delete all but the desired option): 🐛 open issues in our GitHub repo for each point of feedback

padenot commented 3 years ago

* The code freeze for this release will occur around April 14, 2021.

This seems way too early (about two month) considering the rather fundamental discussions happening around memory allocations and time units, among others (there is a large number of unsolved issues, and some groups of users haven't given feedback yet). Getting those two wrong (and again, there is more to discuss), or not perfectly right will mean the API cannot be used for some of its stated use-cases, or will have performance implications that mean that some use-cases won't be possible on particular devices (notably and predictably, non-high-end devices).

Considering this is a low-level API that is supposed to be the bottom-most layer of the entire media decoding and encoding stack of the web platform, I would be sad to not get this right, especially considering the expectations of developers in various domains.

chcunningham commented 3 years ago

@padenot, apologies for the surprise - I thought we were more aligned on timing. I agree we have a number of issues to resolve before shipping, but I think 2 months sounds workable. Its not my intention to rush this out the door. Lets discuss more in our upcoming call and we can revise the timeline as needed.

Timing aside, this is still a great time to gather TAG feedback. Reviewers, please proceed :)

cynthia commented 3 years ago

Strange! I swear I wrote early feedback on this yesterday, but apparently I was imagining I did or I got distracted and forgot to press the green button. Abridged summary of what I wanted to write:

Explainer and spec disagree on AudioFrame vs AudioPacket?
VideoFrame has destroy/clone but AudioFrame|Packet does not - why?
In VideoFrame, wild guess is that plane is intended to give the original framebuffer (e.g. in YCbCr, as multiple grayscale images?) while the image bitmap method is expected to return something interoperable with the rest of the platform?

Still need to spend more time with this.

chcunningham commented 3 years ago

Thanks so much @cynthia

Explainer and spec disagree on AudioFrame vs AudioPacket?

Just updated to use AudioFrame.

VideoFrame has destroy/clone but AudioFrame|Packet does not - why?

Sorry, this part needs work. The plan is for both interfaces to have matching clone() and close() methods, removing destroy(). The semantics of clone() and close() will be akin to incrementing/decrementing a ref count on the underlying media data. I should send a PR by end of week. I'll update this thread when its ready.

In VideoFrame, wild guess is that plane is intended to give the original framebuffer (e.g. in YCbCr, as multiple grayscale images?)

Your idea is roughly correct. VideoFrame gives access to the raw pixel data via its Planes interface. For YCbCr you would get a plane for each of Y, Cb, and Cr. For the I420 pixel format, the Cb and Cr planes are subsampled. The spec will soon mention additional pixel formats. For ex: RGB (1 plane) and NV12 (Y plane + interleaved CbCr plane).

while the image bitmap method is expected to return something interoperable with the rest of the platform?

Yes, though we have some pending updates to mention here as well. We initially envisioned VideoFrame -> ImageBitmap -> Canvas as a primary means of painting VideoFrames. More recently, we intend to support painting of VideoFrames directly via WebGL tex(Sub)Image(2D|3D)() and Canvas drawImage(). The Chrome work for this is tracked here. We'll file PRs/bugs on the relevant specs shortly.

We still plan to support VideoFrame -> ImageBitmap for other interop, but VideoFrame.createImageBitmap() will be removed from the spec in favor of window.createImageBitmap(videoFrame). The former was just easier for us to prototype.

chcunningham commented 3 years ago

Also, I to highlight the time units discussion @padenot mentioned. https://github.com/WICG/web-codecs/issues/122

This and other top priority issues are all assigned the 2021 Q1 milestone. https://github.com/WICG/web-codecs/milestone/2

Input welcome!

chcunningham commented 3 years ago

The previous tag review mentioned ImageDecoding. I will send a PR shortly to add this to the spec: https://github.com/dalecurtis/image-decoder-api

This PR is now out here: https://github.com/WICG/web-codecs/pull/152

A preview of the Image Decoding sections can be viewed here: https://pr-preview.s3.amazonaws.com/WICG/web-codecs/pull/152.html#image-decoding

chcunningham commented 3 years ago

@cynthia @kenchris, just checking in. How is the review going? Have you had a chance to discuss https://github.com/WICG/web-codecs/issues/104 and https://github.com/WICG/web-codecs/issues/122?

I know its a big spec and the ImageDecoder portion is new as of last week. We really appreciate your work!

chcunningham commented 3 years ago

@cynthia @kenchris, friendly ping. I'm hoping to hear from you soon so we can respond to your feedback before we ship. I've delayed the Chrome timeline by one milestone (now with code freeze around May 20), but we'll want to make any substantial changes well in advance.

chcunningham commented 3 years ago

Also, I to highlight the time units discussion @padenot mentioned. w3c/webcodecs#122

@padenot @aboba @sandersdan and I discussed this offline and resolved to use integer microseconds. Please see a summary of how that decision was made here: https://github.com/w3c/webcodecs/issues/122#issuecomment-833788313

chcunningham commented 3 years ago

@kenchris @cynthia - here's another issue that TAG could help us resolve

Should WebCodecs be exposed in Window environments? https://github.com/w3c/webcodecs/issues/211

kenchris commented 3 years ago

Generally, I think the points in https://github.com/w3c/webcodecs/issues/122#issuecomment-833788313 make a lot of sense.

As I understand we could always add an alternative attribute with values represented by DOMHighResTimestamp. With that in mind, I support you using a different type (here unsigned microseconds).

@dbaron also made the point (in email thread) that the timestamps you are defining are within the time of the video or audio stream/file, and is independent of the playback rate as that doesn't change the timestamp but just causes it to advance at a different rate

Both David and I agree that if that is understood correctly, it is pretty reasonable that it is a different type, differently-defined.

kenchris commented 3 years ago

Regarding exposing to Window environments, we are very sympathetic to @youennf's comment

It seems safer though to gradually build the API. Exposing to Worker at first covers major known usecases and allows early adopter through workers or JS shims to start doing work in Window environment. This might help validating the model is right also for window environments.

cynthia commented 3 years ago

@chcunningham @padenot, Sorry this took so long!

@kenchris @rhiaro and I discussed this in our F2F. Here is a summary of our discussion.

1) Temporal representation, we've discussed this and an integer representation seems to be the most adequate given the tradeoffs and risks that are associated with using a real representation. 2) Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window. 3) Transfer / detach? We read the discussion and making the transfer opt-in seems reasonable - although we did lean more towards implicit transfer and make it so that opt-out is explicit. (Basically, zero garbage unless requested) But there is the risk of this behavior being inconsistent with the rest of the platform.

ImageDecoder was an interesting review. API-design-wise, this felt a lot more like a video decoder API that "supports" images (not surprising given that this is exactly what it is), but we did have some mixed feelings about the ergonomics of the API. What mechanism would allow the user to say, decode an image and put it into an HTMLImageElement? (This seemed like something that people might do)

dalecurtis commented 3 years ago

ImageDecoder was an interesting review. API-design-wise, this felt a lot more like a video decoder API that "supports" images (not surprising given that this is exactly what it is), but we did have some mixed feelings about the ergonomics of the API. What mechanism would allow the user to say, decode an image and put it into an HTMLImageElement? (This seemed like something that people might do)

If you didn't see it, https://github.com/dalecurtis/image-decoder-api/blob/master/explainer.md#providing-image-decoders-through-the-videodecoder-api talks a bit about the reasons for why we didn't just add 'image codecs' to VideoDecoder.

I think it's unlikely someone would want to interoperate with the image element in that way -- we certainly haven't had any requests for it. Canvas is instead the natural rendering point for authored content like this and folks reimplementing aspects of HTMLImageElement with Canvas+ImageDecoder seems more likely. What we have also had requests for is something like a 'image-decoder-worklet' where folks can bring their own image decoders - but that's a bit orthogonal to what this API is trying to do and again is likely served more simply by canvas.

chcunningham commented 3 years ago

@kenchris @cynthia thank you so much for the review

1) Temporal representation, we've discussed this and an integer representation seems to be the most adequate given the tradeoffs and risks that are associated with using a real representation.

SG, I'll update the spec shortly.

2) Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.

In the WG call Tuesday it was suggested that we could check with participants in the Chrome origin trial to see about use cases for window-exposed interfaces. Should take about a week. I'll provide updates on the issue. If the WG cannot come to consensus by Chrome's v1 launch, we can fallback to worker-only exposure while the discussion continues.

3) Transfer / detach? We read the discussion and making the transfer opt-in seems reasonable - although we did lean more towards implicit transfer and make it so that opt-out is explicit. (Basically, zero garbage unless requested) But there is the risk of this behavior being inconsistent with the rest of the platform.

We considered opt-out for the same reasons and were sensitive to the consistency risk. We also concluded that transfer is often undesirable/infeasable in a few common scenarios like passing ArrayBufferView's (particularly those that are backed by the WASM heap). We felt this made automatic transfer even riskier, as the behavior would vary based on the type of the provided BufferSource.

chcunningham commented 3 years ago

@cynthia @kenchris

Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.

In the WG call Tuesday it was suggested that we could check with participants in the Chrome origin trial to see about use cases for window-exposed interfaces.

The replies from our user survey are now summarized in this comment. Please let me know if this changes your recommendation.

Also, for other points mentioned above (ImageDecoder, temporal representation, transfer / detach), please let me know if the replies above have adequately addressed the concern. On temporal representation, the spec is now updated as described.

cynthia commented 3 years ago

@chcunningham

The replies from our user survey are now summarized in this comment. Please let me know if this changes your recommendation.

Since the users have spoken, it seems like it would make sense to expose this to Window - unless there is strong opposition from the group. (Even if there is opposition, there seems to be a clear preference here so I think it's worth negotiating towards the direction of Window exposure.)

Also, for other points mentioned above (ImageDecoder, temporal representation, transfer / detach), please let me know if the replies above have adequately addressed the concern. On temporal representation, the spec is now updated as described.

I personally believe so, we'll discuss this in a plenary call and close this if everyone agrees. Thanks a lot for being patient during the long process!

hober commented 3 years ago

Since the users have spoken, it seems like it would make sense to expose this to Window - unless there is strong opposition from the group.

The WG met the other day, and it was clear (at least to me) that there isn't consensus in the group on whether or not to expose this to Window.

(Even if there is opposition, there seems to be a clear preference here so I think it's worth negotiating towards the direction of Window exposure.)

I like the idea of starting with just exposing it on Worker. It's always easier to add additional exposure in the future than it is to remove it. Best to start small and iterate over time than to start out with too wide exposure & find you regret it in the future.

dalecurtis commented 3 years ago

During the WG call today, @youennf expressed concern that he's unsure of the official TAG position on window vs worker. Presumably because @hober seems to have contradicted @cynthia's (presumed official TAG) position above. Can y'all affirm the official TAG position?

hober commented 3 years ago

During the WG call today, @youennf expressed concern that he's unsure of the official TAG position on window vs worker. Presumably because @hober seems to have contradicted @cynthia's (presumed official TAG) position above. Can y'all affirm the official TAG position?

Hi,

Neither @cynthia nor my comment is "the TAG position", we each just expressed our personal opinions.

dalecurtis commented 3 years ago

Thanks for the clarity, that's a bit confusing from our perspective then. We'll keep that in mind for future comments. Should we expect an official TAG position here (presumably labeled as such)?

cynthia commented 3 years ago

We can do that if that helps - generally we tend to lean towards what users demand/prefer, in this particular case it's also probably worth clarifying if @hober's position is with or without the implementor hat on.

(I'm not an implementor, so I'm fairly neutral on this)

jan-ivar commented 3 years ago

... the users have spoken ...

A survey of participants who jumped on and invested resources in an experimental Chrome origin trial, is perhaps a skewed sample of users to rely on for a question of whether to exercise caution when it comes to exposing APIs prematurely.

chcunningham commented 3 years ago

@cynthia, having a clear TAG position here would be helpful to us in resolving the discussion. We are quite stuck.

A survey of participants who jumped on and invested resources in an experimental Chrome origin trial, is perhaps a skewed sample of users to rely on for a question of whether to exercise caution when it comes to exposing APIs prematurely.

We polled these users at the suggestion of the WG chair. In our view it seems correct that this savvy group of innovators and early adopters should have a strong voice in shaping the API.

jan-ivar commented 3 years ago

We are not "stuck". We have broad consensus to expose this API in workers immediately. There still exists concerns, trepidation, and questions around the urgency of additionally exposing this API on Window at this time.

We generally reserve "users" to describe end-users, not web developers. End-users deserve well-designed applications that work without jank across devices of all abilities, and regardless of user agent. I think the WG with its current stance (which it is in the process of attempting to assert through CfC) is being cautious with the tools it exposes to web developers in different environments, and when, to help ensure this outcome for end-users.

chcunningham commented 3 years ago

We generally reserve "users" to describe end-users, not web developers. End-users deserve well-designed applications that work without jank across devices of all abilities, and regardless of user agent.

I respect that. I think we should also consider web developers as advocates for their users.

I think the WG with its current stance (which it is in the process of attempting to assert through CfC) is being cautious with the tools it exposes to web developers in different environments, and when, to help ensure this outcome for end-users.

At this time the WG does not have a stance. The members of the WG are in disagreement.

cynthia commented 3 years ago

Thank you all for your patience, we hope the feedback we provided on the issue was useful at reaching a resolution. We believe that the feature itself would be a useful addition to the platform and would like to see this move forward. Thanks again for the patience and the apologies for the mixed messages. We'll discuss this in the rollups today during our VF2F and close.

w3ctag / design-reviews

WebCodecs (again!) #612