Closed chcunningham closed 3 years ago
* The code freeze for this release will occur around April 14, 2021.
This seems way too early (about two months from now), considering the rather fundamental discussions still happening around memory allocations and time units, among others (there is a large number of unresolved issues, and some groups of users haven't given feedback yet). Getting those two wrong (and again, there is more to discuss), or not perfectly right, would mean the API cannot be used for some of its stated use cases, or would have performance implications that make some use cases impossible on particular devices (notably, and predictably, non-high-end devices).
Considering this is a low-level API that is supposed to be the bottom-most layer of the entire media decoding and encoding stack of the web platform, I would be sad to not get this right, especially considering the expectations of developers in various domains.
@padenot, apologies for the surprise - I thought we were more aligned on timing. I agree we have a number of issues to resolve before shipping, but I think 2 months sounds workable. It's not my intention to rush this out the door. Let's discuss more in our upcoming call and we can revise the timeline as needed.
Timing aside, this is still a great time to gather TAG feedback. Reviewers, please proceed :)
Strange! I swear I wrote early feedback on this yesterday, but apparently I was imagining I did or I got distracted and forgot to press the green button. Abridged summary of what I wanted to write:
Still need to spend more time with this.
Thanks so much @cynthia
Explainer and spec disagree on AudioFrame vs AudioPacket?
Just updated to use AudioFrame.
VideoFrame has destroy/clone but AudioFrame|Packet does not - why?
Sorry, this part needs work. The plan is for both interfaces to have matching clone() and close() methods, removing destroy(). The semantics of clone() and close() will be akin to incrementing/decrementing a ref count on the underlying media data. I should send a PR by end of week. I'll update this thread when it's ready.
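Since the PR isn't out yet, here's a rough plain-JavaScript sketch of the ref-counting semantics described above. All names here are hypothetical mocks, not the real WebCodecs interfaces:

```javascript
// Illustrative mock only (hypothetical names, not the real WebCodecs API):
// a handle over shared media data whose lifetime is managed by an internal
// reference count, mirroring the planned clone()/close() semantics.
class FrameHandle {
  constructor(data, shared = { refs: 1, data }) {
    this.shared = shared; // state shared by all clones of this frame
    this.closed = false;  // per-handle flag
  }
  // clone(): a new handle to the same underlying media data (+1 ref).
  clone() {
    if (this.closed) throw new Error("cannot clone a closed frame");
    this.shared.refs += 1;
    return new FrameHandle(null, this.shared);
  }
  // close(): release this handle (-1 ref); the underlying data is
  // freed only when the last handle is closed.
  close() {
    if (this.closed) return;
    this.closed = true;
    this.shared.refs -= 1;
    if (this.shared.refs === 0) this.shared.data = null; // "freed"
  }
}

const pixelData = new Uint8Array(16);
const original = new FrameHandle(pixelData);  // refs = 1
const copy = original.clone();                // refs = 2, same pixels
original.close();                             // refs = 1, copy still usable
copy.close();                                 // refs = 0, data released
```

The point of this model is that clones are cheap (no pixel copies) while still giving callers deterministic release of potentially large media buffers.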
In VideoFrame, wild guess is that plane is intended to give the original framebuffer (e.g. in YCbCr, as multiple grayscale images?)
Your idea is roughly correct. VideoFrame gives access to the raw pixel data via its Planes interface. For YCbCr you would get a plane for each of Y, Cb, and Cr. For the I420 pixel format, the Cb and Cr planes are subsampled. The spec will soon mention additional pixel formats. For example: RGB (1 plane) and NV12 (Y plane + interleaved CbCr plane).
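To make the subsampling concrete, here's a small illustrative helper (hypothetical, not part of the spec) that computes per-plane sample counts for an I420 frame, where Cb and Cr are halved in both dimensions:

```javascript
// Per-plane sample counts for an I420 (4:2:0 planar) frame: Y at full
// resolution, Cb and Cr halved in width and height (rounded up for odd
// sizes). Illustrative helper only, not WebCodecs spec API.
function i420PlaneSizes(codedWidth, codedHeight) {
  const chromaWidth = Math.ceil(codedWidth / 2);
  const chromaHeight = Math.ceil(codedHeight / 2);
  return {
    y: codedWidth * codedHeight,
    cb: chromaWidth * chromaHeight,
    cr: chromaWidth * chromaHeight,
  };
}
```

For a 1920x1080 frame that's 2,073,600 luma samples plus 518,400 samples in each chroma plane, i.e. an 8-bit I420 frame costs about 1.5 bytes per pixel, versus 3 for packed RGB.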
while the image bitmap method is expected to return something interoperable with the rest of the platform?
Yes, though we have some pending updates to mention here as well. We initially envisioned VideoFrame -> ImageBitmap -> Canvas as a primary means of painting VideoFrames. More recently, we intend to support painting of VideoFrames directly via WebGL tex(Sub)Image(2D|3D)() and Canvas drawImage(). The Chrome work for this is tracked here. We'll file PRs/bugs on the relevant specs shortly.
We still plan to support VideoFrame -> ImageBitmap for other interop, but VideoFrame.createImageBitmap() will be removed from the spec in favor of window.createImageBitmap(videoFrame). The former was just easier for us to prototype.
Also, I want to highlight the time units discussion @padenot mentioned. https://github.com/WICG/web-codecs/issues/122
This and other top priority issues are all assigned the 2021 Q1 milestone. https://github.com/WICG/web-codecs/milestone/2
Input welcome!
The previous tag review mentioned ImageDecoding. I will send a PR shortly to add this to the spec: https://github.com/dalecurtis/image-decoder-api
This PR is now out here: https://github.com/WICG/web-codecs/pull/152
A preview of the Image Decoding sections can be viewed here: https://pr-preview.s3.amazonaws.com/WICG/web-codecs/pull/152.html#image-decoding
@cynthia @kenchris, just checking in. How is the review going? Have you had a chance to discuss https://github.com/WICG/web-codecs/issues/104 and https://github.com/WICG/web-codecs/issues/122?
I know it's a big spec and the ImageDecoder portion is new as of last week. We really appreciate your work!
@cynthia @kenchris, friendly ping. I'm hoping to hear from you soon so we can respond to your feedback before we ship. I've delayed the Chrome timeline by one milestone (now with code freeze around May 20), but we'll want to make any substantial changes well in advance.
Also, I want to highlight the time units discussion @padenot mentioned. w3c/webcodecs#122
@padenot @aboba @sandersdan and I discussed this offline and resolved to use integer microseconds. Please see a summary of how that decision was made here: https://github.com/w3c/webcodecs/issues/122#issuecomment-833788313
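As a rough illustration of what that resolution means in practice (helper names here are made up, not spec API), converting between floating-point seconds and integer microseconds looks like:

```javascript
// WebCodecs timestamps are integer microseconds. Hypothetical helpers
// (not spec API) for converting to and from floating-point seconds.
const MICROS_PER_SECOND = 1_000_000;

function secondsToMicros(seconds) {
  // Round to the nearest microsecond so conversions don't drift.
  return Math.round(seconds * MICROS_PER_SECOND);
}

function microsToSeconds(micros) {
  return micros / MICROS_PER_SECOND;
}

// e.g. one frame at 30 fps: 1/30 s becomes 33333 microseconds.
```

The main appeal of integers here is that sums and comparisons of timestamps stay exact, with no floating-point rounding behavior to reason about when accumulating frame durations.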
@kenchris @cynthia - here's another issue that TAG could help us resolve
Should WebCodecs be exposed in Window environments? https://github.com/w3c/webcodecs/issues/211
Generally, I think the points in https://github.com/w3c/webcodecs/issues/122#issuecomment-833788313 make a lot of sense.
As I understand it, we could always add an alternative attribute with values represented as DOMHighResTimeStamp. With that in mind, I support you using a different type (here unsigned microseconds).
@dbaron also made the point (in an email thread) that the timestamps you are defining are within the time of the video or audio stream/file, and are independent of the playback rate: changing the rate doesn't change the timestamps, it just causes them to advance at a different speed.
Both David and I agree that, if we've understood that correctly, it is pretty reasonable for this to be a different, differently-defined type.
Regarding exposing to Window environments, we are very sympathetic to @youennf's comment
It seems safer though to gradually build the API. Exposing it to Worker at first covers the major known use cases and allows early adopters, through workers or JS shims, to start doing work in Window environments. This might also help validate that the model is right for Window environments.
@chcunningham @padenot, Sorry this took so long!
@kenchris @rhiaro and I discussed this in our F2F. Here is a summary of our discussion.
1) Temporal representation: we've discussed this and an integer representation seems to be the most adequate given the tradeoffs and risks that are associated with using a real representation.

2) Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.

3) Transfer / detach? We read the discussion and making the transfer opt-in seems reasonable - although we did lean more towards implicit transfer and make it so that opt-out is explicit. (Basically, zero garbage unless requested) But there is the risk of this behavior being inconsistent with the rest of the platform.
ImageDecoder was an interesting review. API-design-wise, this felt a lot more like a video decoder API that "supports" images (not surprising given that this is exactly what it is), but we did have some mixed feelings about the ergonomics of the API. What mechanism would allow the user to say, decode an image and put it into an HTMLImageElement? (This seemed like something that people might do)
If you didn't see it, https://github.com/dalecurtis/image-decoder-api/blob/master/explainer.md#providing-image-decoders-through-the-videodecoder-api talks a bit about the reasons for why we didn't just add 'image codecs' to VideoDecoder.
I think it's unlikely someone would want to interoperate with the image element in that way -- we certainly haven't had any requests for it. Canvas is instead the natural rendering point for authored content like this, and folks reimplementing aspects of HTMLImageElement with Canvas+ImageDecoder seems more likely. What we have also had requests for is something like an 'image-decoder-worklet' where folks can bring their own image decoders - but that's a bit orthogonal to what this API is trying to do, and again is likely served more simply by canvas.
@kenchris @cynthia thank you so much for the review
1) Temporal representation, we've discussed this and an integer representation seems to be the most adequate given the tradeoffs and risks that are associated with using a real representation.
SG, I'll update the spec shortly.
2) Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.
In the WG call Tuesday it was suggested that we could check with participants in the Chrome origin trial to see about use cases for window-exposed interfaces. Should take about a week. I'll provide updates on the issue. If the WG cannot come to consensus by Chrome's v1 launch, we can fall back to worker-only exposure while the discussion continues.
3) Transfer / detach? We read the discussion and making the transfer opt-in seems reasonable - although we did lean more towards implicit transfer and make it so that opt-out is explicit. (Basically, zero garbage unless requested) But there is the risk of this behavior being inconsistent with the rest of the platform.
We considered opt-out for the same reasons and were sensitive to the consistency risk. We also concluded that transfer is often undesirable or infeasible in a few common scenarios, like passing ArrayBufferViews (particularly those backed by the WASM heap). We felt this made automatic transfer even riskier, as the behavior would vary based on the type of the provided BufferSource.
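A small sketch of the scenario we had in mind (plain JavaScript, with an ordinary ArrayBuffer standing in for a WASM heap): transferring a view like this would detach the entire underlying buffer, so an explicit copy is the safer behavior:

```javascript
// Sketch of why implicit transfer is risky. The encoded bytes are a
// small view into a much larger buffer (here, a stand-in for a WASM
// module's heap); transferring the view would detach the *entire*
// heap, so an explicit copy of just the chunk bytes is safer.
const heap = new ArrayBuffer(1024);              // pretend WASM heap
const chunkBytes = new Uint8Array(heap, 64, 16); // one chunk's bytes
chunkBytes.set([0x00, 0x00, 0x01]);              // e.g. a start code

// Explicit copy: duplicates only the 16 chunk bytes and leaves the
// heap attached and usable by the (hypothetical) WASM module.
const safeCopy = chunkBytes.slice();
```

`slice()` allocates a fresh 16-byte buffer, so the cost is proportional to the chunk, not the heap, and the original memory stays valid.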
@cynthia @kenchris
Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.
In the WG call Tuesday it was suggested that we could check with participants in the Chrome origin trial to see about use cases for window-exposed interfaces.
The replies from our user survey are now summarized in this comment. Please let me know if this changes your recommendation.
Also, for other points mentioned above (ImageDecoder, temporal representation, transfer / detach), please let me know if the replies above have adequately addressed the concern. On temporal representation, the spec is now updated as described.
@chcunningham
The replies from our user survey are now summarized in this comment. Please let me know if this changes your recommendation.
Since the users have spoken, it seems like it would make sense to expose this to Window - unless there is strong opposition from the group. (Even if there is opposition, there seems to be a clear preference here so I think it's worth negotiating towards the direction of Window exposure.)
Also, for other points mentioned above (ImageDecoder, temporal representation, transfer / detach), please let me know if the replies above have adequately addressed the concern. On temporal representation, the spec is now updated as described.
I personally believe so, we'll discuss this in a plenary call and close this if everyone agrees. Thanks a lot for being patient during the long process!
Since the users have spoken, it seems like it would make sense to expose this to Window - unless there is strong opposition from the group.
The WG met the other day, and it was clear (at least to me) that there isn't consensus in the group on whether or not to expose this to Window.
(Even if there is opposition, there seems to be a clear preference here so I think it's worth negotiating towards the direction of Window exposure.)
I like the idea of starting with just exposing it on Worker. It's always easier to add additional exposure in the future than it is to remove it. Best to start small and iterate over time than to start out with too wide exposure & find you regret it in the future.
During the WG call today, @youennf expressed concern that he's unsure of the official TAG position on window vs worker. Presumably because @hober seems to have contradicted @cynthia's (presumed official TAG) position above. Can y'all affirm the official TAG position?
Hi,
Neither @cynthia nor my comment is "the TAG position", we each just expressed our personal opinions.
Thanks for the clarity, that's a bit confusing from our perspective then. We'll keep that in mind for future comments. Should we expect an official TAG position here (presumably labeled as such)?
We can do that if that helps - generally we tend to lean towards what users demand/prefer, in this particular case it's also probably worth clarifying if @hober's position is with or without the implementor hat on.
(I'm not an implementor, so I'm fairly neutral on this)
... the users have spoken ...
A survey of participants who jumped on and invested resources in an experimental Chrome origin trial is perhaps a skewed sample of users to rely on for a question of whether to exercise caution when it comes to exposing APIs prematurely.
@cynthia, having a clear TAG position here would be helpful to us in resolving the discussion. We are quite stuck.
A survey of participants who jumped on and invested resources in an experimental Chrome origin trial is perhaps a skewed sample of users to rely on for a question of whether to exercise caution when it comes to exposing APIs prematurely.
We polled these users at the suggestion of the WG chair. In our view it seems correct that this savvy group of innovators and early adopters should have a strong voice in shaping the API.
We are not "stuck". We have broad consensus to expose this API in workers immediately. There still exist concerns, trepidation, and questions around the urgency of additionally exposing this API on Window at this time.
We generally reserve "users" to describe end-users, not web developers. End-users deserve well-designed applications that work without jank across devices of all abilities, and regardless of user agent. I think the WG with its current stance (which it is in the process of attempting to assert through CfC) is being cautious with the tools it exposes to web developers in different environments, and when, to help ensure this outcome for end-users.
We generally reserve "users" to describe end-users, not web developers. End-users deserve well-designed applications that work without jank across devices of all abilities, and regardless of user agent.
I respect that. I think we should also consider web developers as advocates for their users.
I think the WG with its current stance (which it is in the process of attempting to assert through CfC) is being cautious with the tools it exposes to web developers in different environments, and when, to help ensure this outcome for end-users.
At this time the WG does not have a stance. The members of the WG are in disagreement.
Thank you all for your patience; we hope the feedback we provided on the issue was useful in reaching a resolution. We believe that the feature itself would be a useful addition to the platform and would like to see it move forward. Thanks again for your patience, and apologies for the mixed messages. We'll discuss this in the rollups today during our VF2F and close.
Ya ya yawm TAG!
I'm requesting a TAG review of WebCodecs.
An early review was conducted in #433. The API has changed a lot since then and we now have a draft specification. Please see discussion of deadlines (relatively soon) and "you should also know that" below (upcoming work, open questions).
Thank you for reviewing!
Further details:
You should also know that...
We'd prefer the TAG provide feedback as: 🐛 open issues in our GitHub repo for each point of feedback