w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Should ImageDecoder IsTypeSupported be (a)synchronous? #213

Closed chcunningham closed 3 years ago

chcunningham commented 3 years ago

Migrating from this comment thread

@mathiasbynens wrote:

Why is this a promise-based API? Can we instead return the boolean synchronously? dalecurtis/image-decoder-api#6

@chcunningham wrote:

This was made a promise because we anticipate cases where the UA may not have the answer synchronously. As image formats have started to use video codecs, decoding an image may require instantiating video decoders backed by platform APIs. A browser architecture may be such that these APIs are called in a separate process, sandboxed for improved security. Supported-types queries then become a question for those same APIs, which is implemented via async IPC.

Earlier media capability detection APIs,
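
The cross-process pattern described above can be sketched roughly as follows. This is a hedged illustration with invented names (`queryCodecProcess`, the cached type list), not actual browser code: the process running JavaScript does not know the answer synchronously, so it forwards the question over async IPC and caches the resulting promise.

```javascript
// Hypothetical sketch only: names and type list are invented for illustration.
const supportCache = new Map();

function queryCodecProcess(type) {
  // Stand-in for an async IPC round trip to a sandboxed codec process; a real
  // browser would dispatch a message and resolve when the reply arrives.
  return Promise.resolve(['image/avif', 'image/webp'].includes(type));
}

function isTypeSupported(type) {
  if (!supportCache.has(type)) {
    supportCache.set(type, queryCodecProcess(type));
  }
  return supportCache.get(type); // always a Promise<boolean>
}
```

Caching the promise (rather than the boolean) means repeated calls for the same type share one IPC round trip.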

@mathiasbynens wrote:

I understand that actual image decoding might depend on another process, but answering the question "do I know what to do with this image type at all (without necessarily doing the work)" seems orthogonal to that. Couldn't browsers just maintain a list of supported image types and synchronously check it whenever isTypeSupported is called?

If not, then have we considered changing the name of this API? IMHO it's surprising to give it the same name as an existing API without matching the signature of its return value.
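
The synchronous alternative suggested above amounts to a static lookup table. A minimal sketch, with an illustrative (invented) list of types:

```javascript
// Sketch of the proposed synchronous check: the UA keeps a static list of
// supported image types and answers from it immediately. List contents are
// invented here for illustration.
const KNOWN_TYPES = new Set(['image/png', 'image/jpeg', 'image/webp', 'image/avif']);

function isTypeSupportedSync(type) {
  return KNOWN_TYPES.has(type.toLowerCase());
}
```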

@chcunningham wrote:

If the browser entirely relies on the OS to provide the codec, it may not be possible to know statically which codecs are supported (particularly for newer formats). Instead, we may be forced to query OS APIs that are adjacent to the actual decoding APIs. Often this involves the same async IPC to a privileged, sandboxed process.

If not, then have we considered changing the name of this API? IMHO it's surprising to give it the same name as an existing API without matching the signature of its return value.

Open to suggestions. I liked this name for its similarity actually. It is performing essentially the same function as its predecessor. I think any confusion would be pretty immediately resolved at dev-time.

@padenot wrote:

If the browser entirely relies on the OS to provide the codec, it may not be possible to know statically which codecs are supported (particularly for newer formats). Instead, we may be forced to query OS APIs that are adjacent to the actual decoding APIs. Often this involves the same async IPC to a privileged, sandboxed process.

In my experience, this is mostly true for video, in the sense that it's entirely possible that the device that allows (say) power- or CPU-efficient decoding is simply physically removed, making it impossible to store the capabilities somewhere in the browser for synchronous access.

Do we have any evidence of a similar constraint for images? In our experience, hardware decoding for images has the problem that the setup time dwarfs the decoding time, and the setup needs to happen per image (with a possible edge case when lots of images share the same dimensions/format, maybe?).

@dalecurtis wrote:

For HEIF I would expect implementors will require hardware support to enable the functionality. Early in process startup, this may not yet be known.

@padenot wrote:

Do you mean HEIC? HEIF is supported in software today by Chrome and Firefox.

@dalecurtis wrote:

Yes, sorry. I keep forgetting that last letter :) In my head AVIF is what we call what Chrome and Firefox have.

@padenot wrote:

Yes, same, but our image folks made the distinction when I asked :-)

Per the TAG (a precedent set with the autoplay policy, which took ages to resolve), a promise is to be used only if the information can change during the lifetime of the document. If not, then it's on the implementer to have the information ready. For example, in Gecko, this is part of the information received when a process is created.

If we anticipate that support for this image format can be dynamic, an async API is going to be necessary (but it's annoying to have to do this asynchronously because of an edge case that isn't shipped anywhere, and where it's not clear whether it will ever ship anywhere). It would be a strange precedent, however.

@dalecurtis wrote:

At least in Chrome, in the event of a gpu process crash, it is possible support for some formats is no longer available.

Otherwise I agree, we're just making this async for symmetry and a hypothetical. Maybe @jernoble or @aboba want to chime in to avoid a decision that would preclude any formats they might want to support.

@jrmuizel wrote:

@dalecurtis what formats aren't supported in Chrome when the GPU process crashes?

@dalecurtis wrote:

In the hypothetical world where a browser only has support for HEIC via a platform decoder, there is a case where repeated GPU crashes may put the browser into a software-rendering/no-gpu mode that prevents access to the platform decoder.

As a practical example, in some cases the platform decoder may have hard limits on the number of instances available and/or may not be working reliably (e.g., it's hanging or crashing too frequently) to the point that it's disabled at runtime. This happens today in Chrome on Android for H.264 support.

@jrmuizel wrote:

OK, I think it's unlikely that a browser would choose to support an image format and not have a software fallback, but even if such a browser did exist, the web author would still need to deal with decoding support going away after they've already checked for support with isTypeSupported().

@dalecurtis wrote:

Practically we don't have software fallback on Android for H.264 today -- so that at least is a real case. In the case where the codec goes away during usage the decoder would trigger a decoding error. The same would occur if the loss occurred between construction and the first decode call.

I think authors would have to handle this case regardless of whether isTypeSupported() is sync or not, though. For security reasons (e.g., malicious invocation of the platform decoder), even if software fallback is available, it's unclear that automatic fallback is the right operation.

Ultimately my initial implementation was always synchronous, so if everyone wants this to be sync and isn't swayed by the hypotheticals I don't mind switching it back. I believe @chcunningham only suggested it for symmetry with the rest of the WebCodecs APIs.

chcunningham commented 3 years ago

Ultimately my initial implementation was always synchronous, so if everyone wants this to be sync and isn't swayed by the hypotheticals I don't mind switching it back. I believe @chcunningham only suggested it for symmetry with the rest of the WebCodecs APIs.

I suggested it for syntactic symmetry, but also because the underlying motivation seems symmetric. As Dale noted earlier, Chrome does have cases of relying on the underlying platform without a software fallback. These cases are currently limited to video decoding, but images are increasingly using video codecs, so it seems possible that a similar scenario could arise for images in the near future.

chcunningham commented 3 years ago

@youennf, curious to hear your thoughts.

youennf commented 3 years ago

Asynchronous looks safer to me, as more and more hardware access is done outside of the processes running JavaScript. Also, even if done in-process, querying the OS might sometimes require loading libraries dynamically once at first call, which might best be done on a background thread. Asynchronous gives some extra flexibility here without hurting web developers too much.

chcunningham commented 3 years ago

@jrmuizel @padenot, thoughts on the above?

aboba commented 3 years ago

@chcunningham Since the Audio/Video Decoder/Encoder isTypeSupported methods are async, consistency would argue that this one should be as well. Synchronous can be made to work if there is another asynchronous method that you can be sure will be called prior to isTypeSupported, so you can query the hardware in that async method. But I don't think that's the case here.

chcunningham commented 3 years ago

@aboba I agree with both of those points.

chcunningham commented 3 years ago

Triage note: marking 'breaking', as this proposal could change a return type. It won't badly break folks that just use await, but awaiting synchronous functions is not ideal, and other breaks are still possible (say, trying to chain .then()).

Note: this is just a triage note. I still support making this async, as described earlier.
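
The compatibility hazard in the triage note can be shown with invented stand-ins (neither function below is the real API, they only model the two candidate signatures): `await` tolerates both return types, but a `.then()` chain breaks once the return value becomes a plain boolean.

```javascript
// Invented stand-ins modeling the two candidate return types.
const asyncVersion = (type) => Promise.resolve(type === 'image/avif'); // Promise<boolean>
const syncVersion = (type) => type === 'image/avif';                   // plain boolean

// `await asyncVersion(t)` and `await syncVersion(t)` both work, because
// awaiting a non-promise passes the value through unchanged.

// asyncVersion('image/avif').then(ok => ...)  // fine: result is a Promise
// syncVersion('image/avif').then(ok => ...)   // TypeError: booleans have no .then
```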

jernoble commented 3 years ago

Chair hat off; implementer hat on

I don't understand why "one UA might experience a malfunction" necessitates an async API. The system either supports or does not support a given image format; the UA either has or does not have a software fallback for those image formats. Both are knowable statically, long before any page loads, regardless of the implementation details.

As in the case of the Autoplay API, the explicit purpose of making this API return a promise is to make UA implementation easier at the expense of increasing complexity for the page author. This seems backwards to me.

jernoble commented 3 years ago

@aboba said:

@chcunningham Since the Audio/Video Decoder/Encoder isTypeSupported methods are async, consistency would argue that this one should be as well.

Chair hat off; implementer hat on

Consistency cuts both ways; it seems like you could make these APIs consistent by making all of them sync. But I assume you mean isConfigSupported(), not isTypeSupported(), here? That method seems to actually require spinning up a decoder and querying detailed properties of a particular instantiation (including providing codec-specific initialization data), which is arguably asynchronous in nature. That doesn't appear to be the case here, as isTypeSupported() merely takes a MIME type.

chcunningham commented 3 years ago

As image codecs are increasingly video codecs, the simple static support check for images may evolve into the "arguably asynchronous" pattern you mentioned for video.

jernoble commented 3 years ago

I made my distinction clear above: "arguably inherently asynchronous" is instantiating a decoder and querying its specific properties. Checking whether support for a given image MIME type exists on the system is not inherently asynchronous, as made evident by the currently synchronous nature of other APIs performing the exact same task.

chcunningham commented 3 years ago

Let me bring back a quote from @youennf

Asynchronous looks safer to me, as more and more hardware access is done outside of the processes running JavaScript. Also, even if done in-process, querying the OS might sometimes require loading libraries dynamically once at first call, which might best be done on a background thread.

Asynchronously querying the hardware / platform API out of process could be necessary even just to ask whether the image MIME type is supported. Presently, I agree that probably all UAs implement a built-in software fallback for every supported image codec, but this is not the case for all video codecs. We may not want to permanently enshrine a software-fallback requirement for future image codecs as the line between image and video blurs.

jernoble commented 3 years ago

Yes, I disagree with @youennf here. Nothing about a given codec being implemented in hardware requires that generating the list be asynchronous.

However, here's something that would change my mind. Do these advanced image formats support extended MIME type strings? If they do, it's almost impossible to generate the entire list of accepted extended MIME types up front, and a query would have to be performed at every step.

dalecurtis commented 3 years ago

Yes, I disagree with @youennf here. Nothing about a given codec being implemented in hardware requires that generating the list be asynchronous.

However, here's something that would change my mind. Do these advanced image formats support extended MIME type strings? If they do, it's almost impossible to generate the entire list of accepted extended MIME types up front, and a query would have to be performed at every step.

No, the MIME strings are just basic image/heic, image/avif, etc. -- there's no codecs parameter -- though with things like HEIF we may eventually need one; e.g., 'image/heif; codecs="..."' may be something we want in the future.
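
If such a codecs parameter were ever added (hypothetical, per the comment above), a support check would first have to split the extended MIME string into its essence and codec list. A rough sketch, not a full RFC 2045 parameter parser:

```javascript
// Hypothetical helper, invented here: splits 'image/heif; codecs="hvc1...."'
// into { type, codecs } so each codec string could be checked individually.
function parseImageMime(input) {
  const [essence, ...params] = input.split(';').map((s) => s.trim());
  const result = { type: essence.toLowerCase(), codecs: [] };
  for (const p of params) {
    const m = /^codecs="?([^"]*)"?$/i.exec(p);
    if (m) result.codecs = m[1].split(',').map((c) => c.trim());
  }
  return result;
}
```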

aboba commented 3 years ago

@jernoble said:

"Checking for whether support for a given image MIME type exists on the system is not inherently asynchronous, as made evident by the currently synchronous nature of other APIs performing the exact same task."

[BA] I agree with the statement above. But having authored one of those synchronous APIs (WebRTC getCapabilities()) and seen corner cases arise, there is a point at which sync capability APIs no longer work. For example, WebRTC getCapabilities() returns not just whether the codec is supported, but also info on supported profiles, options, etc. This worked just fine until we encountered codecs whose profiles and options were only implemented in hardware (no software fallback). At that point, we found that on some hardware it took too long to retrieve all the required info in a synchronous API, but that it was possible to return everything in async APIs (like createOffer()).

The end result is somewhat undesirable: if you call getCapabilities() before createOffer(), you may get an incomplete set of capabilities, whereas if you call it afterwards, you will get the complete set.
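
The ordering pitfall described here can be modeled with a toy sketch (all names invented; this is not the WebRTC API): software codecs are known at startup, but a hardware-only codec is discovered by a later asynchronous probe, so a synchronous query returns an incomplete set until that probe has run.

```javascript
// Toy model of the ordering pitfall; names and codec strings are invented.
const caps = new Set(['video/vp8']); // software codecs known at startup

function getCapabilitiesSync() {
  return [...caps]; // may be incomplete before the async probe has run
}

async function probeHardware() {
  // Stands in for an async operation (like createOffer()) that happens to
  // complete the capability set as a side effect.
  await Promise.resolve();
  caps.add('video/h264'); // hardware-only codec discovered late
}
```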

eric-carlson commented 3 years ago

A potential user benefit of an asynchronous method is that it would allow a UA to throttle requests if it believes they are being made for fingerprinting [1][2].

[1] https://github.com/abrahamjuliot/creepjs/blob/master/modules/media.js#L5 [2] https://privacycheck.sec.lrz.de/active/fp_cpt/fp_can_play_type.html

jernoble commented 3 years ago

Okay, I agree that allowing UAs to throttle requests is a concrete end-user benefit. The alternative to throttling for a synchronous API would have to involve lying about the answers, which could cause breaks for actual use cases.

Given that Eric identified a concrete benefit for the end-user (in addition to the previous benefits for implementors), and one that allows us to better mitigate fingerprinting risks, I can now support making this API async and returning a Promise.

chcunningham commented 3 years ago

Thanks @jernoble, @eric-carlson.

@padenot I think we're near consensus here. Want to check in with you.

chcunningham commented 3 years ago

@padenot friendly ping

padenot commented 3 years ago

That's a rather compelling argument at first glance, indeed. Even considering that the list of supported image types is fairly static and uniform across shipping implementations, it's currently in flux.

In practice, though, making this async doesn't solve any problem: developers who want to use this for fingerprinting can simply load a minuscule image file per format (as base64) into a hidden image element, and then check either the error or load event.

I've written this simple code; it gives the correct answer in all browsers in < 10ms and is ~2k of source, all included, without any attempt at minification or other size optimization.
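
The snippet referenced above was not reproduced in the thread; the technique described can be sketched like this. The `ImgCtor` parameter is an addition here so the sketch can be exercised outside a browser with a stand-in class; in a browser it defaults to the real `Image` constructor.

```javascript
// Hedged reconstruction of the described probe: load a tiny data: URL into an
// image element and treat the load/error events as the support signal.
function probeImageType(dataUrl, ImgCtor = globalThis.Image) {
  return new Promise((resolve) => {
    const img = new ImgCtor();
    img.onload = () => resolve(true);
    img.onerror = () => resolve(false);
    img.src = dataUrl; // assigning src kicks off the decode attempt
  });
}
```

In a browser, you would call this once per format with a one-pixel base64 image, e.g. `probeImageType('data:image/avif;base64,...')`, and collect the booleans.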

dalecurtis commented 3 years ago

error and load are both asynchronous events that can be throttled, so doesn't @eric-carlson's point still apply? Using <img> is also a bit less tenable if we ever add a codecs parameter to isTypeSupported() -- which seems like a distinct possibility, i.e., image/heif; codecs="av01.0.00M.10.0.010" or image/heif; codecs="hvc1.1.6.L0.12.34.56.78.9A.BC" -- which would take quite a bit longer to enumerate using <img>.

padenot commented 3 years ago

Of course, but I don't think it's possible to throttle them in practice; it's probably not possible to ship a competitive implementation that throttles.

I don't think it would take more than a few milliseconds in any case; my example is not even parallelized (it uses only a single <img>), and it's less than 10ms for 8 formats and less than 50ms for a thousand when using multiple <img> elements.

dalecurtis commented 3 years ago

Okay, thanks for clarifying @padenot. At this point you're the lone dissenter against asynchronous -- unless your arguments have swayed any folks from the Apple side? From the Chrome side, while we prefer asynchronous, it's not worth the analysis paralysis, so we won't block on it; whatever will get consensus is fine with us.

Do you want to block on this @padenot? If not, I propose we close this out and leave it asynchronous. Conversely, if you want to block on this and no one else is opposed, let's just close this out as synchronous. If folks are still opposed, can we agree to resolve with a voice vote at the Media WG meeting tomorrow?

padenot commented 3 years ago

@jernoble, @eric-carlson ?

In the grand scheme of things, I don't care that much, but it's annoying to have an async API for synchronous things, and this is synchronous, as discussed.

dalecurtis commented 3 years ago

Bump for @jernoble @eric-carlson @aboba in case they have anything further to add. Otherwise let's plan to resolve as asynchronous.

Let's try to resolve as much of this over the issue tracker ahead of the Media WG meeting as possible, to ensure timeliness.

Thanks everyone!

dalecurtis commented 3 years ago

Proposed consensus is to resolve as asynchronous. More recent data from @padenot didn't change @jernoble's position. I'll discuss with @padenot in tomorrow's editor call and resolve if there are no further objections.

dalecurtis commented 3 years ago

Editor's call: asynchronous is okay with @padenot (begrudgingly).