Open padenot opened 3 years ago
The decision to drop mime sniffing was made in https://github.com/dalecurtis/image-decoder-api/issues/1
@domenic @jyasskin
Summary of discussion in this week's editors call:
@padenot is anticipating the feedback that sniffing is required if WebCodecs were to be used to reimplement <video>
or <img>
. My counter points were: reimplementing <video>
requires a number of higher level things (e.g. demuxing) which are not the responsibility of codecs. Additionally, I recommend that anyone storing assets anywhere store some metadata like their mime type to facilitate capability detection via ImageDecoder.isTypeSupported() (avoid the broken image experience).
Which repo contains the proposed spec text for ImageDecoder.isTypeSupported
? The flagged Chromium implementation seems to be promise-based, which differs from isTypeSupported
on MediaSource
— is this intentional? https://github.com/dalecurtis/image-decoder-api/issues/6
Image types almost universally come with their mime-type attached or are 1:1 with their extension, so just adding sniffing for the handful of cases where the extension and mime-type are unavailable wasn't compelling when we discussed this previously.
The sniffing code for images is all incredibly simple too (just looking at first 16 bytes), so I don't see much loss in having the client implement: https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/platform/image-decoders/image_decoder.cc;l=95;drc=043d32542213b601bf5852f6cd5950c949f411fa
Which repo contains the proposed spec text for ImageDecoder.isTypeSupported? The flagged Chromium implementation seems to be promise-based, which differs from isTypeSupported on MediaSource — is this intentional? dalecurtis/image-decoder-api#6
ImageDecoder was incubated in its own repository, but is now moving to WebCodecs. The text in PR #152 (preview) reflects the latest plans.
Making isTypeSupported() async is intentional. Lets discuss motivations in your comment on the PR.
Not sniffing would prevent implementing <img>
without having to look at the bytes of the resources, which means it's not really possible to layer (as it's always possible to layer by doing lots of work in script). Not all resources are served with a mime type, and not all resources have an extension or even a file name.
For images, if sniffing is (1) easy and (2) frequently necessary for authors to implement, it's then clear this needs to be added in the spec and implementations to make it easy for authors. I'll note that determining that an HEIF is an AVIF (that one can display on some shipping browser today in their stable version) and not an HEIC (that is a very common image format that no browser regardless of their version can display today) is not sniffing a few bytes: https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/platform/image-decoders/avif/avif_image_decoder.cc;drc=221e331b49dfefadbc6fa40b0c68e6f97606d0b3;l=504
For videos and audio, sniffing is notoriously non-trivial, but this specification is not about demuxers (in the audio/video case), so this is a non-issue.
Ultimately I defer to @domenic and @jyasskin here since they are most familiar with current policies on sniffing from our side.
As a matter of strategy I'd say we should keep it as required until we actually get complaints or requests to loosen it, since it's always possible to loosen this requirement later without impacting existing clients.
A few is 144 in the linked case and I'd argue you actually only need to look at the major brand (first 12 bytes), but your point is taken that it's definitely not as simple as older formats.
Triage note: marking 'extension' as desire for sniffing would probably be signaled by simply omitting the 'type' member from the ImageDecoderInit.
An important point on something a bit tangential https://github.com/w3ctag/design-reviews/issues/633#issuecomment-840606638. I'll note that unlike what is being discussed there, the data is already in clear in the process and inspectable.
A few is 144 in the linked case and I'd argue you actually only need to look at the major brand (first 12 bytes), but your point is taken that it's definitely not as simple as older formats.
Having had a lot of discussions about what's required in the brands of ISOBMFF files recently, I don't think it is sufficient to limit the check to the major_brand
(though that would suffice in most cases). One of the features offered by the compatible brands arrangement is to allow a single file to be legally interperable as different types. While I don't see that as necessarily desirable or common in the case of AVIF, I do think it's technically legal and since there's only one major brand, assuming that avif
will occur there and not solely in the compatible_brands
array would reject files which are valid per the specs.
Thanks for the insight. compatible_brands
is still part of the ftyp
at the front of the file, so it doesn't change the difficulty of sniffing implied by my comment though.
Digging further, 144 bytes isn't a sufficient limit either. It's an arbitrary one in the chromium code. Firefox uses 512 since the size of an ftyp
box is unbounded*, and the only limit that the mimesniff spec declares is 1445 bytes in total for sniffing. Updating the mimesniff spec to have a more reasonable limit on MP4 sniffing would be beneficial, but the overall point is that from a security standpoint, it's not reasonable to assume that our sniffing reads will be limited to 12 bytes in all MP4 cases.
* ftyp
is a Box
, which typically expresses its size as a unsigned 32-bit int, but there are facilities for extending that to a 64-bit value or until the end of the file (See ISO/IEC 14496-12:2020 § 4.2.2). That said, the mimesniff algorithm for MP4 assumes a 32-bit size, which is totally reasonable.
After long periods of time during which having a mime-types were mandatory to render audio or video, the HTML spec now says that sniffing should be implemented (with very specific steps from MIMESNIFF), and is in fact required for example for
decodeAudioData
in the Web Audio API, since there is no mime type information on a byte buffer (unlike with a URL or aBlob
).Here, we're in a novel situation. There is a bit more chances to have a mime type, because authors have probably (but not necessarily) just demuxed a byte stream, and they must have a good idea of what's in it, but in the end it's really just a series of bytes, for the video or audio case.
For images, less so. Authors are used to sniffing and never really touching the byte stream, so this is a bit of a problem.
Additionally, requiring a mime type means that in terms of layering, web codecs cannot be used to implement
AudioContext.decodeAudioData
or the<img>
tag without having an author-provided implementation of MIMESNIFF.