w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/
Other
1.01k stars 137 forks source link

Sniffing considerations #169

Open padenot opened 3 years ago

padenot commented 3 years ago

After long periods of time during which having a mime-types were mandatory to render audio or video, the HTML spec now says that sniffing should be implemented (with very specific steps from MIMESNIFF), and is in fact required for example for decodeAudioData in the Web Audio API, since there is no mime type information on a byte buffer (unlike with a URL or a Blob).

Here, we're in a novel situation. There is a bit more chances to have a mime type, because authors have probably (but not necessarily) just demuxed a byte stream, and they must have a good idea of what's in it, but in the end it's really just a series of bytes, for the video or audio case.

For images, less so. Authors are used to sniffing and never really touching the byte stream, so this is a bit of a problem.

Additionally, requiring a mime type means that in terms of layering, web codecs cannot be used to implement AudioContext.decodeAudioData or the <img> tag without having an author-provided implementation of MIMESNIFF.

dalecurtis commented 3 years ago

The decision to drop mime sniffing was made in https://github.com/dalecurtis/image-decoder-api/issues/1

@domenic @jyasskin

chcunningham commented 3 years ago

Summary of discussion in this week's editors call: @padenot is anticipating the feedback that sniffing is required if WebCodecs were to be used to reimplement <video> or <img>. My counter points were: reimplementing <video> requires a number of higher level things (e.g. demuxing) which are not the responsibility of codecs. Additionally, I recommend that anyone storing assets anywhere store some metadata like their mime type to facilitate capability detection via ImageDecoder.isTypeSupported() (avoid the broken image experience).

mathiasbynens commented 3 years ago

Which repo contains the proposed spec text for ImageDecoder.isTypeSupported? The flagged Chromium implementation seems to be promise-based, which differs from isTypeSupported on MediaSource — is this intentional? https://github.com/dalecurtis/image-decoder-api/issues/6

dalecurtis commented 3 years ago

Image types almost universally come with their mime-type attached or are 1:1 with their extension, so just adding sniffing for the handful of cases where the extension and mime-type are unavailable wasn't compelling when we discussed this previously.

The sniffing code for images is all incredibly simple too (just looking at first 16 bytes), so I don't see much loss in having the client implement: https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/platform/image-decoders/image_decoder.cc;l=95;drc=043d32542213b601bf5852f6cd5950c949f411fa

chcunningham commented 3 years ago

Which repo contains the proposed spec text for ImageDecoder.isTypeSupported? The flagged Chromium implementation seems to be promise-based, which differs from isTypeSupported on MediaSource — is this intentional? dalecurtis/image-decoder-api#6

ImageDecoder was incubated in its own repository, but is now moving to WebCodecs. The text in PR #152 (preview) reflects the latest plans.

Making isTypeSupported() async is intentional. Lets discuss motivations in your comment on the PR.

padenot commented 3 years ago

Not sniffing would prevent implementing <img> without having to look at the bytes of the resources, which means it's not really possible to layer (as it's always possible to layer by doing lots of work in script). Not all resources are served with a mime type, and not all resources have an extension or even a file name.

For images, if sniffing is (1) easy and (2) frequently necessary for authors to implement, it's then clear this needs to be added in the spec and implementations to make it easy for authors. I'll note that determining that an HEIF is an AVIF (that one can display on some shipping browser today in their stable version) and not an HEIC (that is a very common image format that no browser regardless of their version can display today) is not sniffing a few bytes: https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/platform/image-decoders/avif/avif_image_decoder.cc;drc=221e331b49dfefadbc6fa40b0c68e6f97606d0b3;l=504

For videos and audio, sniffing is notoriously non-trivial, but this specification is not about demuxers (in the audio/video case), so this is a non-issue.

dalecurtis commented 3 years ago

Ultimately I defer to @domenic and @jyasskin here since they are most familiar with current policies on sniffing from our side.

As a matter of strategy I'd say we should keep it as required until we actually get complaints or requests to loosen it, since it's always possible to loosen this requirement later without impacting existing clients.

A few is 144 in the linked case and I'd argue you actually only need to look at the major brand (first 12 bytes), but your point is taken that it's definitely not as simple as older formats.

chcunningham commented 3 years ago

Triage note: marking 'extension' as desire for sniffing would probably be signaled by simply omitting the 'type' member from the ImageDecoderInit.

padenot commented 3 years ago

An important point on something a bit tangential https://github.com/w3ctag/design-reviews/issues/633#issuecomment-840606638. I'll note that unlike what is being discussed there, the data is already in clear in the process and inspectable.

baumanj commented 3 years ago

A few is 144 in the linked case and I'd argue you actually only need to look at the major brand (first 12 bytes), but your point is taken that it's definitely not as simple as older formats.

Having had a lot of discussions about what's required in the brands of ISOBMFF files recently, I don't think it is sufficient to limit the check to the major_brand (though that would suffice in most cases). One of the features offered by the compatible brands arrangement is to allow a single file to be legally interperable as different types. While I don't see that as necessarily desirable or common in the case of AVIF, I do think it's technically legal and since there's only one major brand, assuming that avif will occur there and not solely in the compatible_brands array would reject files which are valid per the specs.

dalecurtis commented 3 years ago

Thanks for the insight. compatible_brands is still part of the ftyp at the front of the file, so it doesn't change the difficulty of sniffing implied by my comment though.

baumanj commented 3 years ago

Digging further, 144 bytes isn't a sufficient limit either. It's an arbitrary one in the chromium code. Firefox uses 512 since the size of an ftyp box is unbounded*, and the only limit that the mimesniff spec declares is 1445 bytes in total for sniffing. Updating the mimesniff spec to have a more reasonable limit on MP4 sniffing would be beneficial, but the overall point is that from a security standpoint, it's not reasonable to assume that our sniffing reads will be limited to 12 bytes in all MP4 cases.

* ftyp is a Box, which typically expresses its size as a unsigned 32-bit int, but there are facilities for extending that to a 64-bit value or until the end of the file (See ISO/IEC 14496-12:2020 § 4.2.2). That said, the mimesniff algorithm for MP4 assumes a 32-bit size, which is totally reasonable.