whatwg / mimesniff

MIME Sniffing Standard
https://mimesniff.spec.whatwg.org/

Missing image codecs #143

Open jonsneyers opened 3 years ago

jonsneyers commented 3 years ago

There are image codecs that are supported in (some) browsers where those browsers (probably?) do magic sniffing to figure out the media type. The following ones do not seem to be covered by the current mimesniff spec:

annevk commented 3 years ago

Thanks! AVIF is also tracked by https://github.com/AOMediaCodec/av1-avif/issues/149.

I'm not sure what to do about JPEG 2000. The idea is that this matches what everyone (plans to) align(s) on. Perhaps the warning that (proprietary) extensions to this algorithm are dangerous should be even more prominent.

Lack of identifying JPEG 2000 probably results in a download (when navigating) and a decoder error (through <img>), which both seem okay so I don't think it needs to be added to other browsers.

veluca93 commented 3 years ago

I agree it doesn't necessarily need to be added to other browsers (not that any other browser supports JPEG 2000 anyway, AFAIU), but what should the mime sniff spec do about it? By the way, is the mime sniff spec intended to be a "this is how sniffing should happen for a given format if that format is supported", or is it a "this is how sniffing should happen period" thing?

annevk commented 3 years ago

The latter. (A big problem it has is lack of a test suite and someone with the time to work on that and modernize the algorithms.)

veluca93 commented 3 years ago

I see - so, e.g., a browser that doesn't decode BMP is still supposed to sniff for it, even if it then cannot do anything with the resulting information? It's slightly surprising to me, but I assume there are good reasons :)

annevk commented 3 years ago

It might depend on the context, but for certain cases (e.g., ORB) that would be important, yes.

veluca93 commented 3 years ago

I have to admit I don't fully understand what ORB does, but I can see that it allows or disallows some things depending on the mime sniff algorithm, so I understand why that algorithm should be consistent across browsers :)

jonsneyers commented 3 years ago

If I understand correctly, the main security problem is that secret non-image-data might be exposed cross-origin via side-channel attacks if that non-image-data happens to be misidentified as an image. If that's the case, then I think there are two possible approaches:

I think it's safer to make the mimesniff spec more exhaustive (meaning that if a browser does not actually match the JPEG 2000 signature because it cannot decode JPEG 2000, it's technically not implementing the spec but it doesn't make a difference because it would be a broken image anyway, just for a different reason) rather than limiting it to the set of universally supported codecs (which is kind of the current approach, where all browsers are technically not implementing the spec because they sniff more signatures than those, and it does make a difference because they do actually decode images they're not supposed to decode).

So I am in favor of adding the signatures of JPEG 2000, JPEG XR, JPEG XL and AVIF, so the behavior is documented and it is actually possible to avoid real-mimesniff-matching signatures in sensitive data. Note that these signatures are all quite short and simple (all should be identifiable using an exact match of 2 to 12 bytes).

baumanj commented 3 years ago

> Note that these signatures are all quite short and simple (all should be identifiable using an exact match of 2 to 12 bytes).

In probably 99% of cases that's true, but to recognize valid ISOBMFF-based images which have more than one compatible brand, it's not sufficient to assume that the major brand will be the one we expect. I wouldn't oppose restricting the mimesniff algorithm such that it only supported sniffing the major brand, but that's not the way it works now. Given that the current algorithm would fail to correctly sniff other exotic-but-valid ISOBMFF files, I think restricting it further to capture the common cases and improve security/simplicity would be a boon.

See https://github.com/AOMediaCodec/av1-avif/issues/149#issuecomment-844273064 and https://github.com/w3c/webcodecs/issues/169#issuecomment-843527006 for more specifics.

jonsneyers commented 3 years ago

> > Note that these signatures are all quite short and simple (all should be identifiable using an exact match of 2 to 12 bytes).
>
> In probably 99% of cases that's true, but to recognize valid ISOBMFF-based images which have more than one compatible brand, it's not sufficient to assume that the major brand will be the one we expect. I wouldn't oppose restricting the mimesniff algorithm such that it only supported sniffing the major brand, but that's not the way it works now. Given that the current algorithm would fail to correctly sniff other exotic-but-valid ISOBMFF files, I think restricting it further to capture the common cases and improve security/simplicity would be a boon.
>
> See AOMediaCodec/av1-avif#149 (comment) and w3c/webcodecs#169 (comment) for more specifics.

Ah you're right, in the case of AVIF you need to parse the whole ftyp box, so that could theoretically be an unbounded number of bytes. In the case of JPEG XL there is always an obligatory signature box that goes even before the ftyp box, so 12 bytes are always enough.
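For illustration, the fixed-length matches mentioned above could be sketched roughly as follows in Python. The byte patterns are the commonly documented magic numbers for these formats, not text taken from the mimesniff spec, and the `SIGNATURES` table and `sniff_fixed_signature` function are hypothetical names. AVIF is left out on purpose, since it needs the ftyp box to be parsed.

```python
from typing import Optional

# Assumed magic numbers (commonly documented; NOT taken from the mimesniff spec).
# Each entry is (byte pattern at offset 0, media type to report).
SIGNATURES = [
    (bytes.fromhex("0000000c4a584c200d0a870a"), "image/jxl"),  # JPEG XL container signature box
    (bytes.fromhex("ff0a"), "image/jxl"),                      # bare JPEG XL codestream
    (bytes.fromhex("0000000c6a5020200d0a870a"), "image/jp2"),  # JPEG 2000 signature box
    (bytes.fromhex("4949bc"), "image/jxr"),                    # JPEG XR ("II" followed by 0xBC)
]


def sniff_fixed_signature(head: bytes) -> Optional[str]:
    """Return a media type if `head` starts with one of the exact patterns."""
    for pattern, mime in SIGNATURES:
        if head.startswith(pattern):
            return mime
    return None
```

With a table like this, the sniffer never needs more than the first 12 bytes of the response.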

othermaciej commented 3 years ago

For security reasons mentioned in the related TAG issue, it's probably best to stop adding new formats to the sniffing algorithm, and it should probably explicitly forbid sniffing any image types besides the ones it does sniff. Whether it's safe from a compat perspective to stop sniffing the four formats listed in this issue, I am not sure. There's probably little enough content in each of these formats that browsers could stop sniffing them. If any already have enough usage that they need to be sniffed, then perhaps we should add them but then still forbid additional sniffing.

baumanj commented 3 years ago

I'm all for no-new-sniffs as it's both better for security and simpler to implement, but how do we address the reality that many of the web authors who may want to use a new format will not necessarily have the ability to modify what their servers' Content-Type header returns?

With plenty of existing, functional formats, it really stifles innovation to have this extra roadblock. I'd say being able to add content type attributes to things like <img> may help, but I'm guessing other folks have thought through the implications of that already and for some reason rejected them. Perhaps the answer is that folks in that situation need to use the <picture> element instead and explicitly specify the type attribute on the child <source> element. I'm not sure if that would be good enough coverage to not sniff new formats or not.

veluca93 commented 3 years ago

Completely ignorant perspective here, but wouldn't specifying a type attribute on the HTML side be even worse than mime sniffing, from a security perspective? If you can do that, you don't even need to "get lucky" with the first few bytes of the reply to have the data in your process...

(or perhaps specifying a type would imply that no cookies get sent?)

domenic commented 3 years ago

Yes; the server needs to be the one serving with the correct content type.

In practice we haven't found this to be an adoption problem for all other new formats (e.g. JS modules, various JSON manifests).

veluca93 commented 3 years ago

One thing that I suspect might be different between image formats and executable formats / data serialization formats (disclaimer: I haven't been working on the web side of things for a long time, so my knowledge may be incomplete/outdated/plainly wrong here) is that I'd expect that when you write an application that makes use of these things you are also writing a server-side application, which would imply that you have more control on what the server does; while the web dev that would like to use a new image format is more likely to belong to the set of people that are using some form of shared web hosting they have little control over.

I might be wrong though :)

jonsneyers commented 3 years ago

From a security perspective, isn't what really counts the number of bytes pulled into the process memory for sniffing purposes? I think the priority should be to reduce that to the lowest possible number. If N bytes are read anyway, I don't see how adding more cases for matching types really increases the attack surface. I can see how reducing N to a smaller number can help to decrease the attack surface, and how adding types that make N go up increases the attack surface.

If there are cases of false-positives (e.g. a sensitive spreadsheet incorrectly getting sniffed as image/avif), then that's a bigger problem anyway, because it is then possible that a server will also misidentify the resource and actually set the Content-Type to image/avif. After all, in practice, servers can only rely on filename extensions (which might be wrong or missing) and... sniffing.

veluca93 commented 3 years ago

Well, the server side often can know that something is not an image/avif (i.e. something produced by a dynamic server that declares application/json) and then doesn't need to sniff. But AFAIU the problem on the client side is that clients will try to sniff just about anything, even if it is declared to be application/json... So false positives are a much bigger problem on the client side :)

jonsneyers commented 3 years ago

I think that according to the mimesniff spec, clients are supposed to only sniff for image codec signatures in the following scenarios:

So yes, it is a bit weird that if a server returns specifically application/json, if it's in an img tag, browsers will still sniff it. Maybe it would be better to not do that, and only sniff in one of the first 3 cases, also in an img tag. Or will that break things, i.e. are there servers that produce invalid (non image/*) Content-types for images and they need to keep working?

What is also weird to me is that the declared type in a picture srcset, as well as the type specified in a data uri are seemingly totally ignored (at least I couldn't seem to find them in the spec). Am I missing something?

domenic commented 3 years ago

> Or will that break things, i.e. are there servers that produce invalid (non image/*) Content-types for images and they need to keep working?

Yes, that will break things. It's an ongoing discussion how much of the web we're willing to break; we're hoping to move to a model where we never sniff and only ever rely on the server-supplied content-type. So far the conclusion is to stop the bleeding by prohibiting new formats from using sniffing, and over time move the older formats to the same model.

> What is also weird to me is that the declared type in a picture srcset, as well as the type specified in a data uri are seemingly totally ignored (at least I couldn't seem to find them in the spec). Am I missing something?

They are used to determine what requests to issue. The browser picks the first format which it knows about, and then fetches the corresponding URL from the server. This allows image format-based selection.

Once the fetch goes to the network though, those attributes are ignored, since they are not authoritative (i.e. they are under "attacker" control in malicious request scenarios). They only impact which request gets issued (which is something already under attacker control anyway), not how the request is processed.

jonsneyers commented 3 years ago

So you can have a <source type="image/avif">, and the server can return Content-type: image/jpeg where the actual data is a PNG image, and that is supposed to display just fine, but if the server returns actual data that is an AVIF but it doesn't specify an explicit Content-type, then the browser is supposed to discard the data and refuse to display it?

domenic commented 3 years ago

Correct. And in the future, the first scenario where the server returns Content-type: image/jpeg for a PNG image will try to decode the image as a JPEG and fail, displaying a broken image. (Sorry, edited upon realizing the difference between the scenarios.)

jonsneyers commented 3 years ago

This means that new media types can only effectively be introduced once all servers return the correct Content-type for them. That is a significant additional adoption hurdle. In many cases, web authors do not have enough control over their hosting server to make it send correct Content-types for new formats. Making this a hard requirement means a potentially very significant deployment delay for such authors.

Wouldn't it make more sense to still allow image signature sniffing in an image context if there is no Content-type, but not if there is an actual Content-type that is not of type image?

What the current mimesniff spec says is that if there is an <img src="bank.com/account_details.json"> on a page and the server responds with Content-type: application/json, then the browser is obliged to sniff the data to check whether it is by any chance a JPEG or a WebP (and display it if it is). But if there is a <picture><source type="image/avif" srcset="foo.avif"> and the server returns no Content-type in its response header, then the browser is still obliged to sniff the data to check whether it is by any chance a JPEG, WebP, PNG, etc., and show the image if it is; if it's not (and it is not, it is an AVIF), then bad luck: you get a broken image icon, even if you have a browser that decodes AVIF just fine.

This does not make sense to me.

saschanaz commented 3 years ago

Can we get some actual data to see how frustrating it is, by adding relevant counters (for when the browser gets AVIF but without a proper MIME type)? I think speaking without actual data will not go anywhere.

domenic commented 3 years ago

Indeed, in practice we haven't found it to be a problem for developers. Of the many pieces of the ecosystem that need updating for a new media format, server's MIME type databases are not a very hard one.

jonsneyers commented 3 years ago

Well, the image/jxl images served on jpegxl.info (hosted by github.io) do not get a Content-Type, because github.io does not seem to have an up to date MIME database. But that's of course only one data point. Probably most other servers will have up to date MIME databases.

veluca93 commented 3 years ago

My concern is mostly with people that host their websites on shared hosting providers etc.

It's clear that companies and people that have their own web servers wouldn't have big issues with providing the correct content type.

However, shared hosting sites are often rather slow in doing feature updates. For example, I created a website for testing purposes on one of Italy's most known shared hosting providers (altervista.org), and I confirmed that it does not send the correct MIME type for .avif files, despite AVIF being enabled by default in Chrome for quite a while.

On the server side, there is today (in practice) no need to do anything to support new image formats. Not extending the mime sniffing protocol (or enforcing strict conformance to the currently-defined mime sniffing protocol) to new formats would, in my opinion, introduce a significant hurdle in their adoption by people that use shared hosting options.

baumanj commented 3 years ago

Since type declarations in HTML (in a <picture> element context, for example) are under control of the malicious entity and must be ignored, is there basically no way to enable the web author who doesn't have control over their server configuration for Content-Type (github.io pages are a great example) to be early adopters of new formats?

It seems like a choice between innovation (sniffing) and security (not sniffing). Or is there some alternative route that I'm not seeing?

jonsneyers commented 3 years ago

At least same-origin HTML type declarations could in principle be trusted instead of just ignored, no?

For security, I think more important than not adding new sniff patterns, is reducing the number of cases in which sniffing actually happens. In my opinion, for same-origin images, if a media type is available from html, no sniffing is needed, and for cross-origin images, if the server response type is an explicit "not unknown" non-image media type (like application/json), then also no sniffing should be done to check if it by any chance is an image anyway. Restricting the sniffing to happen only in the "unknown" cases (either no response type at all, or a 'default' response type that popular server software uses when it has no clue) would help a lot to tighten things.

Making cross-origin image requests credentialless by default would also help a lot. These things would have a much more positive impact on security than leaving the mimesniff spec untouched so as not to make things worse than they already are, and hoping that eventually there are only 'new' codecs left that will not be sniffed.

Allowing (new) sniffing only on unknown response types would be enough to avoid innovation hurdles while keeping security implications limited.
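As a rough sketch of the rule proposed above (sniff only when the response type is unknown), something like the following Python could express the decision. The function name `should_sniff_image` is hypothetical, and treating application/octet-stream as a server "default" type is an assumption; the other three entries are the types the mimesniff spec already treats as unknown.

```python
from typing import Optional

# Types treated as "unknown". The first three are handled as unknown by the
# mimesniff spec; application/octet-stream is an ASSUMED server default.
UNKNOWN_TYPES = {
    "unknown/unknown",
    "application/unknown",
    "*/*",
    "application/octet-stream",
}


def should_sniff_image(content_type: Optional[str]) -> bool:
    """Return True only if the server gave no usable Content-Type."""
    if content_type is None or not content_type.strip():
        return True  # no response type at all
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in UNKNOWN_TYPES
```

Under this sketch an explicit application/json response is never sniffed, while a response with no Content-Type at all still is.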

foolip commented 9 months ago

I found my way here after noticing that Web Almanac detected image types from either the file extension within the URL or the Content-Type header, but not sniffing like browsers do. (https://github.com/HTTPArchive/almanac.httparchive.org/issues/3572)

From some comments it sounds like browsers don't sniff for AVIF, so I tested it by renaming an .avif file to .jpg and serving using python3 -m http.server 8000 --bind 127.0.0.1.

An AVIF image served with image/jpeg loads just fine in Chrome, Firefox and Safari.

I've also confirmed that Safari will load a JPEG XL file served as application/octet-stream or image/jpeg.

So it seems like all browsers are continuing to sniff for image formats, and ideally mimesniff should document the interoperable rules for that sniffing.
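For anyone who wants to repeat this kind of test without renaming files, a minimal sketch using Python's standard http.server could look like the following. The class name `MislabelingHandler` and the `serve` helper are hypothetical, and the chosen mislabelings simply mirror the cases tested above.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class MislabelingHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory with deliberately wrong types."""

    # Copy the base map so SimpleHTTPRequestHandler itself is left untouched,
    # then override the extensions under test.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".avif": "image/jpeg",                # AVIF labelled as JPEG
        ".jxl": "application/octet-stream",   # JPEG XL labelled as generic binary
    }


def serve(port: int = 8000) -> None:
    """Run the server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MislabelingHandler).serve_forever()
```

Calling `serve()` from the directory that holds the test images and loading them through an <img> element shows whether a given browser sniffs past the declared type.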

foolip commented 9 months ago

Here's the AVIF sniffing as implemented in Chromium:

https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/platform/image-decoders/avif/avif_image_decoder.cc;l=567-580;drc=35406c8d4b7301ede262aeedfc6a63e5e3cf555d

Looks like it can read up to 144 bytes, and uses libavif for parsing:

https://source.chromium.org/chromium/chromium/src/+/main:third_party/libavif/src/src/read.c;l=3901-3917;drc=35406c8d4b7301ede262aeedfc6a63e5e3cf555d;bpv=0;bpt=1

I guess this would have to be written down in a shape similar to https://mimesniff.spec.whatwg.org/#signature-for-mp4, but possibly more complex.
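A hedged sketch of what such a signature might look like, modelled loosely on the MP4 one: read the ftyp box and look for an AVIF brand among the major and compatible brands. This is not Chromium's or libavif's logic; the function name `sniff_avif`, the brand set, and the choice to report image/avif for both brands are assumptions made for illustration.

```python
import struct
from typing import Optional

# Assumed brands: "avif" for still images, "avis" for image sequences.
AVIF_BRANDS = {b"avif", b"avis"}


def sniff_avif(data: bytes) -> Optional[str]:
    """Return "image/avif" if the leading ftyp box lists an AVIF brand."""
    if len(data) < 16:
        return None
    # Box header: 4-byte big-endian size, then 4-byte type.
    box_size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp" or box_size < 16 or box_size % 4 != 0:
        return None
    if len(data) < box_size:
        return None  # the whole ftyp box is needed; size is not bounded
    brands = [data[8:12]]  # major brand
    # Bytes 12..16 hold the minor version; compatible brands follow.
    for offset in range(16, box_size, 4):
        brands.append(data[offset:offset + 4])
    if any(brand in AVIF_BRANDS for brand in brands):
        return "image/avif"
    return None
```

Unlike the fixed 12-byte matches, the number of bytes needed here depends on the box size declared in the file, which is why an implementation would likely cap how much it reads.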