Open annevk opened 4 years ago
An alternative to consider is to act as if images generated from opaque responses never have EXIF data (or any kind of metadata other than width/height/ratio). Requiring each feature that can disable or override some EXIF data to take such images into account feels very brittle to me.
An alternative to consider is to act as if images generated from opaque responses never have EXIF data. Requiring each feature that can disable EXIF data to take such images into account feels very brittle to me.
I think that that would create a requirement that would make people enable CORS when they didn't otherwise have to - just to have their images display correctly.
I think there's no reason why this requirement should be on the usage of EXIF, instead on the feature that overrides (and thus exposes) EXIF
, such as image-orientation
and image-resolution
CSS properties.
Otherwise, it feels like we're trying to prevent a threat of a hypothetical future API. Is that a necessary thing to do?
They're not hypothetical, this API (image-orientation
) already exists, https://github.com/whatwg/html/pull/5603 adds another. Making the APIs enforce the policy violates the principle of least privilege and will likely lead to numerous bugs.
cc @stephenmcgruer
If we do have to prevent image-orientation
from working on images that came from opaque responses, it would be nice if we could unconditionally apply the orientation (and I guess pretend from the whatwg/html#5603 APIs that there was no orientation metadata), so that we can try to treat orientation as an implementation detail of the image file representation. But that would make it tricky for authors wanting to use image-orientation: none
to turn off the new re-orientation effects for their pages that rely on it not being applied.
If we do have to prevent
image-orientation
from working on images that came from opaque responses, it would be nice if we could unconditionally apply the orientation (and I guess pretend from the whatwg/html#5603 APIs that there was no orientation metadata), so that we can try to treat orientation as an implementation detail of the image file representation. But that would make it tricky for authors wanting to useimage-orientation: none
to turn off the new re-orientation effects for their pages that rely on it not being applied.
I think that's a better approach... the threat comes from the "overriding" feature, not from the implementation detail of using EXIF. An image format may similarly have an internal representation of orientation/resolution supported internally in the decoder - would that also be limited to same-origin/CORS images?
EXIF is not the issue here - it's the mixing of image-originated data and markup-originated data, which is something that currently occurs only for naturalWidth/naturalHeight. If we want to take a more generic approach - I think it should tackle those blurred boundaries between content and markup.
@noamr again, it's not just overriding, it's also reading as linked above. There's various different ways this will end up being exposed.
@heycam how do we model it in such a way that we don't need security checks all over?
I guess what we could do is that we take the orientation into account for decoding purposes, but don't store it as a field on the resulting image if it was generated from an opaque response. So it appears rotated, but if you query its metadata it'll return the default orientation values.
The tricky aspect is when metadata can be overridden, as it can be here. If the internal representation still has non-default metadata you would need to take that into account somehow. I.e., if an image was rotated 90 degrees and an API asked for it not to be rotated, it would have to remain rotated at 90 degrees. Model-wise that follows from the preceding paragraph, but in implementations that might be a bit trickier.
@noamr again, it's not just overriding, it's also reading as linked above. There's various different ways this will end up being exposed.
Sure, I meant reading/overriding - anything that enables reading directly or indirectly.
The tricky aspect is when metadata can be overridden, as it can be here. If the internal representation still has non-default metadata you would need to take that into account somehow. I.e., if an image was rotated 90 degrees and an API asked for it not to be rotated, it would have to remain rotated at 90 degrees. Model-wise that follows from the preceding paragraph, but in implementations that might be a bit trickier.
I think that the overriding features in this case should be disabled for the opaque resource. E.g. CSS image-orientation would simply not apply, maybe even regarded as invalid style. I think that would be reasonable implementation-wise.
I think the model we end up with shouldn't require each new feature to check whether the image was generated from an opaque response. So if some theoretical feature allowed setting EXIF rotation to 90 and the opaque image already had it 90 (exposed as 0 per the above model), it should be as if it was 180 and be exposed to the world as 90.
I think the model we end up with shouldn't require each new feature to check whether the image was generated from an opaque response. So if some theoretical feature allowed setting EXIF rotation to 90 and the opaque image already had it 90 (exposed as 0 per the above model), it should be as if it was 180 and be exposed to the world as 90.
I like that. Makes metadata "embedded" into the image for opaque images, but still working as expected if there's nothing that tries to override/read it.
I guess what we could do is that we take the orientation into account for decoding purposes, but don't store it as a field on the resulting image if it was generated from an opaque response. So it appears rotated, but if you query its metadata it'll return the default orientation values.
I think this would be the right approach. The intrinsic orientation spec concept for opaque images would be "zero degrees, no flip", and the image dimensions (whether the image is opaque or not) would have the orientation taken into account. image-orientation
's behavior would then be written in terms of the image's intrinsic orientation.
cc @chrishtr @smfr
I don't see a lot of movement on this ticket... does any implementer have an opinion about this? It's currently blocking https://github.com/whatwg/html/pull/5574, and these same-origin policy violations are already in the wild... would be good to figure out if we see EXIF orientation/(resolution) data as a cross-origin information leak, and if it is, how to mitigate it.
I didn't see it stated very clearly clearly in this issue, so let me first state what I think the information leak is:
Developers can detect whether there is EXIF rotation information in an image by rendering it twice - once with image-orientation: from-image
and one with image-orientation: none
, and observing if there is a difference in the layout size of the result.
Therefore, for a cross-domain image, the developer can obtain one bit of information about these images.
However, don't sites already know multiple "bits of information" about cross-origin images, such as their width and height?
I didn't see it stated very clearly clearly in this issue, so let me first state what I think the information leak is:
Developers can detect whether there is EXIF rotation information in an image by rendering it twice - once with
image-orientation: from-image
and one withimage-orientation: none
, and observing if there is a difference in the layout size of the result.Therefore, for a cross-domain image, the developer can obtain one bit of information about these images.
Yes, and same for a potential implementation of image-resolution, and for querying image orientation from javascript (https://github.com/whatwg/html/issues/5602).
However, don't sites already know multiple "bits of information" about cross-origin images, such as their width and height?
I think the only bits of information they know right now is an image's width and height. Is exposing related information such as orientation/density a problem? It's hard for me to fathom how that info can be used, but it's difficult to be certain.
I think it is a problem. There's a long history of all these communication channels leading to privacy and security issues. We should hold the line where we can and clearly we can here.
One concrete scenario that can be problematic:
PhotoSharing.example allows non-CORS cross-origin fetching of credentialed images, but only for logged-in users or users that belong to a certain group (which the image was shared with).
PhotoSharing.example already knows about the width
and height
leak, as well as timing attacks that may result from it not serving the image in the disallowed cases. As a result, it creates an empty image with the same dimensions and makes sure that the response timing looks similar to the real deal (without setting Timing-Allow-Origin on neither image).
But, if the original image contains orientation or resolution information, adding those capabilities would surprise PhotoSharing folks and cause them to potentially expose log-in state or group affiliation across origins.
It seems like this is a problem that will go away when browsers limit cross-origin credentials, but we're not there yet.
Would it make sense to only respect orientation/resolution for CORP enabled images? CORP seems like a clear signal saying the image can be embedded. I wonder what would be the impact of that on deployability.
/cc @eeeps @mikewest @arturjanc
I don't want to drag a dependency on CORP into this. That would change its semantics from a boolean check in fetch to a property of the response. All CORP: cross-origin means is that you're okay being side-channel attacked. This is not a side channel.
I found a scenario in the related issue https://github.com/whatwg/html/pull/5574 where some indirect means can be used to figure out the image's resolution. See this comment. I am convinced that this needs to be addressed.
Recapping the two current proposals (following IRC discussion with @annevk):
In either case, a cross-origin image might appear different depending on which origin is embedding it. In (1), it will appear different by default. In (2), it will appear different only in certain cases. e.g. when CSS image-rotation
, image-resolution
or srcset
is being used, or in future scenarios that we are not yet aware of.
Also both (1) and (2) would require changes in current implementations, as image-orientation: none
is already shipped.
I believe that (1) is easier to implement and grasp, however, it would have a higher chance of breaking some current sites using EXIF-rotated images (if the images are cross-origin and don't have the CORS headers).
IMO, a CORS restriction would be too restrictive, and will pose a significant deployment hurdle. @annevk - can you expand on why this wouldn't work well with CORP? It seems to be leaking the exact same information that CORP "allows" you to leak.
/cc @camillelamy
CORP is about allowing a Spectre-read gadget to potentially get at your data. It's not an assertion that it's fine to make that data public and it's not guaranteed that Spectre-read gadgets will be able to get at it forever. Otherwise we might as well have required CORS there as the initial design did. Also, all data, not just the metadata.
I found a scenario in the related issue whatwg/html#5574 where some indirect means can be used to figure out the image's resolution. See this comment. I am convinced that this needs to be addressed.
Recapping the two current proposals (following IRC discussion with @annevk):
- Ignore metadata for opaque-response images
- Bake the metadata in for opaque-response images (e.g. rotate and scale the image but ignore that notion when applying CSS rotation/srcset scaling).
In either case, a cross-origin image might appear different depending on which origin is embedding it. In (1), it will appear different by default. In (2), it will appear different only in certain cases. e.g. when CSS
image-rotation
,image-resolution
orsrcset
is being used, or in future scenarios that we are not yet aware of.Also both (1) and (2) would require changes in current implementations, as
image-orientation: none
is already shipped.I believe that (1) is easier to implement and grasp, however, it would have a higher chance of breaking some current sites using EXIF-rotated images (if the images are cross-origin and don't have the CORS headers).
Blocking metadata with CORS completely could cause an issue I haven't thought of earlier - it means that CSS-loaded images can't use orientation/resolution, as those don't expose a crossorigin
attribute (which is currently only meaningful for canvas drawing). OTOH CSS-loaded images don't leak any of the metadata information as the image's size is not readable and doesn't affect layout.
In addition, it would require regular images to start including crossorigin
when using a CDN, just to have their image displayed correctly. That doesn't seem reasonable.
As today so many images are CDN-delivered and don't bother with a crossorigin
attribute (or can't because the image is CSS-loaded), I think it's a blocker for using (1) - it would make image orientation and resolution less than usable.
CORP
seems less suitable as it's meant to block embedding at all, not just reading of metadata.
I believe that this should be blocked with an additional HTTP header (yikes), similar to Timing-Allow-Origin
, or not at all - servers who want to offer this kind of protection to their images can bake the metadata into that image and not expose it.
TL;DR: proposing an HTTP header (maybe Media-Transform-Allow-Origin
), similar to Timing-Allow-Origin
. If that header is not present, image orientation/resolution from EXIF should be ignored.
I am wondering if there would be value in having a combined header like Metadata-Allow-Origin
where we can specify which kind of metadata are allowed for which origins (eg, timing, image orientation/resolution). This way, when a similar issue comes up next we can extend this header instead of defining a new one.
I am wondering if there would be value in having a combined header like
Metadata-Allow-Origin
where we can specify which kind of metadata are allowed for which origins (eg, timing, image orientation/resolution). This way, when a similar issue comes up next we can extend this header instead of defining a new one.
Sounds like an interesting alternative, kind of like Access-Control-Allow-Headers
.
Maybe something like this would have sufficient granularity:
Metadata-Allow-Origin: *; Metadata-Allow-Properties: Orientation,Resolution,Timing
OTOH CSS-loaded images don't leak any of the metadata information as the image's size is not readable and doesn't affect layout.
They do, fwiw - ::before { content: url(...); }
creates an anonymous replaced box containing the specified image, which will affect layout (or makes the pseudo-element itself into a replaced element containing the image, to the same effect).
In either case, a cross-origin image might appear different depending on which origin is embedding it. In (1), it will appear different by default. In (2), it will appear different only in certain cases. e.g. when CSS image-rotation, image-resolution or srcset is being used, or in future scenarios that we are not yet aware of.
Just because it'll still allow images to look correct by default, I lean strongly toward (2). Each potentially-exposed bit of metadata just needs to define a "default" value that it'll masquerade as for the purpose of in-page manipulations. This is trivial for orientation, but I guess resolution will have to pretend to be 1x? That'll break srcset (it'll density-correct images twice), but that might be unavoidable here.
Just as a heads up, I started a thread on WebAppSec with a more general discussion about exposing resource metadata: https://lists.w3.org/Archives/Public/public-webappsec/2020Jul/0005.html
OTOH CSS-loaded images don't leak any of the metadata information as the image's size is not readable and doesn't affect layout.
They do, fwiw -
::before { content: url(...); }
creates an anonymous replaced box containing the specified image, which will affect layout (or makes the pseudo-element itself into a replaced element containing the image, to the same effect).
Good point, thanks.
In either case, a cross-origin image might appear different depending on which origin is embedding it. In (1), it will appear different by default. In (2), it will appear different only in certain cases. e.g. when CSS image-rotation, image-resolution or srcset is being used, or in future scenarios that we are not yet aware of.
Just because it'll still allow images to look correct by default, I lean strongly toward (2). Each potentially-exposed bit of metadata just needs to define a "default" value that it'll masquerade as for the purpose of in-page manipulations. This is trivial for orientation, but I guess resolution will have to pretend to be 1x? That'll break srcset (it'll density-correct images twice), but that might be unavoidable here.
Resolution would have to pretend that the image has the density requested by the srcset, or 1dppx
if it's not in srcset.
I think this is totally doable, given a mechanism allowing servers to allow it as opt-in (be it a new header or CORP).
I think there's no reason why this requirement should be on the usage of EXIF, instead on the feature that overrides (and thus exposes)
EXIF
, such asimage-orientation
andimage-resolution
CSS properties.
I think I agree with this.
In other words, I think image-orientation
is a transition mechanism to fix the historic bug that we haven't honored EXIF orientation data as a part of the image format, which it should really be considered to be a part of. We'd like to end up in a world where browsers (like other applications) consider the EXIF orientation as part of the image format.
The problem that leads to an extra bit of information (on top of the pretty substantial amount of information from the width and height) is the ability to choose between honoring and not honoring the EXIF orientation.
One solution suggested for that problem has been to ignore the EXIF orientation for cross-origin images without appropriate CORS headers. But an alternative that would still avoid exposing the bit of information, is to make the transition for cross-origin images happen faster (maybe instantaneously) than the transition for same-origin images. That is, the thing that exposes the additional bit of information isn't honoring the EXIF orientation, it's exposing a toggle to enable/disable honoring the EXIF orientation. And I'd rather end up honoring it all the time than never. In other words, I'd rather end up in a situation where we can just treat the EXIF orientation data as part of the image format, for all images.
To be clear, I agree that there are multiple solutions and that 2 in https://github.com/w3c/csswg-drafts/issues/5165#issuecomment-654127723 is more attractive, but as pointed out it's not just image-orientation
that's problematic and with 2, implementations will have to be a lot more careful in terms of robustness. Either way we'll need a lot of tests.
As I understand it there is some agreement around baking the EXIF orientation (and other EXIF derived properties) into opaque images at decode and then pretending through the rest of the rendering pipeline that there is no EXIF data on opaque images (option 2 from https://github.com/w3c/csswg-drafts/issues/5165#issuecomment-654127723). In practice right now, that means opaque images with EXIF data would always behave like "image-orientation: from-image" regardless of any CSS style that might apply. Plus any style query would always respect orientation.
I support this because it is chromium's long term intent to remove the "image-orientation" property and moving closer to that goal while improving security seems like a win.
The CSS Working Group just discussed [css-images] image-orientation:none violates same-origin policy
, and agreed to the following:
RESOLVED: Do not expose orientation data for cross-origin images
Reading the context around that resolution I think the CSS WG decided on 2 from https://github.com/w3c/csswg-drafts/issues/5165#issuecomment-654127723 but it would be great if @tabatkins or @cbiesinger could confirm.
Yes, exactly that.
Our use-case is very similar to the one mentioned in this bit of the conversation
cbiesinger: I'm in favor of model TabAtkins desc. Had one person contact me where he would like it to comeintue to apply to cross origin b/c they have tool to present image and get user to annotate and then they hand over annotation to another tool. Without being able to tell the tool the orientation they can't tell if they have to process. s/ cbiesinger /heycam heycam: THey can work around that TabAtkins: Or preprocess to turn on cors stuff they'll be fine
Many of our users won't turn on CORS because of truly sensitive data like GPS coordinates which can be present in EXIF data. Because of that preprocessing said images to enable CORS is either impossible or unethical.
For context here's the order of events from our perspective:
I understand that the 1st fix from #5165 (comment) will initially affect more people negatively, but it's also the only option that maintains a path for backwards compatibility with the default behavior which was changed very recently.
To me, this use case outlines the fact that exposing this bit may require a separate opt-in from CORS. I suggested elsewhere that we may consider CORP for this.
I'm unfamiliar with the process of these decisions becoming standards and getting implemented, are there typical timelines for implementation of decisions made here? How long do these decisions typically take to go from decision to implementation? Since the suggested solution will cause silent breaking changes for us, and possibly a lot of other people with image manipulation or annotation use-cases, we'll need to start preparing for the break as soon as possible.
@philcunliffe - spec decisions are not directly tied to implementations in obvious ways, but implementations are likely to follow. At the same time, the case you're raising makes me doubt the webcompat feasibility of this change.
IMO, it's worthwhile to further discuss some form of opt-in unification that may enable us a more compatible path forward.
@yoavweiss thanks for the clarification
There are implicitly 2 types of image metadata currently, stuff we are OK leaking like width and height, and things we aren't OK leaking which encompasses everything else. I'd contest that, as integral display data, orientation fits in the first category with width and height.
I understand the objections and that a lot of people would like to eliminate the width and height leak or even disallow cross-origin images all together but, if they're going to continue to be allowed then orientation value is essential to properly using them.
I tend to see the point that orientation/resolution are difficult to obfuscate from the embedder, as they are essentially "embedding instructions".
Perhaps the term "metadata" is not the right semantic. Instead, we can say that a resource exposes several types of data:
Seems to me that when we try to mix the first type with the second type, we get to requirements that are difficult/impossible to implement: The embedder still needs to know the size/orientation because correctly displaying other content depends on it, but we try to hide that information to avoid cross-origin information leakage.
We can go for interesting but complex solutions like making image-orientation
inert in some cases, but as commented, it creates a web compatibility headache.
From what I hear, as long as image orientation is respected, and image-orientation
css is used for backwards compatibility, IMO there is no solution but to close this issue and let that leakage be. (intrinsic image resolution is a different story, as it doesn't have significant backwards compatibility implications).
From what I hear, as long as image orientation is respected, and image-orientation css is used for backwards compatibility, IMO there is no solution but to close this issue and let that leakage be. (intrinsic image resolution is a different story, as it doesn't have significant backwards compatibility implications).
This would work for us, but in an ideal world we would like to respect EXIF orientation as well. In our use-case respecting EXIF orientation requires knowing the orientation value itself. This was the original impetus behind this pull-request from @heycam https://github.com/whatwg/html/pull/5603 which would put orientation into parity with width and height.
Not sure where this is standing now...
@tabatkins @cbiesinger, seems like this was resolved in the CSS work group. Do you care yo address the web compat concerns from @philcunliffe? if not, what is the next step - does this need to go into the image-orientation
CSS spec? Happy to help with a PR if that's what's needed.
WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=217294
Note to self: blocked on edits for the moment because Fetch doesn't seem to have any terms I can use to refer to an image that's "CORS-clean" or not.
Note to self: blocked on edits for the moment because Fetch doesn't seem to have any terms I can use to refer to an image that's "CORS-clean" or not.
I believe that image data can be CORS-cross-origin: From the spec: "Otherwise, response's unsafe response is image request's image data. It can be either CORS-same-origin or CORS-cross-origin"
@noamr if I understand these bug reports there doesn't seem to be any pathway to backwards compatibility in terms of prevention of rotation and no way for the client to read the orientation value. Is that correct?
@noamr if I understand these bug reports there doesn't seem to be any pathway to backwards compatibility in terms of prevention of rotation and no way for the client to read the orientation value. Is that correct?
For cross-origin images that is correct. The bug reports and tests follow what the CSS-wg has decided. If the decision changes the tests and implementation would be different of course.
Landed in webkit: https://trac.webkit.org/changeset/268249/webkit
Landed in webkit: https://trac.webkit.org/changeset/268249/webkit
When this does make it into the large browsers I would suggest making this a very loud change in patch notes, it's again changing existing behavior that many applications depend on.
As I realized in https://github.com/whatwg/html/pull/5603, this leaks an additional bit of information for opaque responses.
cc @heycam