Open dveditz opened 7 years ago
SRI for explicit downloads seems like low-hanging fruit. You're thinking something like <a href='' download integrity="...">?
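For illustration, here is a minimal sketch of how a site author might compute the value for such an attribute, assuming it reuses the existing SRI syntax of an algorithm name, a hyphen, and the Base64 digest. The helper names `sri_value` and `download_link` are hypothetical, and the `integrity` attribute on `<a>` is the proposal under discussion, not a shipped feature.

```python
import base64
import hashlib
import html


def sri_value(data: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style string: '<algorithm>-<base64 digest>'."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def download_link(href: str, data: bytes) -> str:
    """Build the proposed (hypothetical) <a download integrity=...> markup."""
    safe_href = html.escape(href, quote=True)
    return f'<a href="{safe_href}" download integrity="{sri_value(data)}">Download</a>'
```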
I vaguely recall @bzbarsky having concerns about content-encoding: gzip, but I think @devd worked them out. Otherwise, the infrastructure should be there.
(We just need someone to sign up to do the work... Y'all volunteering? :) )
It seems the main concerns that held this back in the past (proposed as https://wiki.whatwg.org/wiki/Link_Hashes and also various alternatives, see https://lists.w3.org/Archives/Public/public-whatwg-archive/2012Oct/0188.html; one of which was once added to the standard: a https+aes scheme) were a lack of implementer interest and the worry that the integrity value would get out of sync with the download, so the user would just use some other tool to get the resource.
Note also that unless we carve out an exception (let's not?) this will require CORS, which is new for downloads. So you end up with <a crossorigin download=... integrity=...> and you'd have to define both crossorigin and integrity for <a>.
Sounds easy and interesting. I can try to write it up, if nobody more qualified signs up for this.
Note also that unless we carve out an exception (let's not?) this will require CORS
That's a good point.
We require CORS for subresource fetches because we'd otherwise be exposing the content of the resource via the hashes. Does the same apply to downloads? As far as I know, <a download> is fire-and-forget in Chrome; we don't expose a success/failure event or give the site access to its downloaded resources. Is the data exposed via one of the performance/timing APIs?
We've had requests already, e.g., in https://github.com/whatwg/html/issues/954. I don't think we should try to postpone the need for safety as that will just make it very brittle.
Got it. In that case, I completely agree that the CORS requirement is something we should keep in place.
Looks like this issue has fallen by the wayside?
Content integrity for downloads has resurfaced in the news, including cases where an HTTPS page links to a plain-HTTP download. While those cases should be fixed, download integrity seems like low-hanging fruit from my uninformed point of view.
Given that the download attribute works in terms of navigation at the moment this actually seems even harder. Perhaps there is some way to decouple it from navigation, but that would be quite a major change to implementations.
I created https://github.com/w3c/webappsec-subresource-integrity/pull/78 to try to push the discussion forward, as this feature could really improve the security of the global ecosystem.
Unfortunately, I don't think that helps as it doesn't address the issues.
Is there something one (with limited HTML and HTTP knowledge) can do to help with the process of this issue?
Popular software such as GIMP or LibreOffice use mirrors and I would expect that the average computer user does not know how to verify the integrity or that this is important.
Regarding the linked whatwg mail archive thread it would be necessary to clarify what the intention of this issue is:
download-integrity to clarify that it has no effect unless used for downloads (would still require download attribute)
Supporting a length value describing the size of the downloaded content in bytes would allow failing fast, even while downloading if the content is larger than the specified length.
The proposed format should also support specifying multiple checksum algorithms in case the user agent does not support all, which will especially become the case in the future when new checksum algorithms emerge.
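The fail-fast behaviour of a declared length can be sketched as follows. This is only an illustration under assumptions: the download arrives as an iterable of byte chunks, and the names `read_with_length_limit` and `LengthMismatch` are hypothetical, not part of any specification.

```python
from typing import Iterable


class LengthMismatch(Exception):
    """Raised when the received size cannot match the declared length."""


def read_with_length_limit(chunks: Iterable[bytes], declared_length: int) -> bytes:
    """Collect chunks, aborting as soon as more bytes arrive than declared."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        if len(received) > declared_length:
            # Fail fast: stop reading instead of fetching the rest.
            raise LengthMismatch(
                f"received {len(received)} bytes, expected {declared_length}"
            )
    if len(received) != declared_length:
        raise LengthMismatch(
            f"received {len(received)} bytes, expected {declared_length}"
        )
    return bytes(received)
```

An oversized response is rejected after the first chunk that crosses the limit, while a truncated one can only be detected once the stream ends.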
Therefore the following would in my opinion be a good format:
<a href="..." download download-integrity="INTEGRITY_DATA">
With INTEGRITY_DATA having this format (pseudo grammar):
INTEGRITY_DATA: (CHECKSUM,)+ length:[1-9][0-9]*
CHECKSUM: ALG_NAME : CHECKSUM_VALUE
ALG_NAME: [a-zA-Z0-9-_]+
CHECKSUM_VALUE: Base64
Algorithm names should be clearly defined (either here or somewhere else) and should be matched case-sensitively to prevent something like "SHA-1", "shA-1", "sHa-1" and because in some programming languages comparing case insensitively can easily go wrong when the system language is used and it has special lowercasing rules (e.g. Turkish).
The checksum bytes are Base64 encoded because even in hex notation a checksum can be quite large, e.g. for SHA-512 it is 128 chars in hex while only being 88 chars in Base64. Base64 padding (trailing =) is required and must not be omitted.
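The size comparison can be checked with a few lines of Python. The function name `encoded_lengths` is just an illustrative helper.

```python
import base64
import hashlib
from typing import Tuple


def encoded_lengths(algorithm: str, data: bytes = b"example") -> Tuple[int, int]:
    """Return (hex length, padded Base64 length) of a digest."""
    digest = hashlib.new(algorithm, data).digest()
    return len(digest.hex()), len(base64.b64encode(digest))


# SHA-512 digests are 64 bytes: 128 hex characters, 88 Base64 characters.
hex_len, b64_len = encoded_lengths("sha512")
```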
Example:
<a href="https://example.com/download" download download-integrity="length:1245667025,md5:1B2M2Y8AsgTpgAmY7PhCfg==,sha256:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=">
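A rough sketch of parsing such an attribute value is shown below. It makes some assumptions beyond the pseudo grammar: entries may appear in any order (the example puts length first, while the grammar puts it last), length is treated as optional, and the function name `parse_download_integrity` is hypothetical.

```python
import base64
import binascii
import re
from typing import Dict, Optional, Tuple

ALG_NAME = re.compile(r"[a-zA-Z0-9_-]+")
LENGTH = re.compile(r"[1-9][0-9]*")


def parse_download_integrity(value: str) -> Tuple[Optional[int], Dict[str, bytes]]:
    """Split 'name:value' entries into a length and a dict of raw digests."""
    length: Optional[int] = None
    checksums: Dict[str, bytes] = {}
    for entry in value.split(","):
        name, sep, raw = entry.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in entry {entry!r}")
        if name == "length":  # matched case-sensitively, like algorithm names
            if not LENGTH.fullmatch(raw):
                raise ValueError(f"invalid length {raw!r}")
            length = int(raw)
            continue
        if not ALG_NAME.fullmatch(name):
            raise ValueError(f"invalid algorithm name {name!r}")
        try:
            # validate=True rejects stray characters; missing padding also fails.
            checksums[name] = base64.b64decode(raw, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"invalid Base64 for {name!r}") from exc
    if not checksums:
        raise ValueError("at least one checksum is required")
    return length, checksums
```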
If length is present, then the user agent must use it to verify the integrity.
If multiple checksums are present, it may pick any of them; picking the strongest one is advised.
If no checksum algorithm is supported it may show a warning to the user, or it may just ignore the checksum information. It may also display the algorithms and checksum values to the user so they have a chance to verify the integrity manually.
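The selection rules above could look roughly like this. The preference order in `PREFERENCE` and the function name `verify_download` are assumptions for illustration; the proposal does not define which algorithms a user agent supports.

```python
import hashlib
import hmac
from typing import Dict, Optional

# Hypothetical list of supported algorithms, strongest first.
# Names are compared case-sensitively, as the proposal suggests.
PREFERENCE = ("sha512", "sha384", "sha256")


def verify_download(
    data: bytes, length: Optional[int], checksums: Dict[str, bytes]
) -> Optional[bool]:
    """Return True on match, False on mismatch, None if nothing is supported."""
    if length is not None and len(data) != length:
        return False
    for name in PREFERENCE:
        if name in checksums:
            actual = hashlib.new(name, data).digest()
            return hmac.compare_digest(actual, checksums[name])
    # No supported algorithm: the caller may warn or show the raw values.
    return None
```

Returning `None` separately from `False` lets the caller distinguish "could not check" from "check failed", which matters for the warning behaviour described above.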
Note: It might make sense to add a warn-if-none-supported:true/false value to the download-integrity attribute. The default value is false. If true, the user agent must warn the user. The use case would be mirror sites where failing to verify the integrity could have security implications.
If the integrity was successfully verified, the user agent is encouraged to indicate this to the user. However, it should be displayed as informational text (so the user knows they do not have to verify the integrity manually), but must not create a false impression of security, e.g. that the file is not a virus (similar to the previously green lock icon in the URL bar for HTTPS sites).
If the integrity check fails, the user must be informed that the file may be corrupted, modified by an attacker or that the site is incorrectly configured. The user agent is encouraged to advise the user to contact the site administrator. The user agent must offer the user two options: deleting the file (preferred), and keeping the file. Unlike what is described in the WHATWG wiki, it should not use the term "Quarantine", since for most (if not all) operating systems that would be just another folder. User agents are encouraged to place the downloaded content in the "Downloads" folder of the OS only once the user has agreed to keep the file. Otherwise the user might first see the file in the "Downloads" folder and open it before noticing the warning from the user agent.
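One way to picture the "stage first, expose later" idea is the sketch below. It assumes the whole download is already in memory and uses SHA-256 only; `finish_download` and the `user_keeps_anyway` callback (standing in for the user's keep/delete choice) are hypothetical names.

```python
import hashlib
import hmac
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def finish_download(
    data: bytes,
    expected_sha256: bytes,
    downloads_dir: Path,
    filename: str,
    user_keeps_anyway: Callable[[], bool],
) -> bool:
    """Stage the file outside Downloads; move it there only when appropriate."""
    fd, staging_path = tempfile.mkstemp(prefix="pending-download-")
    try:
        with os.fdopen(fd, "wb") as staged:
            staged.write(data)
        verified = hmac.compare_digest(
            hashlib.sha256(data).digest(), expected_sha256
        )
        # On failure the user is asked; deleting is the preferred outcome.
        if verified or user_keeps_anyway():
            shutil.move(staging_path, str(downloads_dir / filename))
        return verified
    finally:
        # If the file was not moved, remove the staged copy.
        if os.path.exists(staging_path):
            os.remove(staging_path)
```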
Hopefully this comment is useful and not too intrusive. I tried to write down my thoughts as precisely as possible. Any feedback is welcome :)
@annevk What are the blocking points on this issue? What needs to be discussed to move it forward?
It is an important security issue for all websites using mirrors/CDNs for downloads.
There is no workaround for it (VLC tried to use JS to download the file in memory and compute the checksum, but this has a lot of drawbacks: browser compatibility is terrible, it requires CDNs to add CORS headers, and it doesn't work well with large files).
Given that the download attribute works in terms of navigation at the moment this actually seems even harder. Perhaps there is some way to decouple it from navigation, but that would be quite a major change to implementations.
@annevk, Wasn't download respecified as based on fetch?
We wrote an article (https://serval.unil.ch/resource/serval:BIB_9BD511E5C0D0.P001/REF) on checksum verification recently and suggested extending SRI to handle downloads. We wrote an explainer: https://github.com/checksum-lab/checksum-lab.github.io/blob/master/README.markdown
One issue with the download attribute for <a> elements (mentioned above) is that it is restricted to same-origin links, which is the case that makes the least sense for checksums (https://www.w3schools.com/tags/att_a_download.asp).
I can answer parts of my own question to annevk from above. Downloading a hyperlink is specified in HTML.
@khuguenin same-origin or cors-same-origin, no? It would suffice if the CDN/mirror sent a header of `Access-Control-Allow-Origin: *`, which many CDNs do and already have to do for SRI with scripts/styles.
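As a simplified sketch of what that header check amounts to for a request without credentials: the response passes if the header is the wildcard or exactly matches the page's origin. The function name `cors_check_passes` is hypothetical, and credentialed requests (which have stricter rules) are deliberately out of scope here.

```python
from typing import Optional


def cors_check_passes(allow_origin_header: Optional[str], page_origin: str) -> bool:
    """Simplified CORS check for a non-credentialed cross-origin request."""
    if allow_origin_header is None:
        return False  # mirror sent no Access-Control-Allow-Origin header
    if allow_origin_header == "*":
        return True
    # Otherwise the value must match the requesting origin exactly.
    return allow_origin_header == page_origin
```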
@mozfreddyb I think requiring CORS would reduce the usage of checksums because not all mirrors/CDNs support it. If the download is "fire and forget" and the original page has no way to know whether the download is complete or valid, then I do not see a reason to require CORS. (Also, if the mirrors/CDNs do have CORS, the JavaScript could already do the checksum itself today.)
How do we ensure the download is (and remains) unobservable? I see there's the request's initiator set to download in the spec, but I'm not entirely sure that it cannot be forged. I'd like to hear an expert's opinion here (@annevk, probably :))
I feel like we can start with spec'ing with CORS; that's gonna be hard enough. Let's not increase difficulty level to max.
What HTML says about downloads isn't entirely in line with implementations. Basically, navigation can result in a download (Content-Disposition) so it's all handled there. The download attribute is an additional input to the navigation algorithm to force downloads. I don't remember the crucial differences unfortunately, but any change here would be rather involved I'm afraid.
This feature should not be postponed or redefined for things other than specifying the hash of the uncorrupted download.
Accordingly, this reduces to the following simple changes to the SRI specification:
Note that nothing in the SRI specification or concept depends on whether the user agent uses the "fetch" specification.
As a logical consequence, the following would all apply:
Specifying integrity for an ordinary page link shall cause the loading of the linked page to fail with an appropriate error (not warning) if the page doesn't match. CORS does not (by default) apply to these links. This is useful for having a trusted document, delivered in an off-web secure way (such as S/MIME e-mail), refer to stable documents online. This link hashing can be chained to unlimited depth as long as the author avoids dependency loops (a.html specifies the hash of b.html which specifies the hash of c.html which specifies the hash of a.html).
Specifying integrity for a download link (with or without download attribute) shall cause the download to fail with an appropriate error (not warning), if the file doesn't match. This is useful for any download provided via a CDN or other 3rd party server. CORS does not (by default) apply to these links.
Specifying integrity for an image, sound, video, applet, script or font that doesn't match shall result in a failed subresource download (broken image symbol etc.). CORS does (by default) apply to these.
Alternative URIs in IMG tags etc. are not subject to the generic integrity attribute (it wouldn't match), but new attributes could be introduced to specify their hash values. For many of these, CORS does (by default) apply, but conceivably, new extensions to HTML could introduce alternative URIs for things to which CORS does not (by default) apply.
Alternative documents available via HTTP or other content negotiation mechanisms will need their own enhancement of the SRI specification, perhaps by providing the hash of a list/tree of resource hashes where that list/tree is provided in the negotiation server response. However, the basic specification for URIs that return a stable byte stream should not wait for such enhancements.
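The remark about dependency loops in chained link hashes can be illustrated with a small sketch: a page can only embed the hash of a page whose bytes are already final, so the pages need a valid build order. The function name `hashing_order` and the input shape (each page mapped to the pages whose hashes it embeds) are assumptions for illustration; `graphlib` requires Python 3.9 or later.

```python
import graphlib
from typing import Dict, List, Set


def hashing_order(embeds_hash_of: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which pages can be finalized and hashed.

    Raises graphlib.CycleError (a ValueError) if the links form a loop,
    e.g. a.html -> b.html -> c.html -> a.html.
    """
    sorter = graphlib.TopologicalSorter(embeds_hash_of)
    # Pages whose hashes are needed by others come first.
    return list(sorter.static_order())
```

For a simple chain, the innermost page is finalized first and each page further out embeds the hash of the one before it.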
When we were first discussing sub-resource integrity, verifying downloads was one of the original desires. It got booted from the "MVP" early on (I can't remember why) and didn't get carried over from the old issues space to this one. Now it's time to take it up again.
If part of the concern was about navigations vs downloads and/or wanting to know whether we had to check integrity before we started the download, we could restrict it to links that also have the HTML download attribute.