Open patrickkettner opened 1 year ago
Regardless of currently existing implementations, I don't think that it would be a good design to include this information in DownloadItem by default. It being there implies that the information should be present at all times, even for the majority of the extensions that don't use it. This increases the required complexity for browsers that implement the downloads API.
It would be more acceptable to provide the functionality through a dedicated method, e.g. a new downloads.getFileHash
:
downloads.getFileHash
permission, even if there is no permission message.Beyond the API design aspect, there is the privacy/security consideration: this functionality is essentially an oracle that emits the sha256 hash for any web content. This hash can easily be reversed to the original value when the number of possible options is relatively small. A way to counter this is to add a permission warning for the extension permission.
https://github.com/w3c/webextensions/labels/follow-up%3A%20chrome label here means: Patrick to follow up with the engineers to answer the (design) questions/considerations raised above and in the meeting.
I agree that putting this API behind a permission makes sense, and agree that most extensions, including many download-monitoring extensions will not need it.
From a privacy/security POV: Yes, this does reveal information about the response content, and because a malicious extension could cause any URL to be treated as a "Download", it does seem like we should probably put this behind a permission prompt at install time.
Making this a function rather than a property seems entirely reasonable for performance and ergonomics reasons.
Returning an error if the file has been moved or deleted seems entirely reasonable.
Behavior for cases where the file has been modified is an interesting question. To address the current use cases, the file's hash at the point of download completion is the only interesting value, so I would definitely want to deem returning that hash value, at any point in the future, regardless of subsequent modification, as "Valid behavior."
However, if there's some other client which does NOT calculate the hash in the course of its normal operation, it probably does make sense to allow the API to hash the current content of the file on disk and return that value, which would avoid introducing (potentially unnecessary) hash computations on every download completion, and avoid the complexity of checking to see if the file was modified, etc.
(An alternative approach might be to say that the getFileHash
method is only valid to call in the onChanged
callback when the download enters the complete
state, but that would probably be non-trivial to implement and somewhat challenging to explain to developers)
@ericlaw1979 I spoke to engineering on our side, and we're open whats been discussed, but want a more formal spec. Is that something you are planning on working on?
I wrote a patch for Chromium (no CL yet) just to see what the API could look like and how to use it. I came up with the following:
IDL (simplified):
namespace downloads {
callback GetFileHashCallback = void(ArrayBuffer hash);
[supportsPromises, permissions="downloads.getFileHash"] static void getFileHash(
long downloadId,
[legalValues=("SHA-256")] DOMString algorithm,
GetFileHashCallback callback);
}
Specific API choices to consider:
downloads.getFileHash
downloads.getFileHash()
which takes two required arguments, downloadId
and constant string SHA-256
, and an optional callback.SubtleCrypto
, this argument is normalized. (E.g, it can be spelled as sha-256
or Sha-256
, etc.)SubtleCrypto
, this method returns ArrayBuffer
(via callback or promise)downloads.getFileHash
or downloads
permissiondownloadId
is not valid or such download does not exist*the hash will need to be persisted to disk, which will grow serialization format a bit
Overall, the API addition seems small and I would gladly create the CLs for it. (Only two are needed: one for the API part, one for persisting hash to disk across sessions.)
@oliverdunk @patrickkettner does IDL in comment above make sense? If so, I can create a Chromium CL for this functionality this week.
In response to https://github.com/w3c/webextensions/issues/401#issuecomment-1875760403 : I would suggest the following changes:
"SHA-256"
. Not "sha-256"
, "Sha-256"
, etc. More than one value is not strictly necessary, but would only increase the implementation complexity by having to normalize.ArrayBuffer
of the digest. The two can trivially be converted to each other, but I expect the string value to be more ergonomic for extension developers.Hash is calculated only when the download was completed
I'm concerned about this part of the proposal. It reintroduces the top level concern from https://github.com/w3c/webextensions/issues/401#issuecomment-1580782800 .
and not updated if user updates the file (because that would be a privacy violation) and maintained as long as download exists in history (even if user deletes the file)*
I wouldn't mind the state to be saved when available (and then consistently be available), but I not guarantee its availability in the API when the file is absent. If a hash algo other than SHA-256 were to be introduced in the future, it wouldn't be able to meet this requirement either, since the hash may be unavailable.
In response to https://github.com/w3c/webextensions/issues/401#issuecomment-1602972509 :
However, if there's some other client which does NOT calculate the hash in the course of its normal operation, it probably does make sense to allow the API to hash the current content of the file on disk and return that value, which would avoid introducing (potentially unnecessary) hash computations on every download completion, and avoid the complexity of checking to see if the file was modified, etc.
From the original crbug and here, it seems that the intent is to retrieve the hash of the downloaded file. To that end, returning the current hash by design would be undesirable. Clients may have different ways to ascertain integrity on disk (e.g. heuristics such as file size, or a different hash/checksum). Clients that don't already save SHA-256 hashes can rely on these alternative internal integrity checks to determine whether the file on disk is still the same, and if so, compute the SHA-256 hash on the fly (and reject with an error otherwise). Clients that do not have any of these integrity heuristics could either return the current hash or reject with an error.
If the source of the hash (at download time / at runtime plus integrity check / at runtime without integrity check) is important to extensions, then it may be worth adding another optional option to enable extensions to select the reliability of the hash. Alternatively, getFileHash
could return a dictionary with the hash plus a description of the hash's reliability.
In response to https://github.com/w3c/webextensions/issues/401#issuecomment-1906342517:
Accept only "SHA-256". Not "sha-256", "Sha-256", etc. More than one value is not strictly necessary, but would only increase the implementation complexity by having to normalize.
I proposed case-insensitive match to follow similar precedent set by Subtle Crypto API (digest()
method). Specifically, Web Cryptography API section 18.4.4. "Normalizing an algorithm" step 5 specifies "case-insensitive string match".
In my patch, implementation complexity is just upper-casing that string, and adding a few test cases. Most complexity actually resides in extra tests.
What is the Chrome's opinion on this?
Return the hex-encoded string value instead of an
ArrayBuffer
of the digest. The two can trivially be converted to each other, but I expect the string value to be more ergonomic for extension developers.
No strong opinion here, I just used ArrayBuffer
to match Subtle Crypto API digest()
method. Also, I just like the binary formats over their hex encoding. Could we get official opinion from Chrome?
I wouldn't mind the state to be saved when available (and then consistently be available), but I not guarantee its availability in the API when the file is absent. If a hash algo other than SHA-256 were to be introduced in the future, it wouldn't be able to meet this requirement either, since the hash may be unavailable.
Yes, there will be many cases when the hash is not available. Of course, I did not mean to imply that the browser should calculate the hash after saving the file (bad for performance, privacy concerns of extension detecting that the file was modified after download completed, etc.). I only suggest that the browser can calculate multiple hashes at once (just hash input stream with multiple algorithms before saving it to disk and then saving multiple hashes). The choice of desired hashes could be later specified by a separate API (for a sake of argument, downloads.setFileHashAlgorithms(['SHA-256', 'SHA-512'])
).
In response to https://github.com/w3c/webextensions/issues/401#issuecomment-1906342517:
Accept only "SHA-256". Not "sha-256", "Sha-256", etc. More than one value is not strictly necessary, but would only increase the implementation complexity by having to normalize.
In my patch, implementation complexity is just upper-casing that string, and adding a few test cases. Most complexity actually resides in extra tests.
... which shows that there is a non-zero cost with no extra benefit to supporting case-insensitive algo names. The usual way of declaring supported values in an extension API is through an enum. That enum can then be used for feature detection (e.g. browser.downloads.FILE_HASH_ALGO?.SHA256
) if desired.
I wouldn't mind the state to be saved when available (and then consistently be available), but I not guarantee its availability in the API when the file is absent. If a hash algo other than SHA-256 were to be introduced in the future, it wouldn't be able to meet this requirement either, since the hash may be unavailable.
Yes, there will be many cases when the hash is not available. Of course, I did not mean to imply that the browser should calculate the hash after saving the file
I agree that the hash should not be available until after download completion. I interpreted your point 5 as a requirement to have the hash computed and persisted at download completion. The rest of your latest response emphasizes this aspect again, that you would like to precompute all hashes upfront.
I only suggest that the browser can calculate multiple hashes at once (just hash input stream with multiple algorithms before saving it to disk and then saving multiple hashes). The choice of desired hashes could be later specified by a separate API (for a sake of argument,
downloads.setFileHashAlgorithms(['SHA-256', 'SHA-512'])
).
As described at the end of my last comment, it is not necessary to precompute all hashes upfront IF the API does not guarantee the availability of the hash when the file is unavailable. An implementation that wishes to return the hash using a previously unsupported algo could do so by reading the file, and compute the new hash and the (internally available) sha-256 hash (or whatever also is used). If the computed sha-256 hash matches the internally known hash, then we can assume the integrity of the file to be intact, and that the other hash is therefore also a representation of the downloaded data.
@Rob--W, @oliverdunk is there an update or timeline for this item?
@Rob--W, @oliverdunk is there an update or timeline for this item?
Hi David, this is being worked on by an external contributor (@bershanskiy). There is currently an open PR but there is some Chrome specific work that needs to be done before I can get approval on the Chrome side and merge the PR. My understanding is that Anton has limited capacity right now so I expect it might not move forward in the short term unfortunately.
Hi @oliverdunk, thanks for the update. It seems like the PR @bershanskiy contributed has been in approved status for a few months now. Can you detail what is currently missing or estimate when you expect it to be ready?
Sure - while the PR has approval from Safari and Firefox, there isn't an approval from Chrome right now. Since Chrome is listed as the sponsoring browser (the browser that would drive implementation) that is something we need before we can merge it.
On the Chrome side, we'd like to get privacy and security approval for this API before we commit to accepting a patch and approve the PR. That requires a Chrome-specific API proposal document.
As mentioned, Anton has limited capacity right now so I don't expect this to move forward in the short term unfortunately. I don't want to say more than that since I don't want to speak on his behalf 🙂
Hi @oliverdunk - If it helps, we need it for a force installed extension in our organization. Maybe you can make this API available for those only while you await further approval
@ericlaw1979 opened an issue on crbug about exposing SHA256 of DownloadItem. This is a fairly trivial change in chromium as the value is already calculated and stored, but were unclear if adding this functionality would be burdensome elsewhere and want to check that before discussing more internally if we want to do this.