w3c / webcrypto

The W3C Web Cryptography API
https://w3c.github.io/webcrypto/

Bug 27755 - Using the Subtle Crypto Interface with Streams #73

Open mwatson2 opened 8 years ago

mwatson2 commented 8 years ago

Bug 27755:

Though the Streams API is referenced in the Informative References, the functions under window.crypto.subtle are specified with only one-shot data inputs.

Use-cases: Data may not be available all at once. Data may be too large to keep in memory.

For encrypt()/decrypt() it would make sense to have a streaming readable output if the input is a readable stream.

jimsch commented 8 years ago

After listening to Ryan rage about the use of BER encoding for ASN.1 objects, I have a feeling that this should be closed as won't fix because it presents a security issue. When one looks at the encrypt/decrypt APIs for authenticated encryption, the entire stream is required to be observed on the decrypt side, and it could be argued that it needs to be observed on the encrypt side as well, prior to emitting the processed stream. This is because if the decryption process does not validate, then no output is to be produced for consumption. Allowing this to be done in a streaming fashion means that the browser potentially needs an unbounded buffer to hold the intermediate result to be returned to the client.

Similar issues hold for processing of signature values for the new Ed448 EdDSA algorithm, where the message M is hashed twice. Allowing an indefinite-length input means that there are potential buffer overrun problems.

feross commented 8 years ago

Node.js has a streaming crypto API without any security issues:

const crypto = require('crypto');
const hash = crypto.createHash('sha256');

hash.update('some data to hash');
hash.update('more data');
hash.update('even more data');
console.log(hash.digest('hex'));

Why can't the web platform?

indutny commented 8 years ago

I absolutely agree with @feross on this. Most (if not all) of the APIs can work in a streaming mode without any security issues. In fact, this is how these APIs are exposed in OpenSSL, so they always work in a streaming mode under the hood anyway, regardless of what high-level API may look like.

jimsch commented 8 years ago

All of the current hash functions that I am familiar with will allow for streaming APIs because they are built using a Merkle–Damgård construction. This means that they are processed on a block by block basis. However there are algorithms for which this is not doable. For example, the EdDSA algorithm that I mentioned above computes:

R = fn( SHAKE256(dom(F, C) || prefix || M, 114) )
k = SHAKE256(dom(F, C) || R || A || M, 114)

As you can see, you need all of the message M to compute R before you can start the computation of k. This means that the entire message needs to be buffered, unlike the hash example you gave above.
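
Spelled out as pseudocode, the data dependency looks like this (a sketch only: SHAKE256, dom, concat and fn are illustrative names, with the structure following RFC 8032):

// Pass 1: all of M must be consumed before R exists.
const r = SHAKE256(concat(dom(F, C), prefix, M), 114);
const R = fn(r); // e.g. scalar multiplication of the base point
// Pass 2: needs R *and*, once again, all of M.
const k = SHAKE256(concat(dom(F, C), R, A, M), 114);
// A one-pass streaming API would therefore have to buffer M in full.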

Note also the comment that I made on authenticated decryption where the entire message needs to be kept before doing the validation step at the end.

indutny commented 8 years ago

@jimsch in your description, SHAKE256 appears to be just a hashing function, and most hashing functions support streaming input. There is nothing that would prevent one from creating two streaming SHAKE256 hashes and using their digests at the end of the stream to compute R and k.

Authenticated decryption should work as well, as far as I can tell... Though, the fact that integrity is checked only at the end of the decryption process means that the API will be kind of awkward. I don't think there are many pros to using streams for authenticated decryption.

jimsch commented 8 years ago

@indutny please re-read my previous post and look at the requirements to finish R before using M for k

indutny commented 8 years ago

@jimsch oh I see it now. Sorry about that! Yeah, streaming won't work for this kind of encryption/decryption schemes indeed.

Still many hashes and ciphers work just fine with streams.

tanx commented 8 years ago

A native streaming API would indeed be great. Our use case would be large file encryption in OpenPGP.js.

mwatson2 commented 8 years ago

If we address this, I think it will not be in this version since it requires substantial work.

hhalpin commented 8 years ago

I imagine we can close this as won't fix, but when streaming stabilizes we can revisit it as part of maintenance of the spec, since, as @jimsch correctly points out, it won't work for quite a few algorithms. We could also try to test whether anyone supports streaming - any ideas?

hhalpin commented 8 years ago

v.Next.

evilaliv3 commented 7 years ago

Is there any update on this topic?

roccomuso commented 7 years ago

+1

ericmackrodt commented 7 years ago

If streaming/progressive encryption isn't implemented, it's going to hugely limit the scope of usage of the API. I really need that kind of functionality for the software I work on.

neckaros commented 7 years ago

+1!

neckaros commented 7 years ago

Privacy is a growing concern. Being able to decrypt locally without consuming too much memory is a must, I think. For example: encrypt a huge file locally as you send it to a server, so the server never has the decrypted data. It works well on Node.js.

alanwaketan commented 7 years ago

I think digest may be a good place to start.

thiccar commented 7 years ago

+1000

daviddias commented 7 years ago

Hi all, bringing this issue back up. Any updates or recent discussion on it?

I believe that the security considerations do not hold, and what this promotes is users finding other ways to encrypt their files as the use of browsers to share large documents grows, possibly by having to shim their own streaming encryption API, which will be considerably slower than a native one through WebCrypto.

JulianKlug commented 6 years ago

+1

johnozbay commented 6 years ago

100% agreed with @ericmackrodt & @neckaros & @diasdavid. With GDPR on the horizon, this would make things a lot easier for European establishments.

dead-claudia commented 6 years ago

@jimsch By any chance, could a streaming API be provided for those encryption schemes that can be streamed? Just because it's not possible for some doesn't make it impossible for all (and there are different tradeoffs for each). One good example of this is client-side decryption of large files on mobile (only high-end phones/tablets have the RAM available to reliably decrypt a 750MB video download in memory).

jimsch commented 6 years ago

It could; on the other hand, there may be other things that could be done as well. For example, one could do chunked encryption of large objects such as video, which is designed to be streamed, so that each chunk can be independently decrypted and streamed (see the sketch below). The world is moving towards only using authenticated encryption algorithms, and streaming such as you suggest means that you are willing to consume a decrypted stream that may have been corrupted without being able to detect it.
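
A minimal sketch of that chunked pattern using the existing one-shot AES-GCM API (the chunk size and IV derivation here are illustrative assumptions, not a standard format):

// Each chunk is sealed independently under a fresh IV, so corruption is
// detected per chunk at decrypt time and chunks can be consumed in order.
const CHUNK = 1 << 20; // 1 MiB of plaintext per chunk

async function* encryptChunks(key, data) {
  for (let i = 0; i * CHUNK < data.byteLength; i++) {
    const iv = new Uint8Array(12);
    new DataView(iv.buffer).setUint32(8, i); // chunk index as IV; never reuse a key+IV pair
    const sealed = await crypto.subtle.encrypt(
      { name: 'AES-GCM', iv },
      key,
      data.slice(i * CHUNK, (i + 1) * CHUNK)
    );
    yield new Uint8Array(sealed); // ciphertext plus the 16-byte auth tag
  }
}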

Additionally, one would need to get a group of people together at the W3C who are interested in doing an update to the document and then decide which algorithms could/should be streamable and which should not.

lll000111 commented 6 years ago

@jimsch My use case — and I think this is a bit more common — is calculating hashes over large files being exchanged. "You can use chunks" is not a good solution, especially given that there is no problem with calculating e.g. a SHA-256 of a streamed file. I'm on the fence about encryption (i.e. about encrypting the chunks instead), but hashes that would work on streams should work on streams.

In my own app I link content through SHA-256 hashes, so I need hashes of the full file; chunk hashes are no use. If I have to do that in memory with large files... the whole point of streams, and the big movement towards them, is that I can save on memory footprint.

antonin-arquey commented 6 years ago

The digest method should absolutely support hashing large files in streaming mode. You just can't load 3 GB files into memory to hash them in a single block. Right now, if you have this use case, you are forced to look into other library options.

acdha commented 6 years ago

Just to second @antonin-arquey's comment: I'm working on a project which does large file deliveries entirely client-side, generating SHA-256 manifests to provide strong assurances against bitrot along the way. Our current approach uses asmCrypto.js in a web worker, which adds a fair amount of code and has varying performance (50-90 MB/s depending on the browser, on my dev system).

I'd like to switch to SubtleCrypto and jettison that dependency, but the digest interface doesn't support streaming, and we often have files in at least the single-digit GB range.

kaizhu256 commented 6 years ago

@isiahmeadows, you can achieve de facto streaming by using the HLS video format. Here's a demo that decrypts and plays HLS video chunks using WebCrypto AES-256-CBC:

https://kaizhu256.github.io/node-demo-hls-encrypted/index.html

and here's the ~100 SLOC hack that gets it to work, by injecting WebCrypto decryption right after the Ajax call:

https://github.com/kaizhu256/node-demo-hls-encrypted/commit/59283c3a879369f7d6f2404a9a28c2eafa868555#diff-0af25116316f4a4c5abd8574f62beff2

neckaros commented 6 years ago

Thanks for the tip. Personally, and unfortunately, I'm using gdrive to serve user video, so I cannot use HLS. However, I ended up using a transform stream in the service worker.

dvoytenko commented 5 years ago

At least with AES-CTR, it's possible to do manual chunking for decryption by providing the counter value for each chunk, right?
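
That should work, provided each chunk is a multiple of the 16-byte AES block so the counters line up. A minimal sketch, assuming a 16-byte initial counter block iv (CHUNK_SIZE and counterForChunk are illustrative, not part of any API):

const CHUNK_SIZE = 16 * 1024; // must be a multiple of 16 bytes

// Advance the initial counter block by the number of AES blocks consumed
// by all preceding chunks (big-endian addition over the 16-byte block).
function counterForChunk(iv, chunkIndex) {
  const counter = new Uint8Array(iv);
  let carry = BigInt(chunkIndex) * BigInt(CHUNK_SIZE / 16);
  for (let i = counter.length - 1; i >= 0 && carry > 0n; i--) {
    carry += BigInt(counter[i]);
    counter[i] = Number(carry & 0xffn);
    carry >>= 8n;
  }
  return counter;
}

async function decryptChunk(key, iv, chunkIndex, chunk) {
  return crypto.subtle.decrypt(
    { name: 'AES-CTR', counter: counterForChunk(iv, chunkIndex), length: 64 },
    key,
    chunk
  );
}

Decrypting each CHUNK_SIZE slice this way yields the same bytes as a single one-shot decrypt over the whole ciphertext.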

v0l commented 5 years ago

Would love to see stream support for SubtleCrypto; using one-shot data is very restrictive and inefficient.

There isn't much value added if people decide to implement this in their browsers when it's not even possible to stream data.

I would make the same comment about FileReader: I have no idea why FileReader doesn't expose a ReadableStream...

Is there any plan to implement this in the future? I have a site using SubtleCrypto and FileReader, and it's very limited in how much data people can process, maybe <100MiB in Chrome (Firefox seems to work better) before crashing.

jakearchibald commented 5 years ago

@v0l

I would add the same comment to FileReader, I have no idea why FileReader doesnt expose a ReadableStream...

fwiw you can do new Response(blob).body.

jakearchibald commented 5 years ago

Writable and transform streams have now shipped in Chrome. Pretty sure the streaming spec is stable enough to look at this.

I agree with others that the digest method is a good place to start.

jakearchibald commented 5 years ago

Suggested API:

const value = await crypto.subtle.digest(algorithm, readableStream);

Overload the existing method so it takes a readable stream. Example:

// Digest the HTML spec:
const request = await fetch('https://html.spec.whatwg.org/');
const value = await crypto.subtle.digest('SHA-256', request.body);

This would also allow providing the chunks manually, or using a combination of many sources:

// Digest a combination of the HTML spec, the DOM spec, and "That's all folks".
const responsesToDigest = [
  fetch('https://html.spec.whatwg.org/'),
  fetch('https://dom.spec.whatwg.org/'),
];
const { writable, readable } = new TransformStream();
const valuePromise = crypto.subtle.digest('SHA-256', readable);

for await (const response of responsesToDigest) {
  await response.body.pipeTo(writable, { preventClose: true });
}

const writer = writable.getWriter();
writer.write(new TextEncoder().encode("That's all folks"));
writer.close();

const value = await valuePromise;

gannons commented 5 years ago

It sounds like this feature has yet to be implemented. Are there any alternatives to using webcrypto?

jimmywarting commented 5 years ago

Similar issues hold for processing of signature values for the new Ed448 EdDSA algorithm, where the message M is hashed twice

If that's the case, why can't we simply allow a Blob or File to be passed to the digest function and let the hashing be in control of reading and seeking the content? Why would you need to create a writable stream or a single buffer out of a blob at all?

I'm not arguing against streaming support. It's a great addition for hashing algorithms that work in a streamable (block) fashion.


For future reference, you can get a stream from a blob using the newly added stream() method: just call blob.stream() instead of doing new Response(blob).body.


zip.js created a neat universal base class with some basic read/write functionality that could be applied to any kind of data (blob, string, base64, typed arrays), as long as it had this kind of method:

class Something extends zip.Reader {
  #data

  constructor() { ... }

  readUint8Array (start, length) {
    return slice_and_return_uint8(this.#data, start, length) // could also return a promise
  }
}

If we had something like this, then digest would not be limited to only a few types of acceptable data. It would be more like a stream's pull method, but with added arguments for what and where to read. When I come to think about it, it acts pretty much like a transform stream.

Imagine using something like this where you had to read M twice...

var transformer = new TransformStream({
  transform({ start, length }, controller) {
    controller.enqueue( slice_and_return_uint8(data, start, length) )
  }
})

crypto.subtle.digest(algorithm, transformer)

rabindranathfv commented 4 years ago

Suggested API: const value = await crypto.subtle.digest(algorithm, readableStream); [...] (quoting @jakearchibald's full proposal above)

@jakearchibald did you try to use this with Angular on the client side? I'm using it and it makes all my unit tests go kaboom, haha.

mikeal commented 3 years ago

Is anyone working on this?

sideshowbarker commented 3 years ago

Is anyone working on this?

No one is working on it that I’m aware of.

We need an editor. Not just an editor to lead the spec/architecture decisions about this one feature, but also to look at the 13 other open enhancement requests/proposals we have at https://github.com/w3c/webcrypto/labels/enhancement, as well as the 44 open issues in total at https://github.com/w3c/webcrypto/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc, which need an editor to triage/evaluate.

So if anybody following this issue would like to explore the idea of taking responsibility for editing the spec, please let me know, and I can give you more details about the tasks and level of effort that would be expected and needed.

But for now, as far as the basic level-of-effort details: The rule of thumb we typically use is that an editor for a particular spec should be willing and able (that is, have support from their management at their company/org) to put one full day a week of their time, over at least 3 months, into editing the spec: 20% of their work week, ~8 hours a week. And anybody willing to commit to that would get some number of hours of support time from somebody on the W3C staff (e.g., me).

TheUltDev commented 3 years ago

For those needing performant hashing on the web right now, the best options are WebAssembly or asm.js for files larger than your target chunk size, and the native crypto function for smaller files.

WebAssembly: https://github.com/Daninet/hash-wasm
asm.js: https://github.com/asmcrypto/asmcrypto.js
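
As an example of the incremental pattern with hash-wasm (API as documented by that project; verify against its README before relying on this sketch):

import { createSHA256 } from 'hash-wasm';

async function sha256OfFile(file) {
  const hasher = await createSHA256(); // WASM-backed incremental hasher
  hasher.init();
  const reader = file.stream().getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    hasher.update(value); // feed each chunk; the file is never fully buffered
  }
  return hasher.digest('hex');
}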

Note: I am not affiliated with either of these projects; I do use asmcrypto.js in production, though.

I hope to one day have native incremental hashing in the browser, it's quite a hassle.

knightcode commented 3 years ago

The use case I'm interested in is forming a cipher stream for upload from a File object:

const iv = ...
const aad = ...
const key = ...
const file = document.getElementById("file_input").files[0]; // or dataTransfer.files

// maybe: new FormData()

const response = await fetch("...", {
    method: "POST",
    body: window.crypto.subtle.encrypt(
      {name: "AES-GCM", iv: iv, additionalData: aad, tagLength: 128},
      key,
      file
    )
});

The opposite direction would be nice too... somehow attaching subtle.decrypt to an <a download> tag. But that seems less intuitive.

jimmywarting commented 2 years ago

Another solution could be to accept an AsyncIterable that yields Uint8Arrays; that way both Node and WHATWG streams could be processed:

const request = await fetch('https://html.spec.whatwg.org/');
const value = await crypto.subtle.digest('SHA-256', request.body);
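
A userland shim shows the shape such an overload would give (digestIterable is hypothetical; buffering everything defeats the memory benefit, but the calling convention is the same):

async function digestIterable(algorithm, iterable) {
  // Collect Uint8Array chunks from any async iterable
  // (a Node stream, or a WHATWG ReadableStream where async iteration is supported).
  const chunks = [];
  let total = 0;
  for await (const chunk of iterable) {
    chunks.push(chunk);
    total += chunk.byteLength;
  }
  // Concatenate and fall back to the existing one-shot digest.
  const buffer = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    buffer.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return crypto.subtle.digest(algorithm, buffer);
}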

MattiasBuelens commented 2 years ago

Another solution could be to accept a AsyncIterable that yields uint8array's, in that case both node & whatwg streams could be processed

Not sure.

jimmywarting commented 2 years ago

Hmm, good point!

I previously suggested adding Blob support here: #216, but if it could accept a ReadableStream via crypto.subtle.digest('SHA-256', blob.stream()), even bypassing the Streams API, then there wouldn't be much value in adding support for Blob...

knightcode commented 2 years ago

Another solution could be to accept an AsyncIterable that yields Uint8Arrays; that way both Node and WHATWG streams could be processed:

const request = await fetch('https://html.spec.whatwg.org/');
const value = await crypto.subtle.digest('SHA-256', request.body);

Would the user be able to save the streaming file to the file system with this API?

MattiasBuelens commented 2 years ago

Would the user be able to save the streaming file to the file system with this API?

crypto.subtle.digest() accepts a variable-size input but returns a fixed-size output; for example, SHA-256 always returns a 256-bit (32-byte) output. So there's no point in making the output a stream.

For encrypt() and decrypt(), the output size is dependent on the input size: larger inputs generate larger outputs. So here, it would make sense to also return a ReadableStream. You'll be able to write this to disk using the File System Access API:

const decryptedStream = crypto.subtle.decrypt(/* ... */); // this API doesn't exist yet
const handle = await window.showSaveFilePicker();
decryptedStream.pipeTo(await handle.createWritable());

But for now, we're only looking at digest().

matthewjumpsoffbuildings commented 2 years ago

I've recently been dealing with workers/fetch and was very disappointed to find that the Crypto API doesn't support streams.

That implementation you proposed, @MattiasBuelens, looks fantastic; hopefully something like it comes to fruition.

vlovich commented 2 years ago

It might be useful to have streaming be its own thing rather than retrofitting it onto individual operations (or we could do both, for ergonomics). One design I have requires fast random access to the resultant encrypted output, but the input stream can be very big. To accomplish this, I'm simply chunking smaller amounts of input text. This could be implemented in JS land by hooking the output stream up to a custom TransformStream (and maybe that's the right answer), but I wonder if that creates inefficiencies due to the extra intermediary JS arrays that need to be allocated.

Perhaps something like:

type StreamChunkParams = (AesGcmParams | AesCbcParams | AesCtrParams | HkdfParams | Pbkdf2Params | Algorithm | ..., CryptoKey)[]
const outputStream = crypto.subtle.stream(request.body, 
  {chunkLength: 16 * 1024, chunkParams: (chunk: number, length: number): StreamChunkParams => { ... }}
)

chunkParams is a callback invoked on every chunk boundary; it returns StreamChunkParams, which describes the sequence of cryptographic operations to run on top of the previous operation (with the first entry operating on the stream), with the cumulative result written (the length is always chunkLength except for the last chunk). For example, [cbcEncrypt: AesCbcParams, hmacSign: HmacParams] would give you the IV, ciphertext, and HMAC(IV + ciphertext) as a single contiguous stream of bytes written to the output. We'd have to define what the interactions between operations are, and maybe restrict the set of valid combinations (e.g. I would start by just having one of the sign/verify/encrypt/decrypt algorithms and only allow the CBC+HMAC combination, unless there are others that are frequently combined).
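
For illustration, a hypothetical chunkParams callback for the CBC+HMAC combination described above (every name here comes from the proposal sketch or is assumed: ivForChunk, cbcKey and macKey are placeholders):

const chunkParams = (chunk, length) => [
  [{ name: 'AES-CBC', iv: ivForChunk(chunk) }, cbcKey], // encrypt this chunk
  [{ name: 'HMAC', hash: 'SHA-256' }, macKey],          // then MAC over IV + ciphertext
];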

The output stream would just be the concatenated result of all the crypto operations in the given order. Another common need is to compute the digest of a message while reading it. It would suck for the answer to be "tee + stream digest", because an implementation might have to keep a copy of the I/O that was dequeued from the kernel to satisfy the tee. So maybe we can specify an additional digest that is stream-wide. Something like:

type StreamChunkParams = (AesGcmParams | AesCbcParams | AesCtrParams | HkdfParams | Pbkdf2Params | Algorithm | ..., CryptoKey)[]

const [encryptedStream, digestPromise] = crypto.subtle.stream(request.body, 
  {
    chunkLength: 16 * 1024,
    chunkParams: (chunk: number, length: number): StreamChunkParams => { ... },
    digest: 'SHA-256'
  }
)
encryptedStream.pipeTo(...)
const digest = await digestPromise

taralx commented 2 years ago

Cloudflare just announced availability of a non-standard DigestStream: https://community.cloudflare.com/t/2021-12-10-workers-runtime-release-notes/334982

leonbotros commented 2 years ago

Not supporting incremental hashing and streaming encryption is currently a substantial usability issue with this API.

Currently there is no way to construct a fast AEAD without having the message fully in memory. It is possible to use AES-CTR to build a construction by updating the counter ourselves, but this does not cover authentication, which has to be performed by a slower fallback. In my case, this leads to a severe asymmetry between time spent encrypting/decrypting and hashing/MAC'ing (roughly a factor of 10, and this is a Rust hash crate compiled to WASM).

As an intermediate step, it would help to expose an incremental hash API. Even though hashing is not considered the best choice for constructing fast MACs, hardware support makes up for this. That way, an HMAC could for example be constructed from an incremental SHA-256.
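
As a sketch of that last point: given a hypothetical incremental hash handle (createHash below does not exist in SubtleCrypto; it stands in for whatever incremental primitive might be added), HMAC-SHA-256 per RFC 2104 falls out without buffering the message:

async function hmacSha256(rawKey, chunks) {
  const BLOCK = 64; // SHA-256 block size in bytes
  let key = rawKey.byteLength > BLOCK
    ? new Uint8Array(await crypto.subtle.digest('SHA-256', rawKey))
    : rawKey;
  // K' is zero-padded to the block size; XOR the key bytes into the pads.
  const ipad = new Uint8Array(BLOCK).fill(0x36);
  const opad = new Uint8Array(BLOCK).fill(0x5c);
  for (let i = 0; i < key.byteLength; i++) { ipad[i] ^= key[i]; opad[i] ^= key[i]; }

  const inner = crypto.subtle.createHash('SHA-256'); // hypothetical incremental API
  inner.update(ipad);
  for await (const chunk of chunks) inner.update(chunk); // message never fully in memory
  const innerDigest = new Uint8Array(await inner.digest());

  // HMAC(K, M) = H((K' ^ opad) || H((K' ^ ipad) || M))
  const outer = crypto.subtle.createHash('SHA-256'); // hypothetical incremental API
  outer.update(opad);
  outer.update(innerDigest);
  return outer.digest();
}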

twiss commented 2 years ago

@taralx That's also an interesting approach; is there some documentation of this? (Cc @jasnell, perhaps?)

From playing around with it a bit, the API seems to be:

const digestStream = new crypto.DigestStream('SHA-256');

// Pipe or write some data to it, e.g.
const writer = digestStream.getWriter();
writer.write(new TextEncoder().encode('some data'));
writer.close();

const value = await digestStream.digest;
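
Since DigestStream is itself a WritableStream, piping into it should also work (a sketch based on the behavior above; the URL is illustrative):

const digestStream = new crypto.DigestStream('SHA-256');
const response = await fetch('https://example.com/large-file');
await response.body.pipeTo(digestStream); // stream the body straight in
const value = await digestStream.digest;  // resolves to an ArrayBuffer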

If you'll allow me to speculate on some pros and cons vs the approach @jakearchibald proposed (i.e. overloading the existing crypto.subtle.digest method to accept a ReadableStream), I see:

Pros:

Cons:

Does anyone see any others?

At the risk of getting too much into the weeds of API design, to address the cons at least somewhat, I would propose:

const { writable, digest } = new crypto.subtle.DigestStream('SHA-256');

// Pipe or write some data to `writable`, e.g.
const writer = writable.getWriter();
writer.write(new TextEncoder().encode('some data'));
writer.close();

const value = await digest;

I.e. somewhat closer to (but not quite the same as) a TransformStream. Alternatively, digest could also be named result, so that the same API could be used for e.g. SignStream.

If this seems reasonable to people, I'd be happy to write up a draft spec and propose it to the WICG.

(It seems clear to me that there's a lot of developer interest and at least some implementer interest, given that there is already one implementation. @jakearchibald I know it's been a while but could I infer from your comments that Chrome would also potentially be interested in implementing this?)