Bug 27755 - Using the Subtle Crypto Interface with Streams

mwatson2 commented 8 years ago

Though the StreamsAPI is referenced in Informative Reference, the functions under window.crypto.subtle are specified with only one-shot data inputs.

Use-cases: Data may not be available at once. Data may be too huge to keep in memory.

For encrypt()/decrypt() it would make sense to have a streaming readable output if the input is a readable stream.

dlongley commented 2 years ago

@leonbotros,

Currently there is no way to construct a fast AEAD without having the message fully in memory.

My understanding is that there aren't any standard streaming AEADs available (yet). By this I mean that current standard constructions are such that authentication tags can only be checked after processing all data.

You can certainly build encryption/decryption streaming APIs that use current AEAD schemes but they lend themselves to abuse. Specifically, a very large amount of actually inauthentic data will be processed before checking the authentication tag at the very end of the stream. Again, is this because checking at the end is how presently standardized AEAD crypto works.

My understanding is that this sort of attack vector (coupled with using AEAD being best practice) was a reason why a streaming API for encryption/decryption for the Web was not pursued previously. To guard against this threat, data must be broken into smaller chunks (and, typically, you'd want these to fit into memory easily), each chunk having its own authentication tag.

Some links: https://crypto.stackexchange.com/a/54547/8459 https://crypto.stackexchange.com/a/24882/8459 Finalists from the CAESAR competition: https://competitions.cr.yp.to/caesar-submissions.html

Maybe this threat is no longer considered reason enough to prevent introducing an encryption/decryption streaming API.

Regardless, a streaming API for producing a message digest seems ok.

jasnell commented 2 years ago

No documentation yet but yes, you've got the example correct. (Currently fixing a bug in our implementation that should go out with next week's release.)

The constructor for crypto.DigestStream takes the same algorithm identifier argument as crypto.subtle.digest(), so you can do either new crypto.DigestStream('SHA-256') or new crypto.DigestStream({ name: 'SHA-256' }).

The implementation that I've put together for Node.js also accepts a second parameter that is a QueuingStrategy from the WHATWG Streams specification. (We don't use it in the Cloudflare implementation so you can pass it but it'll be ignored).

The idea for the API is to be compatible with WHATWG Streams and consistent with Web Crypto.

I am also looking at crypto.SignStream and crypto.VerifyStream that follow the same pattern, e.g.

const key = await crypto.subtle.importKey(...);
const signStream = new crypto.SignStream({ name: 'HMAC', hash: 'SHA-256' }, key);
const writer = signStream.getWriter();
await writer.write('hello');
await writer.close();
const signature = await signStream.signature;

const verifyStream = new crypto.VerifyStream({ name: 'HMAC', hash: 'SHA-256' }, key, signature);
const writer = verifyStream.getWriter();
await writer.write('hello');
await writer.close();
if (await verifyStream.verified) console.log('OK!');

As mentioned, EncryptStream and DecryptStream are much more complicated. I'd like to get some conversation going around those -- most specifically on what set of cipher algorithms make the most sense.

twiss commented 2 years ago

@jasnell Thanks! Makes sense. If we specify something like this, would you also be OK with the other two API proposals mentioned in my previous comment? Or do you have some opinions about them?

I think EncryptStream could, in theory, be pretty straightforward; e.g. it could subclass TransformStream and implement the symmetric algorithms (AES-CTR, AES-CBC and AES-GCM).

I think DecryptStream is much more fraught, due to the issue @dlongley brought up, i.e. that for AEAD algorithms (for now just AES-GCM), we should in principle only release the plaintext after the authentication tag has been verified. AES-CTR and AES-CBC could be streamed, but I think it's not ideal to have those algorithms but not AES-GCM, as it could encourage people to use unauthenticated encryption.

Though, if we don't want DecryptStream, perhaps we don't want EncryptStream either, and instead encourage people to manually chunk the plaintext and encrypt each chunk, so that it can be decrypted the same way.

All of that being said, perhaps if we place both EncryptStream and DecryptStream under subtle, with big warnings not to process unauthenticated plaintexts or show them to the user before the authentication tag has been verified / the stream has been completed, it might be OK. I think there are still some legitimate use-cases for this, for example streaming the result to a file on disk, and then using the file only after it's complete.

jasnell commented 2 years ago

@twiss:

const { writable, digest } = new crypto.subtle.DigestStream('SHA-256');

I'm not a fan of this largely because it just feels a bit awkward and unnecessary to introduce another intermediate type that is "somewhat closer to (but not quite the same as) a TransformStream" when just returning a new type of WritableStream with an additional field works.

I also do prefer just hanging these off crypto rather than crypto.subtle (with the caveat that they have the same exposure constraints as SubtleCrypto). I think it makes things a bit easier to reason about, e.g. "crypto.subtle stuff is async but covers one-shot, non streaming operations, while crypto.***Stream stuff is for when you need to stream things". But that's likely just a personal stylistic choice.

As for EncryptStream and DecryptStream, those are definitely TransformStream extensions but certainly require additional thought and care. What I suggest is that we split those off into their own effort. Let's get DigestStream, SignStream, and VerifyStream defined and spec'd out first.

MattiasBuelens commented 2 years ago

I also do prefer just hanging these off crypto rather than crypto.subtle (with the caveat that they have the same exposure constraints as SubtleCrypto). I think it makes things a bit easier to reason about, e.g. "crypto.subtle stuff is async but covers one-shot, non streaming operations, while crypto.***Stream stuff is for when you need to stream things". But that's likely just a personal stylistic choice.

That's not what the "subtle" in SubtleCrypto means, though. See this note below the Web IDL definition:

The SubtleCrypto interface is named "SubtleCrypto" to reflect the fact that many of these algorithms have subtle usage requirements in order to provide the required algorithmic security guarantees.

For example, the direct use of an unauthenticated encryption scheme, such as AES in counter mode, gives potential attackers the ability to manipulate bits in the output by manipulating bits in the input, compromising the integrity of the message. However, AES-CTR can be used securely in combination with other cryptographic primitives, such as message authentication codes, to ensure the integrity of the protected message, but only when the message authentication code is constructed over the encrypted message and IV.

Developers making use of the SubtleCrypto interface are expected to be aware of the security concerns associated with both the design and implementation of the various algorithms provided. The raw algorithms are provided in order to allow developers maximum flexibility in implementing a variety of protocols and applications, each of which may represent the composition and security parameters in a unique manner that necessitate the use of the raw algorithms.

Following that logic, these streams should be part of crypto.subtle.

jasnell commented 2 years ago

That's not what the "subtle" in SubtleCrypto means, though.

I know :-) ... like I said, it's probably just a personal stylistic choice, in which case the new classes should go where they best belong and not where James prefers them to be ;-)

twiss commented 2 years ago

I'm not a fan of this largely because it just feels a bit awkward and unnecessary to introduce another intermediate type that is "somewhat closer to (but not quite the same as) a TransformStream" when just returning a new type of WritableStream with an additional field works.

I agree, if it's just DigestStream - but SignStream and VerifyStream follow the same pattern, as you said, and so perhaps having a type for "stream -> single value", that all of those inherit from, might make sense? And e.g. name the other property result, so that you can always do

const { writable, result } = new *Stream(...);

If this pattern is also applicable outside of the Web Crypto spec, it might even make sense to allow such streams to be user-defined, e.g. (names up for debate, of course)

new ProcessStream({
  start(controller) {}
  process(chunk, controller) {}
  flush(controller) { controller.setResult(...); }
});

Though this is more something for the Streams spec, and I'm not sure whether there are in fact enough other applications of this to justify it.

All of that being said, arguably the type for "stream -> single value" is async (stream) -> value, at which point we're kind of back to @jakearchibald's original proposal; though providing a WritableStream rather than accepting a ReadableStream does have some advantages, I think.

The SubtleCrypto interface is named "SubtleCrypto" to reflect the fact that many of these algorithms have subtle usage requirements in order to provide the required algorithmic security guarantees. For example, (...)

Not to belabor the point, but as another example of this that's relevant here, SHA-1 and RSASSA-PKCS1-v1_5 are essentially only included for compatibility reasons, and should be used with care (if at all), and are also "subtle" in that sense.

dead-claudia commented 2 years ago

@twiss

though providing a WritableStream rather than accepting a ReadableStream does have some advantages, I think.

Nit: you could use a writable stream internally that just happens to resolve the promise from inside on end. That would mitigate most advantages in that respect.

jasnell commented 2 years ago

I agree, if it's just DigestStream - but SignStream and VerifyStream follow the same pattern, as you said, and so perhaps having a type for "stream -> single value", that all of those inherit from, might make sense? And e.g. name the other property result, so that you can always do

I'm still not convinced it's any better in the end. I'm fine if we have each of DigestStream/SignStream/VerifyStream all have a property named result tho (as opposed to separate names for each)... but I just don't see any other advantages in the API you're suggesting.

const stream = new crypto.subtle.DigestStream('SHA-256')
// write
await stream.result;

const { stream, result ) = new crypto.subtle.DigestStream('SHA-256')
// write
await result;

It's just not a significant enough of a difference to move the needle for me and I prefer the former.

twiss commented 2 years ago

Nit: you could use a writable stream internally that just happens to resolve the promise from inside on end. That would mitigate most advantages in that respect.

I meant in terms of ease of use of the public API - but of course even there a WritableStream can be converted to a ReadableStream by using a TransformStream, so it doesn't matter that much in the end.

I'm still not convinced it's any better in the end. I'm fine if we have each of DigestStream/SignStream/VerifyStream all have a property named result tho (as opposed to separate names for each)... but I just don't see any other advantages in the API you're suggesting.

IMHO it's a bit cleaner to separate them - the result is not a property of the writable stream, in a sense. And in your API, the digestStream also has a locked property; digestStream.writable.locked seems clearer / more explicit. But OK, of course these are very minor details, both are probably fine.

lucacasonato commented 2 years ago

At least for digests, this is all very complicated. Could I propose we consider crypto.subtle.digest accepting BufferSource | Iterator<BufferSource> | AsyncIterator<BufferSource> instead of only the BufferSource we do now? This maps well to the "streaming variable size input, single fixed output" nature of digests.

The above proposal would integrate well with both ReadableStream (as they are async iterables), Node streams (they are too), and also just regular JS async generator functions.

A TransformStream style stream (like TextEncoderStream) does not work well for this scenario, because of the single output, and a subclassed WritableStream also seems not super ideal. Passing a ReadableStream to a function which it reads from directly is significantly faster than piping it through another WritableStream. If a user wanted a WritableStream they could create a TransformStream with no transform.

This is the pattern we have to support streaming digests in Deno's standard library: https://github.com/denoland/deno_std/blob/main/crypto/mod.ts#L164-L220.

This could additionally work well for synchronously available "chunked" data through the support for Iterator<BufferSource>. Hashing multiple byte arrays together is a rather common operation (for example to calculate request signatures), and currently requires concatenating the buffers into a new buffer.

This should be significantly simpler to both specify and implement compared to adding a new stream type.

I realize this has been discussed above, but I feel we are really drifting into the overly complicated territory. We should try to consider the simpler options before attempting very complicated solutions like custom stream types described by @twiss and @jasnell above.

dead-claudia commented 2 years ago

100% agreed @lucacasonato and just to reiterate, there's no real advantage of offering a TransformStream over accepting a ReadableStream.

I'd be perfectly content with a digest = await crypto.subtle.digest(source), though I would strongly prefer implementations to be allowed to handle ReadableStreams specially internally so they can avoid the (pointless) overhead of promises. (Also, existing API precedent, like with Fetch's body parameter for both RequestInit and Response's constructor, explicitly supports that.)

lucacasonato commented 2 years ago

Special casing ReadableStream seems totally reasonable to me.

twiss commented 2 years ago

@lucacasonato Fair enough. Part of the reason I liked the idea of having a separate object / type is that detection of support for streaming hashing (and creating polyfills) is slightly easier. But ok, I suppose having a try/catch is not that bad either, and I agree it's a more straightforward solution.

@jasnell Would you be happy with that solution as well?

@isiahmeadows I agree that implementations should be able to optimize handling ReadableStreams, but I'm not sure the spec necessarily needs to account for that, as the result should be indistinguishable? I also don't see what you mean re. the fetch spec; I see it accepts ReadableStreams, but it doesn't accept [Async]Iterator<BufferSource>, so it doesn't seem like the same scenario?

lucacasonato commented 2 years ago

but I'm not sure the spec necessarily needs to account for that, as the result should be indistinguishable?

We unfortunately do, because the iteration protocol is observable to the user (they could add a custom iteration function to an existing ReadableStream). By using a brand check to explicitly target ReadableStream differently, we could circumvent this observableness somewhat.

TBH, I really do not think this makes any sort of performance difference though, as the async iteration would introduce at most one more promise per turn (it might actually even be the same).

twiss commented 2 years ago

Right. If it does make a difference for performance, the implementation could check whether [Symbol.asyncIterator] is the built-in one. Deoptimizing the case where it isn't seems fine, as it's similar to passing a custom AsyncIterator in the first place.

jfbrennan commented 2 years ago

Came here and was pleasantly surprised to see @feross's 5 year old suggestion above (exactly what I need in the browser) and that it has so many positive reactions https://github.com/w3c/webcrypto/issues/73#issuecomment-221155330

Is that no longer in scope?

Lots of years of discussion here...perhaps what I'm seeing is the deeper convo about how to support such an API. If so, yay! But apparently still far from getting shipped.

If not, this seems like a really basic use case. For example, before I can upload to my storage service, I need to generate a hash hex from the digest, but the digest can't be created in one go because the file data comes in chunks (our users have gigabyte-sized files to upload, so the files get chunked in order to avoid hitting in-memory limits).

tniessen commented 2 years ago

FYI, there is ongoing work in https://github.com/wintercg/proposal-webcrypto-streams, which is managed by the @WinterCG.

A first explainer exists and we welcome feedback through GitHub issues in that repository.

simonhaenisch commented 2 years ago

I came here because I needed to generate a hash (for etag header) via streaming inside a Cloudflare worker and thanks to the mention of the non-standard crypto.DigestStream I got it working... couldn't find any docs yet, so here's a Cloudflare worker example for streaming some result into KV, and generating an etag along with it:

interface Env {
  KV: KVNamespace;
}

const onFetch: ExportedHandlerFetchHandler<Env> = async (request, env): Promise<Response> => {
  const encoder = new TextEncoder();

  const etagStream = new crypto.DigestStream('SHA-1');
  const { readable, writable } = new TransformStream();

  const etagWriter = etagStream.getWriter();
  const resultWriter = writable.getWriter();

  // not awaiting this because we'll start streaming into
  // the writable side of the transform stream after
  env.KV.put('result', readable);

  // imagine some utility function that reads a ReadableStream,
  // decoding each chunk and passing the data to the callback
  await processStream(request.body, (data: string) => {
    // ... do whatever with the data here

    const encoded = encoder.encode(data);

    etagWriter.write(encoded);
    resultWriter.write(encoded);
  }

  await Promise.all([etagWriter.close(), resultWriter.close()]);

  const etagBuffer = await etagStream.digest;

  const etag = Array.from(new Uint8Array(etagBuffer))
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('');

  await env.KV.put('etag', etag);

  return new Response('done');
}

const worker: ExportedHandler<Env> = {
    fetch: onFetch,
};

export default worker;

fabiospampinato commented 1 year ago

This seems a huge limitation, making hashes should go at the speed of light, instead the WebCrypto API seems almost designed to force you to write slow code (no sync method for tiny inputs, no streams support for huge inputs).

Zectbumo commented 1 year ago

Since this API hasn't been completed yet I would like to point out that saving state would be useful. The web is transient in nature and would benefit from being able to save the state. For example the Python hashlib api did not provide saving the state which resulted in people reverting to C. https://stackoverflow.com/questions/2130892/persisting-hashlib-state

devgs commented 1 year ago

What a shame. Almost a decade without any progress on such a no-brainer feature.

leoselig commented 1 year ago

What a shame. Almost a decade without any progress on such a no-brainer feature.

A quick search revealed that there is indeed an active draft exploring this: https://webcrypto-streams.proposal.wintercg.org/#encryptionstream

I do not support the WhatWG in any meaningful way but since you are rather rudely criticizing there lack of progress in their non-funded pursuit, I assume you do know more here. Would love to get some insights on why you think this is simpler to do than it looks!

vszakd commented 8 months ago

+1

gobengo commented 1 month ago

@jasnell @tniessen wdyt of adding DigestStream to https://github.com/wintercg/proposal-webcrypto-streams ?

w3c / webcrypto

Bug 27755 - Using the Subtle Crypto Interface with Streams #73