A parallelizable secretstream_parallel_*() API?

fhajji commented 6 years ago

Hello,

the current secretstream_*() algorithm is inherently sequential, because it uses previous mac, and therefore all previous messages, to update the nonce n. According to the documentation:

c, mac <- ChaCha20Poly1305-IETF-XOR(key = k, nonce = i || n, msg = T || {0} * 63 || M)
n <- n ^ mac
i <- (i + 1) & 0xffffffff
if i = 0:
  rekey()

This CBC-like nonce-construction is just fine in the general case, but unfortunately, like all chained modes, it can't be parallelized.

Some special cases would greatly benefit from a parallel construction, e.g. when all messages and added data have the same length and rekey-ing is not used.

A typical use-case of a parallelized AEAD streaming construction would be encrypting large files / partitions... where random access to specific blocks is required. I'm thinking here of a potential AES-XTS replacement, with added authentication macs.

A parallel algorithm would also run (much?) faster with multiple threads, if multiple messages could be mac-encrypted in parallel ahead of time (e.g. in scatter-gather buffers), and then sent later en-bloc.

So what is the rationale behind n <- n ^ mac ? Blinding a nonce? What parallelizable constructions would be provably secure in this case?

Should libsodium expose an API for this, so users won't make mistakes with their own home-brew constructions? Something like this for encrypting:

int crypto_secretstream_parallel_xchacha20poly1305_init_push
   (crypto_secretstream_xchacha20poly1305_state *state,
    unsigned char header[crypto_secretstream_xchacha20poly1305_HEADERBYTES],
    const unsigned char k[crypto_secretstream_xchacha20poly1305_KEYBYTES],
    unsigned long long message_size /* fixed message size */,
    unsigned long long ad_size /* fixed additional data size */);

int crypto_secretstream_parallel_xchacha20poly1305_push
   (crypto_secretstream_xchacha20poly1305_state *state,
    unsigned long long *message_ctr,
    unsigned char *c,
    const unsigned char *m,
    const unsigned char *ad,
    unsigned char tag);

Same for decrypting:

int crypto_secretstream_parallel_xchacha20poly1305_init_pull
   (crypto_secretstream_xchacha20poly1305_state *state,
    const unsigned char header[crypto_secretstream_xchacha20poly1305_HEADERBYTES],
    const unsigned char k[crypto_secretstream_xchacha20poly1305_KEYBYTES],
    unsigned long long message_size /* fixed message size */,
    unsigned long long ad_size /* fixed additional data size */);

int crypto_secretstream_parallel_xchacha20poly1305_pull
   (crypto_secretstream_xchacha20poly1305_state *state,
    unsigned long long *message_ctr,
    unsigned char *m, unsigned char *tag_p,
    const unsigned char *c, 
    const unsigned char *ad);

Here, *message_ctr may (or may not?) be updated by the API...

Thanks.

P.S.: I'm aware that random access runs contrary to streaming, and a streaming API is under no obligation to be parallelizable. It would merely be nice-to-have.

P.P.S.: Fixing message and added data size may not even be required -- complex protocols may guess ahead of time the message boundaries even when those are variable, and still parallelize processing. It is just easier and less error-prone with fixed message boundaries.

enkore commented 6 years ago

So what is the rationale behind n <- n ^ mac?

To prevent reordering / removing chunks.

fhajji commented 6 years ago

To prevent reordering / removing chunks.

Wouldn't that be detected anyway within the first 2^32 messages? Even it we merely used i || H(key, iv) as the nonce? After all, it is not the application that is submitting / providing the nonce... So if chunks are reordered, inserted, or dropped on the stream, the internal state (i, nonce) would be out-of-sync anyway. Or I am missing something?

Edit: reordering still possible here. Please disregard this comment.

fhajji commented 6 years ago

Or, just to be on the safe side, what about i || H(i || iv || key) as the nonce?

Hashing i alongside key makes it very hard for an attacker without knowledge of key to generate a nonce for a different i (different position). Including iv in the hash, makes the nonce sequence even non-deterministic on the key and i alone.

jedisct1 commented 6 years ago

So what is the rationale behind n <- n ^ mac ? Blinding a nonce? What parallelizable constructions would be provably secure in this case?

In case of nonce reuse, the ciphertext will only be identical up to the common prefix.

The parallel construction is identical, with the exception of this operation not being performed.

These constructions were designed to be generic, and are not tied to a specific implementation or to a specific AEAD primitive.

The parallel version (as well as parallel variants of other operations) is unlikely to be added to libsodium.

This would make the API significantly bigger, while still leaving a lot of work left to the application (threads creation / synchronization), thus diverging too much from the initial purpose of the library.

jedisct1 commented 6 years ago

https://github.com/jedisct1/blobcrypt was designed for random access to encrypted data.

Its API was eventually meant to be included in libsodium.

But a year later, nobody had used it.

So, the simpler secretstream construction was merged instead.

jedisct1 commented 6 years ago

Some special cases would greatly benefit from a parallel construction, e.g. when all messages and added data have the same length and rekey-ing is not used.

This is another reason why it won't be implemented in libsodium.

Some special cases.

Adding APIs to support some special cases is something we try to avoid. For one set of APIs added for special cases, this causes confusion for everybody else.

fhajji commented 6 years ago

Thank you for replying.

So blobcrypt allows encrypting / decrypting arbitrary chunks, not even limited to chunk boundaries? Neat. Very neat indeed. Thanks for the hint.

In case of nonce reuse, the ciphertext will only be identical up to the common prefix.

The parallel construction is identical, with the exception of this operation not being performed.

These constructions were designed to be generic, and are not tied to a specific implementation or to a specific AEAD primitive.

Just to clarify: removing the update of n in the secretstream algorithm won't harm security, as long as the total number of messages would be less than 2^32?

If that's the case, would updating n with H(i || iv || key) in order to extend the nonce be as secure as extending it with with n ^ mac, except that it would be parallelizable? Of course, that would be slower (due to H() overhead)... so not yet the real solution.

Are there some papers analyzing these constructions? I'd really love to see a security proof, and how to extend the nonce securely without chaining. :-)

TIA

tarcieri commented 6 years ago

I'd suggest having a look at this paper. Rogaway tackled this problem, complete with concrete problem definitions and security proofs:

https://eprint.iacr.org/2015/189.pdf

Namely the STREAM construction shows what we want in a scheme that uses a nonce-based defense to reordering/truncation attacks: a user-supplied portion, a counter, and a 1-bit last block flag. This achieves a security definition he called nonce-based online authenticated encryption, or nOAE.

The other construction described in that paper, CHAIN, requires an MRAE cipher for its security properties to hold, and probably isn't applicable in libsodium in general. It also ensures in-order decryption and therefore is inherently not seekable/parallelizable, which seems to be the property desired here.

There's an alternative to the 1-bit last block flag: encrypt a zero-length message to indicate the end of a stream. This is slightly simpler than leaning on the nonce for this purpose, and should also be unambiguous (unless, for whatever reason, you want zero-length messages midstream).

fhajji commented 6 years ago

Thank you @tarcieri, Rogaway's paper was exactly what I was looking for.

I'll look into it and will see how to modify the nonce generation algorithm in a (hopefully) secure way, such that we get seekable / parallelizable streams.

Unfortunately, secretstream_*() API has already committed itself to the non-seekable variant. Since library users already rely on that format (e.g. they may have already encrypted files with it), changing the algorithm without modifying or extending the API is not an option anymore... even if @jedisct1 and the community could be convinced that seekability / parallelizability of the stream is a good and useful idea. So, if at all, it will be as an experimental extension to libsodium.

Now, reading Rogaway (always a pleasure).

Thanks again.

jedisct1 / libsodium

A parallelizable secretstream_parallel_*() API? #719