katzenpost / docs

specification and design documents
Creative Commons Attribution Share Alike 4.0 International
53 stars 14 forks source link

Sphinx: Collect and reduce cryptographic assumptions, primitives, and constructions. #64

Open WildCryptoFox opened 5 years ago

WildCryptoFox commented 5 years ago

Sphinx asks for several independent hash functions accepting a group element as input and various keys as outputs, for this a KDF is the suitable answer. HKDF-SHA256 is fine. Blake2b would be faster and is already used internally in AEZv5.

MAC. Usually meaning any PRF but in order to prove that an adversary cannot construct a message that cycles through the network endlessly, its security proof models the MAC as a random oracle, so it suggests HMAC-SHA256-128. If this additional constraint can be removed a fast UHF may be used as the MAC because sphinx uses the MAC only once per key. Blake2b would be faster.

PRG. Aes128-Ctr. Good. Sphinx headers are short, if we're using blake2b as our KDF, then we may as well let it expand a little more to cover the keystream we'd use a PRG to derive - without paying for the AES key setup. Would need to benchmark to know when it is worth using aes128-ctr over blake2b.

SPRP. No fancy restrictions. AEZ is fine. However it is worth noting the chosen AEZv5 is based on heuristic security and does not reduce to AES. HCTR does reduce to its primitives; but it is slower than AEZ.

Sphinx assumes Decisional Diffie-Hellman is hard over the prime order group. Although recommended in the paper and Curve25519's common misuse as a prime order groups, it is not a prime order group. With just 4 extra functions, you have Ristretto255, which is a prime order group.

Questions

  1. Is a hash-then-prf or hash-then-prp sufficient to build a sphinx-suitable MAC with a fast UHF?
david415 commented 5 years ago

@burdges Would you be able to add some opinions to this ticket? We are considering that we may want to upgrade our sphinx to use a slightly different selection of cryptographic primitives... and I have no idea what the answers are to the questions raised by @james-darkfox

burdges commented 5 years ago

Is the hash function speed really an issue here? I personally love the XoFs like Blake2x and Shake128, but Blake2x has few implementations, and Shake128 might be slower than you like.

Is there a problem with just using poly1305 for the 16 byte MAC?

AES128-CTR lacks hardware support on small ARM devices, so maybe you'd want ChaCha if you're worried about battery on mobile? If otoh you're talking laptops then you should stick with AES I guess, although we're only talking 1kb or so, so not too big a deal.

It'd be cute exercise to work out doing all the above with STROBE, although doing so sounds unlikely to improve performance or simplify anything.

I'm actually not familiar with HCTR but I suppose the relevant papers are http://delta.cs.cinvestav.mx/~debrup/hctr.pdf and https://www.iacr.org/workshops/fse2008/docs/papers/day_2_sess_3/18_hctr_debrup_fse.pdf so I'll take a look.

DJB mention Google has some new SPRP that strongly resembles his unpublished HHFHFH, expect it used AES internally in one place he'd rather avoid. Is that one HCTR perhaps?


There is nothing wrong with using X25519 alone, or a constant-time variable-base Ed25519 scalar multiplication, so long as you multiply by the cofactor correctly, like by using the bit clamping, but..

I'm afraid curve25519's bit clamping breaks hybrid post-quantum schemes in which curve25519 is hybridized with say CSIDH. I think X25519 ignoring the sign of y does not create further problems because only one sign choice pops out after the cofactor multiplication (like 2-torsion in the complexes).

I'd therefore suggest doing explicit cofactor multiplications instead of big clamping, if using either curve form, not because you need it right now, but because it'll simplify some transition one day.

There are plenty of projects like Tor that occasionally use curve25519 without the clamping, but maybe one should just adopt Ristretto when one gives up on strictly adhering the curve25519 specs, if only to save yourself spec length by directly citing: https://mailarchive.ietf.org/arch/browse/cfrg/?gbt=1&q=ristretto

I always like Ristretto anyways of course, so actually yes I'd use Ristretto here by default. In principle, there is a Ristretto compression directly for Montgomery curves, so maybe that's faster than using the Edwards Ristretto variant implemented in curve25519-dalek?

WildCryptoFox commented 5 years ago

@burdges

Is there a problem with just using poly1305 for the 16 byte MAC?

Yes, the security proof for termination asks for a MAC that acts like a random oracle when all secrets are known to the adversary (to be conservative). Poly1305 is an ADU and is slower than POLYVAL on modern systems which accelerate GF(2^k) fields. I'm not sure if a UHF is sufficient or if UHF-then-prf or UHF-then-prp meet this requirement when the adversary knows the keys. I expect either UHF-then-prf or prp would be sufficient, if UHFs aren't, but I'd have to ask Sphinx' authors to confirm. Until then, my recommendation remains as blake2b for general sphinx users.

That particular security proof can be made redundant by Katzenpost's stratified topology which forces the packet to flow towards termination. This argument cannot be reused for any general topology. Fortunately the sphinx implementation can be generic and Katzenpost only needs to fulfill the necessary traits.


I have my own independent sphinx implementation in the works - not yet public - which does use https://docs.rs/merlin thus STROBE under the hood. I might look into an alternative design to merlin for transcript commitments and challenges, likely using blake2x as one backend.


I have an onion-optimized implementation of HCTR, generic over the PRP and AXU. I'll probably generalize it to expose the PRG to selection for potential users who would benefit from using chacha20 as the PRG but still need AES as the single PRP call. Onion-AE is relevant for formal analysis for layering SPRPs, which may be of interest.

Yes those are the main relevant HCTR papers to consider. No, it isn't Google's SPRP, I believe you're thinking of Adiantum. Adiantum is new, it relies on more primitives and is thus more complicated. Its construction HBSH consumes an ADU, a PRP and a PRG. Adiantum uses AES as the PRP, reduced round xchacha12 as the PRG, and NH-then-Poly1305 as its AXU. HPolyC uses Poly1305 directly.

HBSH is more or less HCTR with CTR replaced with a separate PRG, AXU replaced with ADU, and it derives the stream cipher's nonce from after the PRP instead of the xor of before-and-after. The construction looks good, but it is very new. Adiantum uses reduced round chacha, which is cheaty and undesired. The simpler alternative without relying on new proofs would be to use an independent PRG in place of CTR in HCTR, such as full-round chacha20.

Any modern mobile device supports efficient AES and GF(2^k). Android uses Adiantum and HPolyC for the lower end devices as a fallback to AES-CBC which isn't even an SPRP; for full disk encryption.

The set of all users for a mixnet is ideally large. The set of users using old devices decreases with time. We want a consistent ciphersuite among all users to avoid partitioning the anonymity set. This may help the few but it is the relays and users with modern devices who suffer being chained down to a less efficient fallback; thus consuming more energy in the complete system.


Working directly with an actual prime order group is a lot cleaner than worrying about curve25519 specific quirks. Ristretto with Montgomery ladders... That'd be nice. PRs are welcome to curve25519-dalek. :-)

burdges commented 5 years ago

Is the proof your talking about the original UC one? I've no attachment to poly1305 here, so yes Blake2b/x or similar random oracle then. Thanks! :)

We should maybe do a general STROBE implementation that avoids allocations entirely. At first blush, merlin gets slightly opinionated for this use case, but I guess merlin plus Clone and without touching build_rng works perfectly. As an aside, merlin docs suggests always using an extension trait, so you wind up being generic anyways, like https://github.com/w3f/schnorrkel/blob/master/src/context.rs

I never quite finished my own Sphinx implementation: https://github.com/burdges/lake/tree/master/Xolotl/src/sphinx It has some nice features, like no allocations on the relay, a transactional packet builder, and the sphinx tweak for storage inside the mixnet, and some stuff I no longer consider worthwhile. We could try to merge efforts if you like, although not sure if I'll be that useful right now.

I'll look over the HCTR papers eventually, but HCTR sounds just fine. :)

david415 commented 5 years ago

my current plan is to learn from @james-darkfox and @burdges to improve not only the katzenpost sphinx implementation but also our binary compatible rust sphinx which is complete, working and shared test vectors to prove binary compatibility; located here: https://github.com/sphinx-cryptography/rust-sphinxcrypto

also my rust sphinx is fast. probably as fast as our golang sphinx but i cannot prove this because the benchmark i wrote is stupid and does a clone or copy or something dumb like that.

WildCryptoFox commented 5 years ago

@burdges The proof of section 4.2 proves even an adversary as powerful as to know all secrets, cannot create a packet which traverses more than a fixed bound N hops. Sphinx consumers who enforce an directed acyclic topology, such as Katzenpost's stratified design, can ignore this termination proof as it is redundant and thus use faster PRFs. Cyclic-able topologies (free route or restricted meshes) must either pay the constrained selection or derive a different proof to preserve proven termination.

burdges commented 5 years ago

Can any design be non-terminating? Yes, if we identify relays by index, and lack any MAC, then almost every packet is non-terminating.

I suspect termination can be proven for almost any MAC design though, just not in full universal composability.

As an informal example, we consider a design without any explicit MAC that instead identifies replays by their key material, and drops packets without a valid destination. In other words, a sketchy mac-then-encrypt design instead of the safer encrypt-then-mac used by sphinx.

Assume (a) the stream cipher is a random oracle, and (b) the number of relay keys is negligible in the security parameter, i.e. relays times rotation frequency. After one full cycle, the routing information consists entirely of random oracle outputs xored together, so now the next hop cannot be identified even to be adversarially chosen key material because scalar base point multiplication is collision resistant.

As an aside, an issue with this sketchy mac-then-encrypt design is it creates a decryption oracle from timing attacks on the relay key material lookup database. You could however create a constant-time validation scheme for relay key material using some long series of bloom filters, but it makes implementations fragile. Also, the sketchy mac-then-encrypt design saves nothing over an index based lookup with a MAC. Any more fun issues with the sketchy mac-then-encrypt? ;)

burdges commented 5 years ago

We should also discuss MAC tag size, but presumably everyone agrees 16 bytes suffices, even for some future post-quantum variant. I suppose the UC termination proof might technically involve a birthday bound, but sounds excessive.

burdges commented 5 years ago

I just saw strobe-rs v0.5 now has almost the same interface to merlin/src/strobe.rs https://github.com/dalek-cryptography/merlin/pull/48 so maybe best doing a SphinxSTROBE extension trait that uses that directly, but implements merlin's append and challenge methods, and then implements the specific outputs using clone plus those.

WildCryptoFox commented 5 years ago

@burdges So you concur that any PRF should be sufficient so long that only the PRG behave as a random oracle? In such case we may use polyval or poly1305 or any other UHF. Regarding tag size, I concur to 16 bytes.


The biggest threat quantum computers bring is Shor's algorithm which given sufficient reliable qubits can solve DLP, thus break the CDH and DDH assumptions we rely on for sphinx' handshake. Given these secrets, an attacker who records today's transcripts completely breaks confidentiality. To defend against Shor's algorithm, we have post-quantum key exchanges and key encapsulation mechanisms. Post-quantum integrity doesn't matter until we have active quantum adversaries.

While we're talking about powerful adversaries, lets not forget about massively parallel adversaries in the multi-user setting; that is when many users send the same messages. 2016/564 has a great analysis on CTR + AXU constructions, focusing on AES-GCM. Fortunately just deriving the nonce along with the key, random that is, out of the attacker's control, Sphinx' header is unaffected. Additionally in isolation, at least HCTR and AEZv5 are naturally immune to the attacks in the paper.


As for strobe. Do you mean using strobe as all the: KDF (for the blinding factor), PRF and PRG? I agree with the trait interface for transcripts to allow for other backends to make their decisions. I'm not sold on the keccak backend for all; like my suggestion for blake2* this will need benchmarks to decide. It could be faster to derive a single key for AES-CTR to derive the rest, then use POLYVAL or Hash2L as the MAC. It would be certain if the headers were large but they're relatively small - and for this reason alone - should the variable output length KDF be considered as the PRG, and if applicable as the PRF.