mit-plv / fiat-crypto

Cryptographic Primitive Code Generation by Fiat
http://adam.chlipala.net/papers/FiatCryptoSP19/FiatCryptoSP19.pdf
Other
712 stars 147 forks source link

What's the right way to get a carry-bit out of Pow2Base-carrying? #50

Closed JasonGross closed 8 years ago

JasonGross commented 8 years ago

I'm trying to get an add-with-carry on digits that gives me both the resulting sum, modulo upper_bound limb_widths, and a boolean indicating whether or not there's a carry in the most significant bit. I'm looking for something that will be most reusable / best-suited to the underlying assembly. Options I see are:

  1. Build it on top of what's already existing: first add, then return, separately, both the result of carrying, and the boolean given by testing whether the decoded value (pre-carrying) is too big / matches with the result of decoding the carried value
  2. Generalize carry_gen to also return the value it carries, make variants of carry_simple and carry_simple_sequence that do the same
  3. Inject the digits into a larger base, compute the addition in the larger base, and then check the high bit, and return that, together with the truncated digits

This is to instantiate ZLikeOps from #49.

jadephilipoom commented 8 years ago

I don't see an obvious nice way to do this; carry doesn't handle overflowing. Of the options you presented, I think (1) sounds most reasonable.

andres-erbsen commented 8 years ago

The current carry was written to be used in situations where the widths of the limbs are significantly below the widths of the underlying machine integers, for example 51-bit limbs in 64-bit registers. A couple of elementwise additions can be performed safely without carrying, and the engineer implementing something that uses addition would insert carries sparingly, just enough to make sure that accumulating bounds don't lead to overflow.

What you are asking for is something different. When using the full width of the machine words as the limb width, every single-element operation needs to be able to deal with carries internally. This lets you use fewer words and fewer arithmetic operations, but requires more carries and imposes significant data dependency constraints on the evaluation order of the algorithm being implemented. We do not have a formalization of this strategy right now, but we will need it. In particular, I think

  1. barrett reduction modulo p256 using 256-bit words needs this for addition
  2. barrett reduction modulo order of Curve25519 basepoint needs this for addition and multiplication
  3. 128-bit word addition (used pseudo-Mersenne multiplication modulo 2^255-19) needs this for addition

Cases 1 and 3 probably use the same code, 2 might or might not. Now what we have to do is to design this.

andres-erbsen commented 8 years ago

I think it would be best to use a bounded type for the limb width = word width case because there really isn't a place were reasoning about bounds is to be postponed. https://coq.inria.fr/library/Coq.Numbers.Cyclic.Abstract.CyclicAxioms.html#ZnZ.zero seems to have some carry logic (e.g. spec_add_carry_c). In our library, possible candidate types are word 64 and F (2^64). The rep invariant could be stated by mapping the elements of the tuple to Z and then asserting the BaseSystem rep invariant, but the correctness of all operations would need to reason about boundedness.

JasonGross commented 8 years ago

4 Montgomery reduction (or almost montgomery multiplication) with 256-bit words (needed for conditional subtract at the end)

andres-erbsen commented 8 years ago

As discussed yesterday, we want a module for creating wider fixed-width integer operations from given fixed-width operations. Then these wider operations would be used to instantiate bignum-based constructions like ZLikeOps, and the low-level operations would be mapped to machine instructions (followed by usual low-level optimizations like changing add-with-carry to add whenever possible and instructions scheduling, possibly spilling, register allocation).

Assuming an underlying n-bit architecture, I envision the following constructions.

JasonGross commented 8 years ago

probably inline the construction of a 2n x 2n -> 2n bit add-with-carry

I think we should actually parameterize over this: One wrinkle here is that it seems like x86_64 has a 64x64 -> 2 * 64 multiplier and a 64+64 -> 64 adder, while the machine I'm targeting has a 128x128 -> 1*256 multiplier and a 256+256 -> 256 adder. It would be nice to support both of these (having a same-width adder/multiplier vs having a half-width multiplier) in whatever lemmas / definitions we write. (e.g., parameterize the multiplier over a function that splits the bits into the size of the underlying multiplier, ask for both a full-width adder and a half-width adder and a lemma that says that the half-width adder composed with the splitting function, ripple-carried across the array, is the same as the full-width adder ripple-carried across the original array; or else have automation that takes care of both lemmas)

andres-erbsen commented 8 years ago

You are right that this is not actually quite the same pattern. I don't have insight about which strategy for sharing code would be the best, my first guess would be to try writing a specification for the piece of code that handles a single partial product and make two implementations of that, share code for the fixpoint and induction proof that put together the multiplication. But I expect this will become much more clear once you write both definitions; feel free to poke me to look at things any time.

JasonGross commented 8 years ago

I'm going to close this; we've taken a different path to utilizing add-with-carry instructions.