mit-plv / fiat-crypto

Cryptographic Primitive Code Generation by Fiat
http://adam.chlipala.net/papers/FiatCryptoSP19/FiatCryptoSP19.pdf
Other
717 stars 147 forks source link

Quadratic Field Extensions in Fiat Crypto #904

Open RasmusHoldsbjergCSAU opened 3 years ago

RasmusHoldsbjergCSAU commented 3 years ago

We have been using the Fiat Crypto framework for developing verified and efficient implementations of quadratic field extensions to be used for curves such as FourQ and BLS-12.

The project can be found here: https://github.com/RasmusHoldsbjergCSAU/QuadraticFieldExtensions

So far an implementation of Montgomery multiplication has been developed, which separates the multiplication and reduction steps of the algorithm, the goal is to use this to implement arithmetic in the field extensions using a lazy reduction scheme, where unnecessary reductions are avoided.

In the long run we would like to use the curves mentioned above. We are currently investigating using something like Rupicola for supporting loops and function calls, similarly to what was done here: https://github.com/mit-plv/rupicola/blob/master/src/Rupicola/Examples/ECC/MontgomeryLadder.v

The implementation of the separated Montgomery multiplication and reduction can be found here: https://github.com/RasmusHoldsbjergCSAU/QuadraticFieldExtensions/blob/master/Implementations/SOS/SOSMul.v and here: https://github.com/RasmusHoldsbjergCSAU/QuadraticFieldExtensions/blob/master/Implementations/SOS/SOSReduction.v

Let me know if this is of interest, if so we will gladly submit a pull request, to have the project integrated into Fiat Crypto.

RasmusHoldsbjergCSAU commented 3 years ago

@JasonGross @andres-erbsen @jadephilipoom @bshvass?

jadephilipoom commented 3 years ago

It would be great to get a PR for this! I think the best way to match our existing code structure would be to put it in the Arithmetic directory.

JasonGross commented 3 years ago

It just occurred to me: is this the Comba+SOS montgomery implementation suggested at https://github.com/mit-plv/fiat-crypto/issues/783#issuecomment-641616475 ? I'd be excited to get this merged into the repo, and have a couple of questions about the integration with our synthesis pipeline:

  1. If you're separating multiplication and reduction, how do bounds work? For unsaturated solinas we have tight bounds and loose bounds, but it's not clear to me how something similar would work for the saturated Montgomery representation
  2. How do we want to present the different algorithms to the user? Should we make whichever algorithm benchmarks as the fastest be the default, and add a --cios or --sos flag to choose the alternate one? (And we can have mul and reduce for both options, but in the --cios world reduce is a no-op? Speaking of which, what's the difference between "reduce" and "carry", in general/terminologically?)
RasmusHoldsbjergCSAU commented 3 years ago

Yes, the SOS method does indeed refer to the same method of that paper; I am not sure what "Comba" refers to here (perhaps @dfaranha can clarify?).

With respect to bounds, the usual tight bounds of saturated Montgomery representations should be used for inputs to multiplication and outputs of reduction. We should then use looser bounds for outputs of multiplication and inputs of reduction.

We did a bit of benchmarking of the synthesized C code, but if i remember correctly some compilers seemed to prefer tthe CIOS method and some the SOS method. That being said, I think that using one as default and having a flag to choose the other is a good idea. Reduce would be a no-op for CIOS, since the multiplication procedure of CIOS does both multiplication and reduction.

Perhaps a naming of the functions to make it clear to the user that CIOS mult does both reduction and multiplication whereas SOS mult only does multiplication is in order.

As for reduction/carry, the reduction procedure of SOS, given input x, computes the remainder of x * R⁻¹, not just of x.

dfaranha commented 3 years ago

"Comba" means the product-scanning approach that saves on memory operations, which we still haven't implemented. In a product-scanning multiplication, each column of the result is solved per iteration of the algorithm by adding all the contributing digit products with carry propagation. Although we split multiplication and reduction steps, we can't see a clear speedup yet, so there is still work left to do.

armfazh commented 3 years ago

See how CIRCL implements extension field arithmetic for BLS12. https://github.com/cloudflare/circl/pull/252

Implementing these extensions natively in Fiat would be a nice feature.