Open RasmusHoldsbjergCSAU opened 3 years ago
@JasonGross @andres-erbsen @jadephilipoom @bshvass?
It would be great to get a PR for this! I think the best way to match our existing code structure would be to put it in the Arithmetic directory.
It just occurred to me: is this the Comba+SOS montgomery implementation suggested at https://github.com/mit-plv/fiat-crypto/issues/783#issuecomment-641616475 ? I'd be excited to get this merged into the repo, and have a couple of questions about the integration with our synthesis pipeline:
--cios
or --sos
flag to choose the alternate one? (And we can have mul
and reduce
for both options, but in the --cios
world reduce
is a no-op? Speaking of which, what's the difference between "reduce" and "carry", in general/terminologically?)Yes, the SOS method does indeed refer to the same method of that paper; I am not sure what "Comba" refers to here (perhaps @dfaranha can clarify?).
With respect to bounds, the usual tight bounds of saturated Montgomery representations should be used for inputs to multiplication and outputs of reduction. We should then use looser bounds for outputs of multiplication and inputs of reduction.
We did a bit of benchmarking of the synthesized C code, but if i remember correctly some compilers seemed to prefer tthe CIOS method and some the SOS method. That being said, I think that using one as default and having a flag to choose the other is a good idea. Reduce would be a no-op for CIOS, since the multiplication procedure of CIOS does both multiplication and reduction.
Perhaps a naming of the functions to make it clear to the user that CIOS mult does both reduction and multiplication whereas SOS mult only does multiplication is in order.
As for reduction/carry, the reduction procedure of SOS, given input x, computes the remainder of x * R⁻¹, not just of x.
"Comba" means the product-scanning approach that saves on memory operations, which we still haven't implemented. In a product-scanning multiplication, each column of the result is solved per iteration of the algorithm by adding all the contributing digit products with carry propagation. Although we split multiplication and reduction steps, we can't see a clear speedup yet, so there is still work left to do.
See how CIRCL implements extension field arithmetic for BLS12. https://github.com/cloudflare/circl/pull/252
Implementing these extensions natively in Fiat would be a nice feature.
We have been using the Fiat Crypto framework for developing verified and efficient implementations of quadratic field extensions to be used for curves such as FourQ and BLS-12.
The project can be found here: https://github.com/RasmusHoldsbjergCSAU/QuadraticFieldExtensions
So far an implementation of Montgomery multiplication has been developed, which separates the multiplication and reduction steps of the algorithm, the goal is to use this to implement arithmetic in the field extensions using a lazy reduction scheme, where unnecessary reductions are avoided.
In the long run we would like to use the curves mentioned above. We are currently investigating using something like Rupicola for supporting loops and function calls, similarly to what was done here: https://github.com/mit-plv/rupicola/blob/master/src/Rupicola/Examples/ECC/MontgomeryLadder.v
The implementation of the separated Montgomery multiplication and reduction can be found here: https://github.com/RasmusHoldsbjergCSAU/QuadraticFieldExtensions/blob/master/Implementations/SOS/SOSMul.v and here: https://github.com/RasmusHoldsbjergCSAU/QuadraticFieldExtensions/blob/master/Implementations/SOS/SOSReduction.v
Let me know if this is of interest, if so we will gladly submit a pull request, to have the project integrated into Fiat Crypto.