double-and-add algorithm

ingonyama-zk / icicle

A hardware acceleration library for compute intensive cryptography :ice_cube:

https://dev.ingonyama.com/icicle/overview

MIT License

322 stars 96 forks source link

double-and-add algorithm #11

Open TalDerei opened 1 year ago

TalDerei commented 1 year ago

The addition operation in projective coordinates implements operator overloading that computes a scalar multiplication using a naive double-and-add algorithm. Would be worth specializing the operation by switching it with Montgomery multiplication (CIOS) for more efficient multiplications.

Here's an efficient implementation: https://github.com/MinaProtocol/gpu-groth16-prover-3x/blob/master/multiexp/arith.cu#L289. It uses cooperative groups and thread ranks in cuda. For your codebase, it would simply require changing out the 'fixnum' calls to equivalent PTX instructions you implement in 'ptx.cuh'.

omershlo commented 1 year ago

noted! thanks Tal. We will improve the double-and-add. Is the Mina code implements the state of the art? btw, feel free to make a PR

cc: @DmytroTym, @mickeyasa

TalDerei commented 1 year ago

Sure! It's a stripped down version from cuda-fixnum (extended modular arithmetic library) built by Hamish from Polygon Zero. It's an older repository, but it's performant: https://github.com/data61/cuda-fixnum.

DmytroTym commented 1 year ago

I feel like two different things are at play here: modular multiplication, which is what Montgomery multiplication does, and single-scalar multiplication, which is what double-and-add does. Admittedly, we have room to improve in both. For modular multiplication, we want to see if our scheme based on Barrett reduction can be competitive with Montgomery CIOS, and if not, we'll switch to CIOS. We only have basic Barrett implemented for now. As for double-and-add, you are also right that it's suboptimal, we're planning on moving to more efficient algorithms in the near future.

DmytroTym commented 8 months ago

To comment on this: we haven't implemented a more-efficient single-scalar multiplication than double-and-add yet, but we don't really have any use-cases for it so it's still low priority.