extended Jacobian coordinates

einar-taiko commented 1 year ago

This issue tracks the progress on https://github.com/privacy-scaling-explorations/halo2/issues/187

Original text

[ ] (optional +10% perf) implement extended Jacobian coordinates. Their mixed-addition formula is 10 field mul instead of 11 field mul (see also this march paper from Gnark, Table 3, https://tches.iacr.org/index.php/TCHES/article/download/10972/10279 and BLST codebase). used as a fallback: When the MSM is too small for affine addition When too many points need to be accumulated in the same bucket and our temp buffer overflows.

There are a couple of codebases with the correct formulas (BLST, pasta-msm, Gnark and Constantine), they can be rederived from Jacobian coordinates i.e. you store (X, Y, ZZ, ZZZ) instead of (X, Y, Z) so you save squarings and multiplications.

The code

The code belongs in the halo2curves crate, which we will need to fork (or upstream) and use as dependency for zkevm-circuits when it is ready.

Timeline

The halo2curves used use (non-extended) Jacobian coordinates until this PR where they were replaced with homogeneous coordinates a.k.a homogeneous projective coordinates. This issue is about adding Extended Jacobian Coordinates while keeping the homogeneous coordinates which are needed for FFT.

Publications:

The Extended Jacobian Coordinates are discussed in the paper Faster Montgomery multiplication and Multi-Scalar-Multiplication for SNARKs.

Existing implementations:

einar-taiko commented 1 year ago

This comment references the Gnark code generator, so maybe it is already in use.

However, our Cargo.toml seems to lock us to version 0.3.2 that does not have this comment.

mratsim commented 1 year ago

The comment refers to endomorphism acceleration which allows splitting a scalar k into {k1, k2} and transform scalar multiplication [k]P into a double-scalar-multiplication [k1]P + [k2]λP with k1 and k2 using ~127 bits instead of 254 bits (in our case), so halving the number of total operations. λP is a point that can be quickly computed due to the curve properties, see https://github.com/mratsim/constantine/blob/151f284/sage/derive_endomorphisms.sage#L85-L94

The optimization I propose is a coordinate system change. It is complementary.

mratsim commented 1 year ago

See:

einar-taiko commented 1 year ago

halo2curves uses the subtle crate and Rust's bitwise boolean operators[^3] to implement constant-time operations. All of Gnark, Constantine, BLST and sppark uses variable-time operations at least for addition.

If we want to upstream our changes, we probably need to either find an existing constant-time algorithm or derive one. I suspect it does not matter for a validity roll-up.

I will mirror the variable-time approach for now.

[^3]: since they don't have short-circuit semantics

einar-taiko commented 1 year ago

@mratsim I had a nice conversation with @AlekseiVambol today, where he raised the question whether the ExtendedJacobian coordinates approach is really worth it. I am not qualified to make the case, but the argument is twofold (1) using ExtendedJacobian coordinates implies using variable-time computations giving rise to side-channel attacks as described above and (2) the expected speed-up is very small due to the specific constants used on some/most of the halo curves where a=0 and b=3 or b=7.

mratsim commented 1 year ago

(1) using ExtendedJacobian coordinates implies using variable-time computations giving rise to side-channel attacks as described above

We don't have secrets to protect.

(2) the expected speed-up is very small due to the specific constants used on some/most of the halo curves where a=0 and b=3 or b=7.

~70% of time is spent in MSMs when generating proofs.

The speedup is 10% on a computation that takes few seconds to minutes. This can easily save thousands in AWS bills.

Now, there are other optimizations with bigger impact than 10%, which is why I marked this one optional. But it's 10% vs normal jacobian coordinates, I expect a bigger speedup against constant-time projective coordinates.

AlekseiVambol commented 1 year ago

It seems that some variant of addition for (non-Extended) Jacobian Coordinates is already implemented for all the curves by means of the "new_curve_impl" macro. But it was in the previous version of Halo2: https://github.com/privacy-scaling-explorations/halo2curves/commit/1a01928ac4377448c114c7dd1cf5a0df6b81283a , see line 952 Currently this implementation is replaced with "Complete, projective point addition for arbitrary prime order short Weierstrass curves": See line 895: https://github.com/privacy-scaling-explorations/halo2curves/blob/main/src/derive/curve.rs The complexity of new algorithm is 12M + 3ma + 2m3b + 23a, according to the paper and it does not have any special cases like doubling or adding the infinity point; (ma - multiply by a, m3b - multiply by 3b); The complexity of the old algorithm is 12M + 2S + 6add + 1*2 (according to Einar's link) and it has special case, which must be treated by the doubling algorithm. Also, since a is 0 and b is very small both for bn254 (b = 2) and secp256k1 (b = 7), we have no "3ma" and can avoid "2m3b" (using some additions), so I think that that the old algorithm does not have large performance advantages over the new one (or even the new algorithm is faster). So, I suppose, that the developers of ECC for Halo2 see no reason for using the Extended Jacobian Coordinates, which results in replacing "Jacobian Coordinates Algorithm" with "Complete Projective Point Additiion Algorithm" instead of "Extended Jacobian Coordinates Algorithm". Anyway, it would be good to ask these ECC developers about their decision before further movements.

einar-taiko commented 12 months ago

We don't have secrets to protect.

In that case there may be more optimizations opportunities since the halo2curves uses constant-time functions for everything. That is wherever we use the homogeneous coordinates we could potentially reduce computation time somewhat even if it is not the bottleneck.

mratsim commented 12 months ago

Extended Jacobian would only be used in Multi-Scalar-Multiplication. And in MSMs, all points to accumulate come in affine coordinates hence the only operation that matters is mixed addition.

Data is in this paper https://tches.iacr.org/index.php/TCHES/article/download/10972/10279, table 2 and 3, which is also mentioned in the original optimization outline https://github.com/privacy-scaling-explorations/halo2curves/issues/163

I have all 3 implemented (note that in my lib, projective AND jacobian are constant-time, projective uses the same formulas as Halo2):

Reproduction (with Nim programming language installed):

git clone https://github.com/mratsim/constantine
cd constantine
CC=clang nimble bench_ec_g1

There is a ~20% perf difference on mixed addition

AlekseiVambol commented 12 months ago

I agree. If all the input points are in the affine coordinates and in the Pippenger method the size of a chunk is approximately ln(N) + 2 (as it is said to be in the Arkworks implementation), then for large number of points (as in the case of Taiko's supercircuit) the mixed addition will happen more often then usual Extended Jacobian addition. Thus, the Extended Jacobian Coordinates are worthy of being introduced for MSM, but only for it.

einar-taiko commented 11 months ago

Existing MSM code in https://github.com/taikoxyz/halo2/blob/main/halo2_proofs/src/poly/ipa/msm.rs

mratsim commented 11 months ago

Existing MSM code in https://github.com/taikoxyz/halo2/blob/main/halo2_proofs/src/poly/ipa/msm.rs

I'm not sure what the quoted code is used for but the MSM code is here: https://github.com/taikoxyz/halo2/blob/main/halo2_proofs/src/arithmetic.rs#L41-L198

einar-taiko commented 11 months ago

To measure the performance impact, please follow these instructions:

cd /tmp
git clone git@github.com:einar-taiko/halo2curves.git
cd halo2curves
git checkout b09144e
cargo bench multiexp
git checkout einar/pr/extended.jacobian.coordinates
cargo bench multiexp

taikoxyz / zkevm-circuits