Open einar-taiko opened 1 year ago
The comment refers to endomorphism acceleration which allows splitting a scalar k into {k1, k2} and transform scalar multiplication [k]P into a double-scalar-multiplication [k1]P + [k2]λP with k1 and k2 using ~127 bits instead of 254 bits (in our case), so halving the number of total operations. λP is a point that can be quickly computed due to the curve properties, see https://github.com/mratsim/constantine/blob/151f284/sage/derive_endomorphisms.sage#L85-L94
The optimization I propose is a coordinate system change. It is complementary.
See:
halo2curves
uses the subtle
crate and Rust's bitwise boolean operators[^3] to implement constant-time operations.
All of Gnark, Constantine, BLST and sppark uses variable-time operations at least for addition.
If we want to upstream our changes, we probably need to either find an existing constant-time algorithm or derive one. I suspect it does not matter for a validity roll-up.
I will mirror the variable-time approach for now.
[^3]: since they don't have short-circuit semantics
@mratsim I had a nice conversation with @AlekseiVambol today, where he raised the question whether the ExtendedJacobian coordinates approach is really worth it. I am not qualified to make the case, but the argument is twofold (1) using ExtendedJacobian coordinates implies using variable-time computations giving rise to side-channel attacks as described above and (2) the expected speed-up is very small due to the specific constants used on some/most of the halo curves where a=0 and b=3 or b=7.
(1) using ExtendedJacobian coordinates implies using variable-time computations giving rise to side-channel attacks as described above
We don't have secrets to protect.
(2) the expected speed-up is very small due to the specific constants used on some/most of the halo curves where a=0 and b=3 or b=7.
~70% of time is spent in MSMs when generating proofs.
The speedup is 10% on a computation that takes few seconds to minutes. This can easily save thousands in AWS bills.
Now, there are other optimizations with bigger impact than 10%, which is why I marked this one optional. But it's 10% vs normal jacobian coordinates, I expect a bigger speedup against constant-time projective coordinates.
It seems that some variant of addition for (non-Extended) Jacobian Coordinates is already implemented for all the curves by means of the "new_curve_impl" macro. But it was in the previous version of Halo2: https://github.com/privacy-scaling-explorations/halo2curves/commit/1a01928ac4377448c114c7dd1cf5a0df6b81283a , see line 952 Currently this implementation is replaced with "Complete, projective point addition for arbitrary prime order short Weierstrass curves": See line 895: https://github.com/privacy-scaling-explorations/halo2curves/blob/main/src/derive/curve.rs The complexity of new algorithm is 12M + 3ma + 2m3b + 23a, according to the paper and it does not have any special cases like doubling or adding the infinity point; (ma - multiply by a, m3b - multiply by 3b); The complexity of the old algorithm is 12M + 2S + 6add + 1*2 (according to Einar's link) and it has special case, which must be treated by the doubling algorithm. Also, since a is 0 and b is very small both for bn254 (b = 2) and secp256k1 (b = 7), we have no "3ma" and can avoid "2m3b" (using some additions), so I think that that the old algorithm does not have large performance advantages over the new one (or even the new algorithm is faster). So, I suppose, that the developers of ECC for Halo2 see no reason for using the Extended Jacobian Coordinates, which results in replacing "Jacobian Coordinates Algorithm" with "Complete Projective Point Additiion Algorithm" instead of "Extended Jacobian Coordinates Algorithm". Anyway, it would be good to ask these ECC developers about their decision before further movements.
We don't have secrets to protect.
In that case there may be more optimizations opportunities since the halo2curves
uses constant-time functions for everything. That is wherever we use the homogeneous coordinates we could potentially reduce computation time somewhat even if it is not the bottleneck.
Extended Jacobian would only be used in Multi-Scalar-Multiplication. And in MSMs, all points to accumulate come in affine coordinates hence the only operation that matters is mixed addition.
Data is in this paper https://tches.iacr.org/index.php/TCHES/article/download/10972/10279, table 2 and 3, which is also mentioned in the original optimization outline https://github.com/privacy-scaling-explorations/halo2curves/issues/163
I have all 3 implemented (note that in my lib, projective AND jacobian are constant-time, projective uses the same formulas as Halo2):
Reproduction (with Nim programming language installed):
git clone https://github.com/mratsim/constantine
cd constantine
CC=clang nimble bench_ec_g1
There is a ~20% perf difference on mixed addition
I agree. If all the input points are in the affine coordinates and in the Pippenger method the size of a chunk is approximately ln(N) + 2 (as it is said to be in the Arkworks implementation), then for large number of points (as in the case of Taiko's supercircuit) the mixed addition will happen more often then usual Extended Jacobian addition. Thus, the Extended Jacobian Coordinates are worthy of being introduced for MSM, but only for it.
Existing MSM code in https://github.com/taikoxyz/halo2/blob/main/halo2_proofs/src/poly/ipa/msm.rs
Existing MSM code in https://github.com/taikoxyz/halo2/blob/main/halo2_proofs/src/poly/ipa/msm.rs
I'm not sure what the quoted code is used for but the MSM code is here: https://github.com/taikoxyz/halo2/blob/main/halo2_proofs/src/arithmetic.rs#L41-L198
To measure the performance impact, please follow these instructions:
cd /tmp
git clone git@github.com:einar-taiko/halo2curves.git
cd halo2curves
git checkout b09144e
cargo bench multiexp
git checkout einar/pr/extended.jacobian.coordinates
cargo bench multiexp
This issue tracks the progress on https://github.com/privacy-scaling-explorations/halo2/issues/187
Original text
The code
The code belongs in the
halo2curves
crate, which we will need to fork (or upstream) and use as dependency forzkevm-circuits
when it is ready.Timeline
The
halo2curves
used use (non-extended) Jacobian coordinates until this PR where they were replaced with homogeneous coordinates a.k.a homogeneous projective coordinates. This issue is about adding Extended Jacobian Coordinates while keeping the homogeneous coordinates which are needed for FFT.Publications:
The Extended Jacobian Coordinates are discussed in the paper Faster Montgomery multiplication and Multi-Scalar-Multiplication for SNARKs.
Existing implementations: