mratsim / constantine

Constantine: modular, high-performance, zero-dependency cryptography stack for proof systems and blockchain protocols.
Other
272 stars 38 forks source link

Another G2 MSM wrong result #366

Closed bkomuves closed 2 days ago

bkomuves commented 4 months ago

Wrong result for a particular G2 MSM calculation (this time multiScalarMul_vartime is affected).

Unfortunately I don't have a small example (so far the smallest one is ~24,000 elements) and I don't have the energy or patience to try find a minimal one, so I just created a repo with all the data and code to reproduce.

Two independent implementations gives the correct result, and also constantine gives the correct result for smaller prefix vectors (I haven't done exhaustive testing here, but for example the first 10,000 elements is fine)

mratsim commented 4 months ago

Thanks for the repro.

It seems like it's once again the endomorphism acceleration that is problematic.

Using multiScalarMul_reference_vartime from https://github.com/mratsim/constantine/blob/c7979b003372b329dd450ff152bb5945cafb0db5/constantine/math/elliptic/ec_multi_scalar_mul.nim#L95-L97 works

And removing withEndo from https://github.com/mratsim/constantine/blob/c7979b003372b329dd450ff152bb5945cafb0db5/constantine/math/elliptic/ec_multi_scalar_mul.nim#L406-L437 works as well.

mratsim commented 4 months ago

One thing really strange is that I tried to slice the input from 0 to 22000 or 1000 to 23000 and having too few points make the bug disappear.

bkomuves commented 4 months ago

So how this bug was discovered was really strange. These are the B2 points from a Groth16 proof. The circuit is essentially doing Poseidon2 hashes. The proof always succeeded below a given circuit size and always failed above a given size, no matter what the inputs are, what the exact structure of these hashes are (linear sponge, Merkle tree, independent permutation calls), this threshold even survived changing the number of the rounds in the Poseidon permutation. However with completely different circuits, much larger circuits worked fine too...

bkomuves commented 4 months ago

multiScalarMul_reference_vartime seems to be a good workaround, but it's more than 3x slower, becoming the primary bottleneck (this single MSM becoming more than half of the total runtime of a Groth16 proof).

mratsim commented 3 months ago

Further investigation shows that when 22528 points work while 22529 points doesn't.

This is the exact value for which c, the bucket size, changes from 12 to 13.

mratsim commented 2 days ago

I found the bug:

https://github.com/mratsim/constantine/blob/5308ddc6abb9e53d31eb91c90a97a09ddcdfd7e8/constantine/math/elliptic/ec_multi_scalar_mul.nim#L255-L284

It is the w -= c outlined there, which should be a discard following the comment just above

image

This bug is hard to trigger but can be triggered reliably once understood, here is the sequence of events.

bkomuves commented 2 days ago

huh, nice detective work!

mratsim commented 2 days ago

This has been merged.

Note that I did several refactorings of the internals in preparation for the first release of Constantine and support for named fields and not just named curves. Here are the important changes:

Full details in #399, #400, #402