Closed sragss closed 9 months ago
Thanks for the PR! @sragss
I think we should add the input range check in the mul
function as you suggested.
Even though it comes with some cost, I believe it is the only way to introduce the CIOS algo to the repo, at the moment.
(Normally, I use the F::from_raw
function for input range check.)
What do you think? @CPerezz @davidnevadoc
Fixed comments @davidnevadoc – Let me know what you think about the operands outside the normal range.
We can add the reduction before, or can update the bn256::test::test_consistent_hash_to_curve
test which uses the hash_to_curve
outside the modulus range. I suspect this test has bad data due to random generation by a script (~25% chance of generating out of the range) rather than intentional usage.
In regards to the outside of range operands, the approach I like is controlling the ways in which we create field elements and then assuming they are in the appropriate ranges in all operations.
For this concrete case, the problem was that the multiplication in from_u512
non-asm version was performing a multiplication on an unchecked field element that could be any 256 bits.
https://github.com/privacy-scaling-explorations/halo2curves/blob/3c43d3c0c3b697c411cfd07f026798998f55073a/src/derive/field.rs#L159
The asm version on the other hand, was using Self::montgomery_reduction
to handle this conversion from bytes to a valid field element.
I think this should be the approach for the non-asm version as well. I propose a solution along this lines: https://github.com/privacy-scaling-explorations/halo2curves/commit/359619aabd967a820ebda9a3aee3db8aeda12139
I have modified the montgomery_reduction
function to have a non-asm version that performs the check @sragss suggested and then multiplies by the appropriate R.
Let me know what you think and feel free to add the change if you like it.
Agree with your approach – added those changes. All tests are passing now and we don't have the slowdown from adding the check to mul
.
Implements CIOS for Montgomery 256-bit field multiplication. Specifically the fast variant (algorithm 2). These changes are particularly relevant on ARM where we do not have x86 / BMI2 / AVX512 and the associated assembly backend. There does not appear to be a NEON equivalent for MULX / ADOX / ADCX.
Accelerates 7-15%, roughly reaching parity with Arkworks on ARM. See sragss/speedy-fields to benchmark.
cargo run --release
is more consistent thancargo bench
due to matching elements.Currently
bn256::test::test_consistent_hash_to_curve
fails due to an attempt to multiply a number larger than both the montgomery radix and the field modulus. This results in a 1-bit difference between implementations. Specifically:I'm not sure if this was intended to be supported. Other libraries (such as Arkworks) do not handle numbers outside of the range of the modulus. Any elements created via
F::new()
undergo field multiplication (byR2
) where they're brought into the proper range.This can be handled by checking if the inputs are greater than the field modulus and subtracting in advance but there's some non-zero cost (2-5%) to performing the check.