This PR removes unnecessary copies in the following hot paths:
- Fp::sum_of_products: removed the copies of a and b by iterating with iter_mut() and iter(), and by using out in-place.
- Miller loop: both doubling_step and addition_step were rewritten to use the in-place operations and minimize the number of copies.
- A few other minor Miller loop copies were removed by using references in the MillerLoopDriver trait.
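To illustrate the kind of change made to sum_of_products, here is a minimal sketch, not the code from this PR. It uses a toy single-word `Fp` (the real type is multi-limb), and the names `mul_inp`, `add_inp`, `sum_of_products_copy` and `sum_of_products_in_place` are hypothetical. The in-place version borrows its inputs through iter_mut() and iter() and accumulates directly into `out`.

```rust
// Toy stand-in for a field element; the real Fp is a multi-limb type.
// The modulus and all names below are assumptions for illustration only.
const P: u64 = 1_000_000_007;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Fp(u64);

impl Fp {
    // Hypothetical in-place multiply: overwrites self with self * rhs mod P.
    fn mul_inp(&mut self, rhs: &Fp) {
        self.0 = (self.0 * rhs.0) % P;
    }

    // Hypothetical in-place add: overwrites self with self + rhs mod P.
    fn add_inp(&mut self, rhs: &Fp) {
        self.0 = (self.0 + rhs.0) % P;
    }
}

// Copying version: takes both arrays by value and makes a temporary per product.
fn sum_of_products_copy<const T: usize>(a: [Fp; T], b: [Fp; T]) -> Fp {
    let mut acc = Fp(0);
    for i in 0..T {
        let mut prod = a[i]; // copy of a[i]
        prod.mul_inp(&b[i]);
        acc.add_inp(&prod);
    }
    acc
}

// In-place version: each a[i] is overwritten with a[i] * b[i], b is only
// borrowed, and the sum is accumulated directly into `out`.
fn sum_of_products_in_place(a: &mut [Fp], b: &[Fp], out: &mut Fp) {
    *out = Fp(0);
    for (x, y) in a.iter_mut().zip(b.iter()) {
        x.mul_inp(y);
        out.add_inp(&*x);
    }
}
```

Note that the in-place version clobbers `a`, so it only fits call sites that no longer need the original values.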
This is achieved by adding zkvm-specific variants of base operations that modify a value in-place via &mut self to prevent unnecessary copies. Functions in the hot path are then given a zkvm-specific variant that uses these operations where possible. Each variant is a new function, and #[cfg] selects between the zkvm no-copy version and the regular version, which makes it easier to compare the two implementations and ensure they are equivalent.
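The pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the PR's code: `Fp` is a toy single-word type, `double_in_place`, `quadruple_regular`, `quadruple_in_place` and `quadruple` are hypothetical names, and `target_os = "zkvm"` is assumed to be the cfg predicate used to detect the zkvm target.

```rust
// Toy stand-in for a field element; modulus and names are assumptions.
const P: u64 = 1_000_000_007;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Fp(u64);

impl Fp {
    // Regular base operation: returns a new value.
    fn double(&self) -> Fp {
        Fp((self.0 * 2) % P)
    }

    // Hypothetical zkvm-oriented base operation: mutates through &mut self.
    fn double_in_place(&mut self) {
        self.0 = (self.0 * 2) % P;
    }
}

// Regular version of a hot-path function: builds intermediate values.
fn quadruple_regular(x: &Fp) -> Fp {
    x.double().double()
}

// No-copy version of the same function: only in-place operations.
fn quadruple_in_place(x: &mut Fp) {
    x.double_in_place();
    x.double_in_place();
}

// #[cfg] picks one implementation per target. Both versions stay in the
// source as separate functions, so they can be compared side by side.
#[cfg(target_os = "zkvm")] // assumed predicate for the zkvm target
fn quadruple(x: &mut Fp) {
    quadruple_in_place(x);
}

#[cfg(not(target_os = "zkvm"))]
fn quadruple(x: &mut Fp) {
    *x = quadruple_regular(&*x);
}
```

Keeping both versions as named functions also allows a host-side test to run them on the same inputs and assert that the results match.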
This brings the aptos-lc ratcheting test down from 22289711 cycles to 15933247 cycles (a ~29% reduction).
The other hot paths that will be optimized in follow-up PRs are:
- Removing copies in the final_exponentiation function and the functions it calls. This currently takes ~6M cycles.
- hash_to_curve: specifically, the G2::mul_by_x() operation could use the G2 affine/double precompiles, which should save ~1M cycles.