Open mratsim opened 1 year ago
on-the-fly signed digit recoding with zero allocation
How do you achieve recoding on the fly when we need to iterate over bucket indexes (i.e. slices of the scalar) in reverse order? @mratsim
Using Booth encoding, you only need to see the window and the previous bit. https://github.com/mratsim/constantine/blob/c97036d1df09b25afaddc040aed468a80df0c8d7/constantine/math/arithmetic/bigints.nim#L792-L829

    func signedWindowEncoding(digit: SecretWord, bitsize: static int): tuple[val: SecretWord, neg: SecretBool] {.inline.} =
      ## Get the signed window encoding for `digit`
      ##
      ## This uses the fact that 999 = 1000 - 1
      ## It replaces string of binary 1 with 1...-1
      ## i.e. 0111 becomes 1 0 0 -1
      ##
      ## This looks at [bitᵢ₊ₙ..bitᵢ | bitᵢ₋₁]
      ## and encodes   [bitᵢ₊ₙ..bitᵢ]
      ##
      ## Notes:
      ##   - This is not a minimum weight encoding unlike NAF
      ##   - Due to constant-time requirement in scalar multiplication
      ##     or bucketing large window in multi-scalar-multiplication
      ##     minimum weight encoding might not lead to saving operations
      ##   - Unlike NAF and wNAF encoding, there is no carry to propagate
      ##     hence this is suitable for parallelization without encoding precomputation
      ##     and for GPUs
      ##   - Implementation uses Booth encoding
      result.neg = SecretBool(digit shr bitsize)
      let negMask = -SecretWord(result.neg)
      const valMask = SecretWord((1 shl bitsize) - 1)

      let encode = (digit + One) shr 1            # Lookup bitᵢ₋₁, flip series of 1's
      result.val = (encode + negMask) xor negMask # absolute value
      result.val = result.val and valMask

    func getSignedFullWindowAt*(a: BigInt, bitIndex: int, windowSize: static int): tuple[val: SecretWord, neg: SecretBool] {.inline.} =
      ## Access a signed window of `a` of size `windowSize`
      ## Returns a signed encoding.
      ##
      ## The result is `windowSize` bits at a time.
      ##
      ## Requires: bitIndex != 0 and bitIndex mod windowSize == 0
      debug: doAssert (bitIndex != 0) and (bitIndex mod windowSize) == 0
      let digit = a.getWindowAt(bitIndex-1, windowSize+1) # get the bit on the right of the window for Booth encoding
      return digit.signedWindowEncoding(windowSize)

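For readers working in the Rust codebases, here is a minimal sketch of the same idea. It assumes a 64-bit scalar for brevity and is not constant-time. The names `booth_digit`, `num_windows` and `recombine` are hypothetical, not functions from Constantine or halo2curves. The point it illustrates is that each signed digit depends only on its own window plus the single bit below it, so windows can be visited in any order, including most significant first, with no carry to remember.

```rust
/// Signed Booth digit for window `window_index` (0 = least significant)
/// of `scalar`, with windows of `c` bits (1 <= c <= 16 in this sketch).
/// The result lies in [-2^(c-1), 2^(c-1)].
fn booth_digit(scalar: u64, window_index: u32, c: u32) -> i64 {
    // Append an implicit zero bit below bit 0, so that the window and
    // the bit to its right can be read with a single shift.
    let padded = (scalar as u128) << 1;
    let lo = window_index * c;
    // c + 1 bits: [window bits | previous bit]
    let digit = ((padded >> lo) & ((1u128 << (c + 1)) - 1)) as i64;
    let neg = (digit >> c) & 1 == 1; // top bit of the window
    let encode = (digit + 1) >> 1;   // window value + previous bit
    if neg { encode - (1 << c) } else { encode }
}

/// Number of windows needed for a 64-bit scalar: ceil(65 / c).
/// The extra bit leaves room for the final +1 when the top bit is set.
fn num_windows(c: u32) -> u32 {
    (64 + c) / c
}

/// Rebuild the scalar from its digits, most significant window first,
/// the same order in which an MSM would consume them.
fn recombine(scalar: u64, c: u32) -> i128 {
    let mut acc: i128 = 0;
    for w in (0..num_windows(c)).rev() {
        acc = acc * (1i128 << c) + booth_digit(scalar, w, c) as i128;
    }
    acc
}
```

With `c = 1`, the scalar `0b0111` gives the digits `1, 0, 0, -1` from most to least significant, matching the docstring example, and `recombine(k, c)` returns `k` for any window size in range.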
Usually NAF has the following advantages:
However, the main appeal of window NAF does not apply to MSM: as your window grows, it becomes exponentially more likely that at least one bit in it is set, so you will have an addition anyway.
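To put a number on this, assume the scalar bits are uniformly random (an assumption for illustration, not something stated in the thread). A window of $c$ bits is then non-zero with probability

$$\Pr[\text{window} \neq 0] = 1 - 2^{-c}$$

so at $c = 16$ only about one window in 65536 would skip its bucket addition.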
With on-the-fly recoding, you avoid allocating millions of recoded scalars. It's also GPU friendly.
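A rough sketch of how this could look inside a bucket-method MSM follows. To keep the result easy to check, it assumes plain integers under addition as a stand-in for curve points and 64-bit scalars. `Point`, `booth_digit` and `msm` are hypothetical names, and the code is neither constant-time nor parallel. The only allocation is the bucket array of size 2^(c-1), reused for every window; no recoded scalar is ever stored.

```rust
/// Stand-in for a curve point: integers under addition (illustration only).
type Point = i128;

/// Signed Booth digit of window `window_index` (0 = least significant),
/// computed directly from the scalar bits. Result is in [-2^(c-1), 2^(c-1)].
fn booth_digit(scalar: u64, window_index: u32, c: u32) -> i64 {
    let padded = (scalar as u128) << 1; // implicit zero below bit 0
    let lo = window_index * c;
    let digit = ((padded >> lo) & ((1u128 << (c + 1)) - 1)) as i64;
    let encode = (digit + 1) >> 1;
    if (digit >> c) & 1 == 1 { encode - (1 << c) } else { encode }
}

/// Computes sum(scalars[i] * points[i]) with windows of `c` bits (1 <= c <= 16).
fn msm(scalars: &[u64], points: &[Point], c: u32) -> Point {
    let num_windows = (64 + c) / c; // ceil(65 / c)
    // Signed digits halve the bucket count: bucket j holds magnitude j + 1.
    let mut buckets: Vec<Point> = vec![0; 1usize << (c - 1)];
    let mut total: Point = 0;

    // Most significant window first.
    for w in (0..num_windows).rev() {
        for _ in 0..c {
            total += total; // c doublings
        }
        buckets.fill(0);
        for (&k, &p) in scalars.iter().zip(points) {
            let d = booth_digit(k, w, c); // recoded on the fly
            if d > 0 {
                buckets[(d - 1) as usize] += p;
            } else if d < 0 {
                buckets[(-d - 1) as usize] -= p; // negative digit: add -P
            }
        }
        // Running-sum reduction: sum over j of (j + 1) * buckets[j].
        let mut running: Point = 0;
        let mut window_sum: Point = 0;
        for b in buckets.iter().rev() {
            running += *b;
            window_sum += running;
        }
        total += window_sum;
    }
    total
}
```

On a real curve, subtracting a point costs the same as adding it, since negation only flips a coordinate, which is why negative digits come for free.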
This is a mirror to the plan I laid out in the Discord #collaborate channel.
Goal:
Out-of-scope:
Reference PRs and benchmark of the changes:
The steps
[ ] (mandatory, 70% speedup) Architect the MSM to use affine coordinates (6M asymptotic cost with Montgomery's inversion trick). I saw that @Brechtpd in https://github.com/privacy-scaling-explorations/halo2/pull/40 and @kilic in https://github.com/privacy-scaling-explorations/halo2curves/pull/29 are using the Barretenberg approach with a radix sort. I think the better high-level approach is:
The main issues with the Barretenberg approach are heavy precomputation, memory usage that scales linearly (2x or 3x) with input size, a complex program flow, difficulty porting to GPU, and a lot of cache misses, which will necessarily cause bandwidth problems. See the detailed analysis at https://gist.github.com/mratsim/27c78c71fd423f731615a91d237162c3#file-multi-scalar-mul-md
See the writeup on an alternative: https://github.com/mratsim/constantine/blob/master/constantine/math/elliptic/ec_multi_scalar_mul_scheduler.nim. The main takeaway is that memory usage scales with the number of threads, around 4kB per extra thread, instead of with the number of input points.
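For the affine-coordinate step above, Montgomery's inversion trick can be sketched as follows. This is an illustration under stated assumptions, not code from any of the linked PRs. It uses a toy 61-bit prime field instead of a real curve base field, all inputs must be non-zero, and `P`, `mul_mod`, `pow_mod`, `inv_mod` and `batch_invert` are hypothetical names. The trick replaces n inversions with a single inversion plus about 3 multiplications per element. Added to the roughly 2 multiplications and 1 squaring of the affine addition formula itself, that is where a figure of about 6M per addition comes from.

```rust
/// Toy modulus for illustration: the Mersenne prime 2^61 - 1.
const P: u64 = (1 << 61) - 1;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Single (expensive) inversion via Fermat's little theorem: a^(P-2) mod P.
fn inv_mod(a: u64) -> u64 {
    pow_mod(a, P - 2)
}

/// Inverts every element in place using one field inversion in total.
/// Assumption: no element is zero modulo P.
fn batch_invert(values: &mut [u64]) {
    if values.is_empty() {
        return;
    }
    // prefix[i] = values[0] * ... * values[i - 1]
    let mut prefix = Vec::with_capacity(values.len());
    let mut acc = 1u64;
    for &v in values.iter() {
        prefix.push(acc);
        acc = mul_mod(acc, v);
    }
    // The only inversion: the inverse of the product of all elements.
    let mut inv = inv_mod(acc);
    // Walk backwards, peeling one element off the product at a time.
    for i in (0..values.len()).rev() {
        let v = values[i];
        values[i] = mul_mod(inv, prefix[i]); // 1 / values[i]
        inv = mul_mod(inv, v);               // 1 / (values[0] * ... * values[i - 1])
    }
}
```

In an affine MSM, the values being inverted would be the `x2 - x1` denominators of a batch of independent point additions, so the batch size decides how well the single inversion is amortized.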