Fast verification of multiple BLS signatures

dapplion commented 3 years ago

In the context of Eth2.0 Vitalik Buterin proposes a technique for optimizing the verification of all n signatures at the same time, especially in the case where the same message is signed by many public keys.

https://ethresear.ch/t/fast-verification-of-multiple-bls-signatures/5407

I am implementing support for this technique for Lodestar, but since bls does not expose mcl functions we should wrap and load both wasms which is not optimal. Since this technique is really useful I believe it is appropriate to include it in this library directly. It is included in blst as verify_multiple_aggregate_signatures

I would love to contribute to the source library but my C++ knowledge is too limited. I would like to explore the interest in implementing it by the maintainer. I attach below a reference implementation in javascript consuming the WASM compile of bls and mcl, to illustrate the scope of this function.

function verifyMultipleAggregateSignatures(sigVec: Signature[], pubVec: PublicKey[], msgVec: Uint8Array[]) {
  const msgs: Uint8Array = concatUint8Array(msgVec);

  // Get and cast random values to Fr
  const randXs = Array.from({ length: sigVec.length }, (_, i) => {
    const x = new mcl.Fr();
    x.setByCSPRNG();
    return x;
  });

  // Multi scalar multiplication on: Sig * rand, as G2 points
  const sigVecG2s: G2[] = sigVec.map(castSigToG2);
  const aggSig: Signature = castG2ToSig(mcl.mulVec(sigVecG2s, randXs));

  // Scalar multiplication on: Pub * rand, as G1 points
  const multiKeys: PublicKey[] = pubVec.map((pub, i) =>
    castG1ToPub(mcl.mul(castPubToG1(pub), randXs[i]))
  );

  return aggSig.aggregateVerifyNoCheck(multiKeys, msgsConcat);
}

Thank you so much for your valuable contributions :heart:

herumi commented 3 years ago

Thank you for the proposal. How bit is that n supposed to be?

dapplion commented 3 years ago

Thank you for the proposal. How bit is that n supposed to be?

Thank you! n as in the number of signatures depends on the use-case. The most common will be verifying an entire Eth2 block at once, which can contain in the hundreds of signatures

herumi commented 3 years ago

Okay, I'll consider what API is good.

mratsim commented 3 years ago

The BLST API is not bad and allows easy parallelization:

https://github.com/supranational/blst/blob/cd0847a7/src/aggregate.c#L7-L37

High-level usage Rust: https://github.com/supranational/blst/blob/cd0847a7/bindings/rust/src/lib.rs#L589-L680 High-level usage Go: https://github.com/supranational/blst/blob/cd0847a7/bindings/go/blst.go#L387-L486

dapplion commented 3 years ago

The BLST API is not bad and allows easy parallelization:

blst API looks great +1

herumi commented 3 years ago

I have a question.

Input : {pk(i,j), msg(i,j), sig_i} for i = 1, ..., n, j = 1, ..., ni, where pk ; public key, sig ; aggregated signature for {msg(i,j), j=1, .., n_i}. Are all n_i the same value?

herumi commented 3 years ago

Verification checks the following equation

r_1, ..., rn ; random number (Is the size 2^64 okay?) sig := sum(i=1,...,n) r_i sigi prod(i=1,...,n) prod_(j=1, ..,n_i) e(ri pk(i,j), HashToG2(m_(i,j))) = e(P, sig)

prod_j e(ri pk(i, j) Hash(m_(i,j))) = (prodj e(pk(i, j) Hash(m_(i,j))))^r_i

The scalar multiplication of G1 is about 3.2 times faster than GT::pow (C++), so if n_i >= 4 then I think that the latter is faster. Is it correct?

herumi commented 3 years ago

I might be confused. Do you need the multi version of blsFastAggregateVerify?

herumi commented 3 years ago

Suppose the condition again:

P is a generator of G1. Input : {pk(i,j), msg(i,j), sig_i} for i = 1, ..., n, j = 1, ..., ni, where pk(i,j) are public keys, sigi is i-th aggregated signature for {msg(i,j), j=1, .., n_i}.

We want to check the following equations:

ni Π e(pk(i,j), H(msg_(i,j))) = e(P, sig_i) j=1 for i = 1, ..., n

e(P, Q) = FE(ML(P, Q)) ; FE = finalExp, ML = Miller Loop

FE( Πj ML(pk(i,j), H(msg_(i,j))) ML(-P, sig_i) ) = 1 for i = 1, ..., n

We have to check the one equation: Chose r1, .., rn randomly,

FE(Π_i (Πj ML(pk(i,j), H(msg_(i,j))) ML(-P, sig_i) )^r_i) = 1

The former cost is n FE, the latter is 1 FE + n GT-pow.

FE 1.3Mclk GT-pow 0.8Mclk So the modification will get about 1.3/0.8 = 1.6 times faster.

It is correct?

herumi commented 3 years ago

I was wrong.

FE(Π_i (Πj ML(pk(i,j), H(msg_(i,j))) ML(-P, sig_i) )^r_i) = 1

The former cost is n FE, the latter is 1 FE + n GT-pow.

The latter requires more n ML.

dapplion commented 3 years ago

I have a question.

Input : {pk(i,j), msg(i,j), sig_i} for i = 1, ..., n, j = 1, ..., ni, where pk ; public key, sig ; aggregated signature for {msg(i,j), j=1, .., n_i}. Are all n_i the same value?

n_i are not necessarily the same number. In practice n_i will probably be 1 since for an eth2 block verification all signatures of equal messages are already aggregated.

r_1, ..., r_n ; random number (Is the size 2^64 okay?)

2^64 is okay Vitalik argues it provides enough security.

I would love @wemeetagain opinion on your performance questions

herumi commented 3 years ago

@dapplion I'm sorry that I want to know again what you want.

n generic aggregate verification: Input: {pk_ij, m_ij, sig_i | i = 1,...,n, j = 1,...,n_i} pk_ij ; public key m_ij ; message sig_i ; signature Output: ok_i := verifyAggregateSignature({pk_ij, m_ij | j=1,...n_i}, sig_i) for i = 1,...,n return true if all ok_i is true

Do you have some assumption such as m_i1 = m_i2 = ... = m_i(n_i) = m_i or pk_i1 = pk_i2 = ...?

herumi commented 3 years ago

@dapplion Okay, I understood. multiVerify takes {pk_i, m_i, sig_i|i = 1, ..., n} and returns true if all SingleVerify(pk_i, m_i, sig_i) are true.

My question about the performance is for n_i > 4, so it does not apply for this situation.

mratsim commented 3 years ago

In terms of sizing cc @justindrake, Eth2.0 requires 2 verification procedure

https://github.com/ethereum/eth2.0-specs/blob/v0.12.3/specs/phase0/beacon-chain.md#bls-signatures

def FastAggregateVerify(PKs: Sequence[BLSPubkey], message: Bytes, signature: BLSSignature) -> bool
def AggregateVerify(PKs: Sequence[BLSPubkey], messages: Sequence[Bytes], signature: BLSSignature) -> bool

Fast aggregate verify is verifying that multiple public keys properly sign a single message. The Eth2 protocol will ensure that all hold a proof-of-possession to defend against rogue key attacks.

You do not need to provide the high-level function as-is, just a "pairing context" in which we can accumulate signatures. For multithreading those pairing context need to have a "merge" procedure.

Example usage

Using Miracl/Milagro this is how I use the pairing context: https://github.com/status-im/nim-blscurve/blob/14708996f639141e74f7dc2a87a0e2c8d57ed1b0/blscurve/miracl/bls_signature_scheme.nim#L226-L295

type
  ContextCoreAggregateVerify = object
    C1: array[AteBitsCount, FP12_BLS12381]

func init(ctx: var ContextCoreAggregateVerify) =
  ## initialize an aggregate verification context
  PAIR_BLS12381_initmp(addr ctx.C1[0])                                # C1 = 1 (identity element)

template `&`(point: GroupG1 or GroupG2): untyped = unsafeAddr point

func update[T: char|byte](
       ctx: var ContextCoreAggregateVerify,
       publicKey: PublicKey,
       message: openarray[T],
       domainSepTag: static string): bool =
  if not subgroupCheck(publicKey.point):
    return false
  let Q = hashToG2(message, domainSepTag)                   # Q = hash_to_point(message_i)
  PAIR_BLS12381_another(addr ctx.C1[0], &Q, &publicKey.point) # C1 = C1 * pairing(Q, xP)
  return true

func finish(ctx: var ContextCoreAggregateVerify, signature: Signature): bool =
  # Accumulate the multiplicative inverse of C2 into C1
  let nP1 = neg(generator1())
  PAIR_BLS12381_another(addr ctx.C1[0], &signature.point, &nP1)
  # Optimal Ate Pairing
  var v: FP12_BLS12381
  PAIR_BLS12381_miller(addr v, addr ctx.C1[0])
  PAIR_BLS12381_fexp(addr v)

  if FP12_BLS12381_isunity(addr v) == 1:
    return true
  return false

# ....

func aggregateVerify*(
        publicKeys: openarray[PublicKey],
        messages: openarray[string or seq[byte]],
        signature: Signature): bool =
  # OpenArray is pointer+length pair
  if publicKeys.len != messages.len:
    return false
  if not(publicKeys.len >= 1):
    return false

  var ctx: ContextCoreAggregateVerify
  ctx.init()
  for i in 0 ..< publicKeys.len:
    if not ctx.update(publicKeys[i], messages[i], DST):
      return false
  return ctx.finish(signature)

func fastAggregateVerify*[T: byte|char](
        publicKeys: openarray[PublicKey],
        message: openarray[T],
        signature: Signature
      ): bool =
  if publicKeys.len == 0:
    return false
  var aggregate = publicKeys[0]
  for i in 1 ..< publicKeys.len:
    aggregate.point.add(publicKeys[i].point)
  return coreVerify(aggregate, message, signature, DST)

And with BLST https://github.com/status-im/nim-blscurve/blob/14708996f639141e74f7dc2a87a0e2c8d57ed1b0/blscurve/blst/bls_sig_min_pubkey_size_pop.nim#L341-L428

type
  ContextCoreAggregateVerify = object
    c: blst_pairing

func init(ctx: var ContextCoreAggregateVerify, domainSepTag: static string) {.inline.} =
  ## initialize an aggregate verification context
  ctx.c.blst_pairing_init(
    hash_or_encode = kHash,
    domainSepTag
  )                           # C1 = 1 (identity element)

func update[T: char|byte](
       ctx: var ContextCoreAggregateVerify,
       publicKey: PublicKey,
       message: openarray[T]): bool {.inline.} =
  result = BLST_SUCCESS == ctx.c.blst_pairing_aggregate_pk_in_g1(
    PK = publicKey.point.unsafeAddr,
    signature = nil,
    msg = message,
    aug = ""
  )

func finish(ctx: var ContextCoreAggregateVerify, signature: Signature or AggregateSignature): bool =
  when signature is Signature:
    result = BLST_SUCCESS == ctx.c.blst_pairing_aggregate_pk_in_g1(
      PK = nil,
      signature = signature.point.unsafeAddr,
      msg = "",
      aug = ""
    )
  elif signature is AggregateSignature:
    block:
      var sig{.noInit.}: blst_p2_affine
      sig.blst_p2_to_affine(signature.point)
      result = BLST_SUCCESS == ctx.c.blst_pairing_aggregate_pk_in_g1(
        PK = nil,
        signature = sig,
        msg = "",
        aug = ""
      )
  else:
    {.error: "Unreachable".}

  if not result: return

  ctx.c.blst_pairing_commit()
  result = bool ctx.c.blst_pairing_finalverify(nil)

# ....

func aggregateVerify*(
        publicKeys: openarray[PublicKey],
        messages: openarray[string or seq[byte]],
        signature: Signature): bool =
  # openarray is pointer + length pair
  if publicKeys.len != messages.len:
    return false
  if not(publicKeys.len >= 1):
    return false

  var ctx{.noInit.}: ref ContextCoreAggregateVerify
  new ctx

  ctx[].init(DST)
  for i in 0 ..< publicKeys.len:
    result = ctx[].update(publicKeys[i], messages[i])
    if not result:
      return
  return ctx[].finish(signature)

func fastAggregateVerify*[T: byte|char](
        publicKeys: openarray[PublicKey],
        message: openarray[T],
        signature: Signature
      ): bool =
  if publicKeys.len == 0:
    return false
  var aggregate {.noInit.}: blst_p1
  aggregate.blst_p1_from_affine(publicKeys[0].point)
  for i in 1 ..< publicKeys.len:
    # We assume that the PublicKey is in on curve, in the proper subgroup
    aggregate.blst_p1_add_or_double_affine(aggregate, publicKeys[i].point)

  var aggAffine{.noInit.}: PublicKey
  aggAffine.point.blst_p1_to_affine(aggregate)
  return coreVerify(aggAffine, message, signature, DST)

Parameter size

Ethereum 2 might aggregate up to 128 attestation per block

Related paper

Scott's Pairing Implementation revisited, 2019, https://eprint.iacr.org/2019/077.pdf on decoupling Miller Loop and line accumulation (but IIRC MCL already has that)

Multithreading

The result of miller loops+line evaluation can be computed in separate thread, A protocol implementer then only needs a "merge" procedure which internall just does Fp12 multiplication. And then can accumulate the final Fp12 result in the main thread, either naively or combining tree wise/prefix sum wise i.e. thread n accumulates thread 2n and 2n+1 results.

herumi commented 3 years ago

Thank you for your useful information. I'll start to implement a simple version at first and provide a multi-thread version later.

herumi commented 3 years ago

I made bls.verify(pubs, sigs, msgs) to bls-eth-wasm. It seems about two times faster than to verify each element for n = 50.

herumi commented 3 years ago

blsMutiVerify supports multithread. https://github.com/herumi/bls/blob/master/test/bls_test.hpp#L857

n=400 threadNum	clk
1	538M
5	126M
9	89M
16	67M

But when it is called from cgo, it doesn't seem to have that effect. https://github.com/herumi/bls-eth-go-binary/blob/dev/bls/bls_test.go#L687-L701

>go test ./bls -bench "Multi|Naieve"

pkg: github.com/herumi/bls-eth-go-binary/bls
BenchmarkNaieveMultiVerify-4    1000000000               0.483 ns/op
BenchmarkMultiVerify-4          1000000000               0.230 ns/op

I'll investigate the reason later.

herumi commented 3 years ago

By top command in benchmarking by Go, it shows 100% CPU, so is multi-thread disabled in cgo?

herumi commented 3 years ago

Instead of native pthread, I've used channel. https://github.com/herumi/bls-eth-go-binary/releases/tag/v1.14 Multithreding is effective.

% go test ./bls -bench "Multi|Naieve"
pkg: github.com/herumi/bls-eth-go-binary/bls
BenchmarkNaieveMultiVerify-4           3         495195368 ns/op
BenchmarkMultiVerify-4                16          72099969 ns/op

herumi / bls