supranational / blst

Multilingual BLS12-381 signature library
Apache License 2.0

feat: add multiply_by to rust bindings #225

Closed matthewkeil closed 3 months ago

matthewkeil commented 3 months ago

Motivation

We have an optimization that we use to improve signature verification in our Ethereum consensus implementation, and it requires multiplying in some randomness before aggregation for security reasons. We have switched to using the rust bindings, but the native blst_p1_affine and blst_p2_affine points are private.

We explored two approaches and decided to open this PR to upstream our work. We are not thrilled about having to convert from affine to Jacobian and back again to allow for multiplication, but we leave that to you if you feel another route is better.

The other option we explored was to make the points public on the PublicKey and Signature structs. We are also open to this approach if you prefer going that route, but we understand that it is not ideal either. On the upside, it would give other consumers more flexibility to implement the various other methods exported by the base lib.

Thank you in advance for your consideration.

Inclusions

dot-asm commented 3 months ago

We have an optimization that we use to improve signature verification in our Ethereum Consensus implementation and it requires multiplying in some randomness before aggregation for security reasons.

verify_multiple_aggregate_signatures is doing that, or are we talking about something else?

We are not super excited about having to do the conversion from affine to jacobian and then back again to allow for multiplication but leave that up to you if you feel another route is better.

Conversion to Jacobian is no-cost, you only need to care about conversion to affine coordinates. Question is what happens to the multiplication results. If you are adding them all together, as you surely are, then it would be more efficient to accumulate Jacobian points and then convert the sum to affine... But have a look at verify_multiple_aggregate_signatures first...
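
For illustration, a minimal sketch of that pattern using the raw FFI calls (assuming a non-empty slice of affine G1 points and the bindgen-derived Default impls; the G2 case is analogous with the blst_p2_* functions):

use blst::{blst_p1, blst_p1_add_or_double_affine, blst_p1_affine, blst_p1_from_affine, blst_p1_to_affine};

fn sum_to_affine(points: &[blst_p1_affine]) -> blst_p1_affine {
    let mut acc = blst_p1::default();
    let mut out = blst_p1_affine::default();
    unsafe {
        // affine -> Jacobian is essentially free
        blst_p1_from_affine(&mut acc, &points[0]);
        let acc_ptr: *mut blst_p1 = &mut acc;
        // accumulate in Jacobian coordinates...
        for p in &points[1..] {
            blst_p1_add_or_double_affine(acc_ptr, acc_ptr, p);
        }
        // ...and pay for the conversion only once, on the sum
        blst_p1_to_affine(&mut out, acc_ptr);
    }
    out
}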

wemeetagain commented 3 months ago

I think the difference between what we're trying to do and verify_multiple_aggregate_signatures is an additional level of aggregation. Since many signatures are signed over the exact same message, we want to aggregate all pks+sigs for each unique message and then verify once, rather than verifying each pk+sig individually. But as we understand it, we need to multiply the pks and sigs by locally generated randomness before/when aggregating, to avoid possibly verifying individually invalid signatures. This is the impetus for this PR.

verify_multiple_aggregate_signatures operates on (msg, pk, sig)[], and we're doing an additional level of pk and sig aggregation: (msg, pk[], sig[])[]. We're doing so in two steps though: 1. aggregating the pks and sigs which have signed a single message, (pk[], sig[], rand[]) -> (pk, sig); 2. then either individually verifying them or using verify_multiple_aggregate_signatures to batch verify.

Judging from your comment "it would be more efficient to accumulate Jacobian points and then convert the sum to affine", perhaps a better approach here would be to add a fn aggregate_with_randomness(pks: &[&PublicKey], pks_validate: bool, sigs: &[&Signature], sigs_groupcheck: bool, rands: &[blst_scalar], rand_bits: usize) -> Result<(PublicKey, Signature), BLST_ERROR>. But happy to defer to your judgement.
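
Formatted as Rust for readability, a sketch of the signature proposed above with a placeholder body (the types are the ones exposed by the existing min_pk bindings; nothing here is implemented yet):

use blst::min_pk::{PublicKey, Signature};
use blst::{blst_scalar, BLST_ERROR};

#[allow(unused_variables)]
pub fn aggregate_with_randomness(
    pks: &[&PublicKey],
    pks_validate: bool,
    sigs: &[&Signature],
    sigs_groupcheck: bool,
    rands: &[blst_scalar],
    rand_bits: usize,
) -> Result<(PublicKey, Signature), BLST_ERROR> {
    // multiply each pk[i]/sig[i] by rands[i], sum, and return the aggregated pair
    unimplemented!("interface sketch only")
}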

matthewkeil commented 3 months ago

As a bit of additional context @dot-asm, this is particularly for eth2 attestations. They are generally for the same head, so they share the same message, and there are very few invalid signatures. We found that fast aggregation (sort of) is the way to go.

We are part of the Lodestar team, and milking every flop we can is critical for a single-threaded client. In order to improve our efficiency in gossip processing we aggregate the pks and sigs for same-message attestations and do the costly verification over a single msg/pk/sig set. We found that on our setup doing this vs {msg, pk, sig}[], when the message is the same, is roughly 4 times faster.

Only if we have a bad verification do we re-run the signatures individually to exclude the invalid set. Even though that is a costly process, the savings in the happy case more than compensate for the infrequent invalid case, and we end up processing the traffic much, much faster. But as @wemeetagain mentioned, we multiply in some randomness during aggregation to close off the attack surface.

I am whipping up a commit with the function @wemeetagain mentioned for your review. After you have a chance to see it we will be happy to update or remove any/all bits that are not to the standard of this incredible repo. 😄 Another option would be to make the point on the pk/sig structs pub and we can implement the rest in our bindings layer. Considering this saves us a ton of thread-time it may also be a nice addition for the Lighthouse team that uses the rust bindings directly.

Thanks again in advance for your consideration; we will push the commit for your review ASAP.

wemeetagain commented 3 months ago

After doing more digging, it appears that the MultiPoint trait exposes multi-scalar multiplication, which is exactly what we're looking for.

The one hitch to using this is that PublicKey and Signature do not expose their underlying affine points and we would be required to serialize/deserialize to extract the affine points.

Would you be amenable to adding to_affine/from_affine methods on PublicKey and Signature?
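
For concreteness, a hedged sketch of what such accessors might look like if added inside the bindings (shown for min_pk's PublicKey; the private `point` field name is an assumption based on the macro excerpt quoted later in this thread, and a real from_affine would presumably want to validate/group-check):

// hypothetical additions inside the bindings; not part of the current blst API
impl PublicKey {
    pub fn to_affine(&self) -> blst_p1_affine {
        self.point
    }

    pub fn from_affine(point: blst_p1_affine) -> Self {
        Self { point }
    }
}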

dot-asm commented 3 months ago

(msg, pk[], sig[])[]

Got it.

We're doing so in two steps though: 1. aggregating the pks and sigs which have signed a single message, (pk[], sig[], rand[]) -> (pk, sig)

Is it actually rand[] and not a single rand here? I mean rand[] would imply that you're multiplying pk[i] by rand[i], in which case you wouldn't have mentioned multi-scalar multiplication... I mean I don't think that you'd be looking for it...

matthewkeil commented 3 months ago

We're doing so in two steps though: 1. aggregating the pks and sigs which have signed a single message, (pk[], sig[], rand[]) -> (pk, sig)

Is it actually rand[] and not a single rand here? I mean rand[] would imply that you're multiplying pk[i] by rand[i], in which case you wouldn't have mentioned multi-scalar multiplication... I mean I don't think that you'd be looking for it...

Correct. We are using rand[], i.e. pk[i]*rand[i] and sig[i]*rand[i]. We are using this doc as a basis: https://ethresear.ch/t/fast-verification-of-multiple-bls-signatures/5407

dot-asm commented 3 months ago

Is it actually rand[] and not a single rand here?

Correct.

All right. So that if we're to devise an additional interface it would take a message, pks[], sigs[] and a random scalar, hash the message, multiply the hash by the scalar, pair-n-aggregate it with pks[], accumulate sigs[] and multiply the sum by the scalar. Then you'd simply merge the pairings for unique messages and perform the final verification. Does it sound sensible?

It should be noted that this wouldn't be the most efficient way, more specifically not asymptotically. By "asymptotically" I mean there is a more efficient way to do it for a larger amount of inputs. For a larger amount of the unique messages it would be more efficient to not multiply the sum of the signatures in the new interface, but leave it to the MSM. On the other hand the improvement won't be that impressive, because the execution time would still be dominated by the pairings with pks[]. But it's no trouble to make the sig[] optional just in case...
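
Spelled out (notation assumed here rather than taken verbatim from the thread): for each unique message $m_j$ with public keys $P_{j,i}$, signatures $S_{j,i}$ and a per-message random scalar $r_j$, the check described above amounts to

$$\prod_j e\Big(\sum_i P_{j,i},\; r_j \cdot H(m_j)\Big) \stackrel{?}{=} e\Big(G,\; \sum_j r_j \sum_i S_{j,i}\Big)$$

with the hash, the two scalar multiplications and one Miller loop being the per-message work, and the outer product/sum being the "merge the pairings" step across unique messages before the final verification.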

dot-asm commented 3 months ago

All right. So that if we're to devise an additional interface it would

Or maybe I've misinterpreted the "correct" remark...

dot-asm commented 3 months ago

All right. So that if we're to devise an additional interface it would

Or maybe I've misinterpreted the "correct" remark...

In essence I saw the reference to 5407 and my understanding is that you multiply hashes and signatures by randoms, not public keys. And if you multiply pks, then you wouldn't multiply signatures, but the group generator [for which you don't need msm]. Or what am I missing?
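
For reference, the batch check from the 5407 write-up (notation assumed) puts the randomness on the hashes and on the signature sum, not on the public keys:

$$e\Big(G,\; \sum_i r_i S_i\Big) \stackrel{?}{=} \prod_i e\Big(P_i,\; r_i \cdot H(m_i)\Big)$$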

matthewkeil commented 3 months ago

All right. So that if we're to devise an additional interface it would

Or maybe I've misinterpreted the "correct" remark...

Apologies, I am not sure, but perhaps I misrepresented when I said "correct". I think "yes we believe we need to use an array of scalars" is more accurate. Considering the importance of this being correct I would love to step back for a moment.

Honestly, I am swimming in deep water with regard to the math here and would love to help you help us. Thank you in advance; it's always a bit hard to swallow humble pie with stuff like this, and it's best to let the experts make expert decisions with more knowledge. Knowing just enough to be dangerous is why we are reaching out to you (and we have also circled in some other researchers to make sure we are on the right track), but I want to go over what led us here. Some of the below may be repeated from above, but I wanted to get it out to make sure I haven't missed anything.

We found that there are a lot of attestations for the same message when listening to subnet traffic on the beacon chain. We were originally doing verify_multiple_aggregate_signatures over groups of those for verification ({msg, pk, sig}[]).

One of the engineers on our team was playing around with optimizations and tried something novel:

interface SignatureSet {
    message: Uint8Array;
    publicKey: PublicKey;
    signature: Signature;
}

const signatureSets: SignatureSet[] = [
    // many that came across the wire and were bundled in a batch for verification
    // for this example assume all sets have the same message (vote for head)
];

// Option 1 - what we were originally doing
let isValid = verifyMultipleAggregateSignatures(signatureSets);

// Option 2 - the novel idea
const aggPk = aggregatePublicKeys(signatureSets.map(set => set.publicKey));
const aggSig = aggregateSignatures(signatureSets.map(set => set.signature));
isValid = verify(signatureSets[0].message, aggPk, aggSig); // all the messages are the same so we can pick any index

We found that Option 1 takes roughly 4 times as long as Option 2 under normal circumstances.

We shared our findings with the Lighthouse team and they came back with this issue: https://github.com/sigp/lighthouse/issues/5148

We also spoke with some other core devs, and our shared understanding is that we could resolve the issue like this:

function getNonZeroBytes(length: number) {
    const rand = crypto.randomBytes(length);
    for (let i = 0; i < rand.length; i++) {
        if (rand[i] !== 0) return rand;
    }
    rand[0] = 0x99;
    return rand;
}

const SAME_RANDOM_BYTES_AS_MULT_VER = 8;
const rands = Array.from({length: signatureSets.length}, () => {
    return getNonZeroBytes(SAME_RANDOM_BYTES_AS_MULT_VER);
});

const aggPkWithRandomness = aggregatePublicKeys(
    signatureSets.map(({publicKey}, i) => publicKey.multiplyBy(rands[i]))
);
const aggSigWithRandomness = aggregateSignatures(
    signatureSets.map(({signature}, i) => signature.multiplyBy(rands[i]))
);

const msg = signatureSets[0].message; // same message for all sets
const isValid = verify(msg, aggPkWithRandomness, aggSigWithRandomness);

We implemented the solution above using the SWIG bindings and also using a multi-threaded C++ napi binding around blst.hpp. We found that multithreaded performance is better using napi-rs and Rust, so we are reimplementing the Node bindings by wrapping bindings/rust from this repo.

I guess the first question I have is: does our "solution" seem sound? Others have told us that it is, but I would love to get your input as well.

My biggest concern is that this solution may be taken up by other teams, because the performance benefit is HUGE. If that is the case it MUST be mathematically sound and a rock solid solution.

Thank you again btw. I really appreciate it personally and I know our team as a whole does as well!!! You are a rockstar 🎸 ninja 🥷 !!!

dot-asm commented 3 months ago

I think "yes we believe we need to use an array of scalars" is more accurate.

And what I'm saying is that the referred articles, both of them, are talking about one scalar per unique message, not a scalar per public key. And with this in mind the idea was that the new interface would take care of one message (and multiple public keys and optional signature[s]) at a time, and the application would loop through the message-scalar pairs. In other words it would be the application that would know everything about array of random scalars, unique messages and public-key/signature relationships, a new interface doesn't really need to see beyond a single message-scalar pair and corresponding keys.

But considering your suggestion, which is apparently not the same as in referred articles.

const aggPkWithRandomness = aggregatePublicKeys(
    signatureSets.map(({publicKey}, i) => publicKey.multiplyBy(rands[i]))
);
const aggSigWithRandomness = aggregateSignatures(
    signatureSets.map(({signature}, i) => signature.multiplyBy(rands[i]))
);

const isValid = verify(msg, aggPkWithRandomness, aggSigWithRandomness);

While this would work, I don't see that it would add any security to the system. I mean I reckon it would be as secure as performing the standard FastAggregateVerify (terminology from draft-irtf-cfrg-bls-signature-05), or as you referred to it, the "novel idea" ;-) In other words the randomization appears superfluous. Can you elaborate on why you're considering it to begin with?

A word of caution. FastAggregateVerify is sound only if you have proofs of possession for the public keys. [Do you?]

wemeetagain commented 3 months ago

Can you elaborate on why you're considering [the randomization] to begin with?

The problem with doing FastAggregateVerify, as-is, is the same as described in 5407, namely that it allows an attacker to force a node running this optimization to forward invalid data. The difference here is that the 5407 construction is dealing with pre-aggregated signatures and we are dealing with signatures yet to be aggregated.

Consider two honest attesters publishing unaggregated attestation signatures $S_1$, $S_2$. An attacker is connected to the victim node, where the victim implements this optimization. The attacker mutates the signatures such that $S_1' = S_1 + \Delta$, $S_2' = S_2 - \Delta$ and forwards them to the victim. The check $e(G, S_1' + S_2') \stackrel{?}{=} e(P_1 + P_2, H_m)$ will pass. If the victim strictly uses the aggregated pubkey from then on, it will never attempt to include invalid data on-chain in other gossip topics. However, it will forward attestations with invalid signatures and will quickly be banned by all other peers. As such, an attacker can isolate the victim node, forcing it to connect exclusively to attackers and to other nodes implementing the optimization.

So it seems important to multiply each unaggregated signature by a unique random scalar (before aggregating and doing FastAggregateVerify) to avoid this. As for why we'd multiply the random scalars into public keys rather than messages: 1. initially, ignorance, and 2. attempting to work within the interfaces currently provided, even if it requires more computation. If we wanted to multiply those scalars into messages instead of public keys, we would need a new interface that accepted something like (msg, pk[], sig[], rand[])[], the difference from verify_multiple_aggregate_signatures being that the rands would be appropriately multiplied into the sigs before summing and multiplied into the msg.
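
As a quick sanity check of why the per-signature scalars close the hole described above (same notation as the previous comment): with $S_1' = S_1 + \Delta$ and $S_2' = S_2 - \Delta$,

$$e\big(G,\; r_1 S_1' + r_2 S_2'\big) = e\big(G,\; r_1 S_1 + r_2 S_2 + (r_1 - r_2)\Delta\big),$$

which matches $e(r_1 P_1 + r_2 P_2, H_m)$ only if $(r_1 - r_2)\Delta = 0$, i.e. except with negligible probability once $r_1$ and $r_2$ are random and unknown to the attacker.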

Do you have proofs of possession for the public keys?

Yea we do

dot-asm commented 3 months ago

I see. Cool. Just in case, declaration of soundness of a new/modified scheme is a matter of communal consensus, and this is not the forum to reach one. We can discuss performance and interfaces here, but it won't vouch for the cryptographic fitness. With this in mind...

Significant costs to consider in this context are hash-to-curve operations, scalar multiplications and Miller loops. The current verify_multiple_aggregate_signatures makes you pay the full cost for each public key and signature. Which is unfortunate, because indeed, if some of the messages are the same, one can reduce the amount of some operations. 5407 minimizes the number of hash-to-curve operations to one and scalar multiplications to a pair, one in the base and [one in the] extension fields, per unique message, and keeps the number of Miller loops. The proposed approach on the other hand minimizes the number of hash-to-curve operations and Miller loops to one [per unique message] and keeps the number of scalar multiplications. So it boils down to the cost of a Miller loop vs. a pair of scalar multiplications. And the catch is that the cost of the said pair of isolated full-width scalar multiplications is approximately the same as a Miller loop. But there is room for improvement in the emphasized keywords, "isolated" and "full-width". And one of them effectively renders the suggestion in this PR moot, because it keeps multiplications isolated. Or in other words, MSM is the most sensible option. Let me ponder the practicalities some more...

dot-asm commented 3 months ago

If we wanted to multiply those scalars into messages instead of public keys, we would need a new interface that accepted something like (msg, pk[], sig[], rand[])[]

(msg, pk[], sig[], rand)[] and my suggestion is to let application handle the outermost [], so just (msg, pk[], sig[], rand).

But back to the suggested approach. It's not a problem to wire MultiPoint to slices of PublicKeys and Signatures. As in

        impl MultiPoint for [PublicKey] {
            type Output = AggregatePublicKey;

            fn mult(&self, scalars: &[u8], nbits: usize) -> Self::Output {
                Self::Output {
                    point: unsafe {
                        // view the slice of wrapper structs as a slice of the
                        // underlying affine points; $pk_aff expands to the
                        // affine point type inside the bindings macro
                        core::mem::transmute::<&[_], &[$pk_aff]>(self)
                    }
                    .mult(scalars, nbits),
                }
            }
        ...

And then build aggregate_with_randomness upon that. I personally see no real advantage in implementing a composite call that takes both pks[] and sigs[]; it shouldn't be a problem for an application to make a pair of calls...
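
A hedged sketch of how an application could make that pair of calls once the slice impls exist (the impls and their Output types are assumed from the excerpt above; rand_bits = 64 would match the thread's 8-byte scalars, and `rands` is the flat concatenation of one little-endian scalar per set):

use blst::min_pk::{AggregatePublicKey, AggregateSignature, PublicKey, Signature};
use blst::MultiPoint;

// one MSM per group: sum_i pk[i] * rand[i] and sum_i sig[i] * rand[i]
fn aggregate_with_randomness(
    pks: &[PublicKey],
    sigs: &[Signature],
    rands: &[u8],
    rand_bits: usize,
) -> (AggregatePublicKey, AggregateSignature) {
    (pks.mult(rands, rand_bits), sigs.mult(rands, rand_bits))
}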

wemeetagain commented 3 months ago

wire MultiPoint to slices of PublicKeys and Signatures

That would be great!

asanso commented 3 months ago

We can discuss performance and interfaces here, but it won't vouch for the cryptographic fitness.

FWIW I am with @dot-asm here. @matthewkeil, would you have any more formal write-up of the optimization (code aside)? Not asking for a full academic paper, but at least a few lines of description in a note would help IMHO.

matthewkeil commented 3 months ago

Closing in favor of #226

matthewkeil commented 3 months ago

Gist covering history leading to this PR.

https://gist.github.com/wemeetagain/d52fc4b077f80db6e423935244c2afb2