onflow / flips

Flow Improvement Proposals

[Networking] FLIP: Message Forensic (MF) System #195

Open yhassanzadeh13 opened 1 year ago

yhassanzadeh13 commented 1 year ago

https://github.com/onflow/flips/issues/259

Summary

This FLIP discusses and compares two potential solutions for the Message Forensic (MF) system in the Flow protocol, a system that identifies protocol violations and attributes them to the original malicious sender. The two solutions under consideration are (1) GossipSub Message Forensic (GMF) and (2) Enforced Flow-level Signing Policy For All Messages. We examine both and list their pros and cons to determine which is more feasible in terms of ease of implementation, performance efficiency, and security guarantees.

Our analysis finds the "Enforced Flow-level Signing Policy For All Messages" to be the more promising option. It offers a general solution that does not depend on the protocol used to send the message, avoids the complexity of retaining GossipSub envelopes, and removes the need to duplicate the GossipSub router's signature verification at the engine level. It also fits well with the current state of the Flow protocol.

Review Guide

This FLIP is presented as a Pull Request (PR) in the flow-go repository. Reviewers are welcome to share their opinions and feedback directly on the PR page. To keep the discussion structured and productive, please use one of the following response frameworks:

  1. I favor the "Enforced Flow-level Signing Policy For All Messages", and here are my thoughts:
  2. I support the "GossipSub Message Forensic (GMF)" approach, articulating my views as follows:
  3. I find both propositions unsatisfactory, elucidating my stance as follows:
gomisha commented 1 year ago

I favor the "Enforced Flow-level Signing Policy For All Messages" (I have suggested renaming it FLS), and here are my thoughts:

My biggest concern about adopting FLS is backward compatibility, since it will be a major breaking change. But from a design and maintainability perspective, I prefer this approach over GMF.

bluesign commented 1 year ago

I support the "GossipSub Message Forensic (GMF)" approach, articulating my views as follows:

  • The network layer stays clearly separated from the internal messaging logic.
  • By carrying the GossipSub envelope without data, we can avoid most of the overhead (without data, a GossipSub envelope should be pretty small).
  • Slashing and protocol violations should be very rare, and signing in GossipSub is fairly basic. Duplicating GossipSub's verification is only needed when a protocol violation is raised, and only in the components responsible for deciding on protocol violations (otherwise we perform two signature verifications, one at the network layer and one at the application layer).

I raised this before (on Discord): moving all communication, unicast messages included, onto GossipSub. It turned out that some messages assume a direct 1:1 connection. I think solving that is also important. (I don't know much about the current topology, but it may be possible to have direct peering between components that require 1:1 connections.) Ultimately, I think the objective is to allow anyone to join the network (say, any AN), and moving everything onto GossipSub would bring its security and scalability guarantees, which would benefit us in the long run.

yhassanzadeh13 commented 1 year ago

@bluesign, in GMF an engine must retain the GossipSub envelope in full, because the approach depends on the data being complete. This is not a procedural formality: the signature covers the entire envelope, so the envelope cannot be relayed without its data and still be verifiable.

Regarding your point that carrying the GossipSub envelope without data avoids the overhead (since a GossipSub envelope without data should be pretty small):

⚠️ The perspective you describe seems to bypass a critical phase of the verification process, which involves:

  1. Establishing that the event belongs to the envelope (requires the data part of the envelope).
  2. Verifying the envelope's signature, i.e., establishing that the envelope is attributable to the sender through its networking key (requires the entire envelope).

This is not about blindly trusting the networking layer; it is about a methodologically sound practice that uses cryptographic primitives to produce verifiable proofs, so that trust among Flow nodes rests on cryptographically verifiable primitives rather than on assumptions about one another. Otherwise, we might not need a forensic mechanism in the first place.

Reducing the envelope to a fragment prevents the engine from verifying it on its own, which undermines the core objective of producing irrefutable proofs. GMF rests on the data in the envelope being complete and unaltered, so the GMF solution must keep the envelope intact; only then can both the event's association with the envelope and the envelope's signature be verified accurately.

bluesign commented 1 year ago

What I was proposing is something like this:

A struct like message.Message serves as the Flow message, enriched with the seqno, topic, signature, and key from pb.Message plus the decoded event as interface{}, and is passed to the engine. Technically, the engine then has everything it needs for a proof, with little overhead. But since GossipSub already defends against impersonation etc., I don't think the engine needs to do a signature check here.

In case of a conflict, this new struct is self-sufficient to raise a claim. Whoever is responsible for checking the violation can then get this struct from the node, reconstruct the pb.Message, and verify the signature. If verification succeeds, they can punish the offender (if not, they can punish the claimer).

yhassanzadeh13 commented 1 year ago

Regarding your point that the engine does not need to do a signature check here:

@bluesign yes, technically the engine doesn't have to check the signature itself and can rely on the data from its networking layer. But the core idea behind the FLIP is to ensure that any proof an engine provides when reporting a violation is fully self-standing. That is, if node A claims that node B did something wrong, using evidence E, then anyone else should be able to confirm that node B was indeed in the wrong just by looking at E.

For this to work, node A needs to share the original message exactly as it was, signature and all. Even a tiny change to the message prevents others from confirming that the signature is genuine, because the signature covers the entire envelope, not just parts of it. Below is the full pb.Message definition. Based on the signing code snippet, every field except XXX_sizecache and XXX_NoUnkeyedLiteral is required for signature verification. These two fields are currently unused, so they add no overhead, and stripping them would not save anything substantial. Moreover, including the Seqno, From, Data, Topic, Signature, Key and XXX_unrecognized fields in the forensic data passed to the engine is literally the same as sharing the pb.Message itself with the engine. Without any one of these fields, the engine cannot build self-standing evidence. In particular, the Data must be shared with the engine, so the assumption that "without data GossipSub envelope should be pretty small" does not hold.

type Message struct {
    // Covered by the signature.
    From                 []byte   `protobuf:"bytes,1,opt,name=from" json:"from,omitempty"`
    Data                 []byte   `protobuf:"bytes,2,opt,name=data" json:"data,omitempty"`
    Seqno                []byte   `protobuf:"bytes,3,opt,name=seqno" json:"seqno,omitempty"`
    Topic                *string  `protobuf:"bytes,4,opt,name=topic" json:"topic,omitempty"`
    // Needed to verify the signature; cleared before the signed bytes are computed.
    Signature            []byte   `protobuf:"bytes,5,opt,name=signature" json:"signature,omitempty"`
    Key                  []byte   `protobuf:"bytes,6,opt,name=key" json:"key,omitempty"`
    // Protobuf bookkeeping; only XXX_unrecognized (unknown wire fields) is marshaled, and thus signed.
    XXX_NoUnkeyedLiteral struct{} `json:"-"`
    XXX_unrecognized     []byte   `json:"-"`
    XXX_sizecache        int32    `json:"-"`
}

In conclusion, when we talk about “reducing overhead,” creating a new structure with only some details from the pb.Message isn't going to help. The proof needs the full pb.Message to be reliable. So, it's not really about more or less "overhead", it's about keeping the proof verifiable and trustworthy.

Regarding your suggestion that whoever is responsible for checking the violation can get this struct from the node, reconstruct pb.Message, and verify the signature:

It appears you are proposing to change the current one-step process into a more interactive two-step procedure, though the advantages of this are not entirely clear. Let's take a closer look:

  1. As outlined in the original GMF proposal, when the networking layer sends an event to an engine, it should include all the necessary forensic data (i.e., pb.Message). This approach empowers the engine to craft evidence that is fully self-sufficient, meaning other nodes can validate it independently without further input or clarification from the originating engine.

  2. Your approach seems to suggest that not all forensic data is essential in building evidence. It implies we should select and use only certain parts of the data to form what might essentially be partial evidence. Subsequently, nodes wishing to authenticate this evidence would need to reach out for additional details. This fundamentally changes the protocol to one where pb.Message details must be stored and retrieved, creating an environment ripe for increased complexity and engineering challenges (as previously described here) without an obvious enhancement to the current system.

AlexHentschel commented 1 year ago

This is going to be a bit of a longer reply, sorry 😅. I think there are quite a few nuances and application patterns that need to be considered.

Central design goals

Thoughts on the metrics to rank designs by

Analyzing the proposals

Regarding proposal-2: Flow-level Signing Policy (FSP)

Regarding proposal-1: GossipSub Message Forensic (GMF)

Generally, I think this is the direction to go. Let's look at our ranking criteria:

My thoughts on complexity

Suggestions:

My take on the Disadvantages

We need to extend the signature verification mechanism to translate originId from flow.Identifier to the networking key and peer.ID (i.e., the LibP2P-level identifier), because the engines operate on flow.Identifier while GossipSub signatures are generated with the node's networking key.

The first step is to ensure the event is wrapped in a GossipSub envelope; if it is not, verification fails. To do this, we need to replicate the entire encoding path down to the GossipSub level, since wrapping the Flow message in the GossipSub envelope happens internally in GossipSub and is not exposed to the Flow codebase. The replication may also introduce another layer of coupling that could cause breaking changes in future GossipSub upgrades.

jordanschalm commented 1 year ago

I support the "GossipSub Message Forensic (GMF)" approach, with some of the extensions suggested by Peter & Alex.

Interface Changes

To accommodate a ForensicsContext accompanying messages, we need to modify the MessageProcessor interface. The FLIP proposes this interface, where the envelope is added as a separate parameter.

Process(channel channels.Channel, originID flow.Identifier, event interface{}, envelope *pb.Message) error

I agree with Peter's suggestion of instead including the event itself and forensic data within a single higher-level structure:

Have a high level envelope interface with methods to get the underlying event and forensic data.

There are benefits to passing a Message or envelope interface type rather than an any type, for example:

Since we need to change the interface for this proposal, we should change it in a way that opens up some of these options in the future without further API changes.

Process(channel channels.Channel, originID flow.Identifier, message flow.Message) error

type Message interface {
    Event() any
    ForensicsCtx() ForensicsContext
    // ...
}
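A sketch of how an engine might consume such a Message and keep the forensics data attached to any evidence it raises. The ForensicsContext fields, the incoming implementation, Evidence, and the toy rule that a vote for view 0 is a violation are all hypothetical. The process function is simplified to take only the message (dropping the channel and, in the spirit of the side note below, the originID).

```go
package main

import (
	"errors"
	"fmt"
)

// ForensicsContext is hypothetical: what an engine needs for self-standing evidence.
type ForensicsContext struct {
	OriginID string // stand-in for flow.Identifier
	Envelope []byte // stand-in for the serialized *pb.Message
}

// Message matches the shape proposed above: decoded event plus forensics data.
type Message interface {
	Event() any
	ForensicsCtx() ForensicsContext
}

// incoming is a hypothetical Message implementation built by the networking layer.
type incoming struct {
	event any
	ctx   ForensicsContext
}

func (m incoming) Event() any                     { return m.event }
func (m incoming) ForensicsCtx() ForensicsContext { return m.ctx }

// Evidence is what an engine would publish when it detects a violation.
type Evidence struct {
	Reason string
	Ctx    ForensicsContext
}

// Vote is a hypothetical event type.
type Vote struct{ View uint64 }

// process is a toy engine: it flags votes for view 0 and attaches the full
// forensics context to the evidence instead of only the decoded event.
func process(msg Message) (*Evidence, error) {
	switch ev := msg.Event().(type) {
	case Vote:
		if ev.View == 0 {
			return &Evidence{Reason: "vote for view 0", Ctx: msg.ForensicsCtx()}, nil
		}
		return nil, nil
	default:
		return nil, errors.New("unexpected event type")
	}
}

func main() {
	msg := incoming{
		event: Vote{View: 0},
		ctx:   ForensicsContext{OriginID: "node-b", Envelope: []byte("raw pb.Message bytes")},
	}
	evidence, err := process(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(evidence.Reason, evidence.Ctx.OriginID) // vote for view 0 node-b
}
```

Because the event and its forensics data travel as one value, an engine cannot accidentally raise a claim about an event whose envelope it has already dropped.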

Side Note: As the ForensicsContext will include the origin ID, we could remove this parameter from Process as well.

Adding some colour to the comparison of runtime impact

To explicitly state what Alex touched on in his comment: To enable linking messages to their corresponding network signature and envelope, we need to retain a reference to already-allocated memory for (slightly) longer. We do not need to allocate additional memory for each message.

What is the memory impact of retaining pb.Message references for the complete processing duration of a message?

Here's some back-of-the-envelope calculations:

GC cycles on Mainnet average around 0.5-2 per minute, depending on the node role. (metrics link)

Suppose it takes 100 ms on average to process a message. Then the memory impact is upper-bounded by the proportion of messages received within 100 ms of a GC cycle. Assuming the higher rate of 2 GCs/min (one every 30 s), about 0.3% (0.1 s / 30 s) of the memory associated with pb.Message allocations would be retained for an extra GC cycle as a result of the longer retention. Even if all of the suggested 16 GB of RAM were used for pb.Messages, retaining the message reference as suggested would cost at most about 50 MB extra. In practice, this should be much lower (likely <10 MB).
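The estimate above can be reproduced with a few lines of Go. extraRetainedMB is a hypothetical helper; its inputs (100 ms processing time, 2 GCs per minute, 16 GB of RAM taken as 16 × 1024 MB) are the assumptions stated in the paragraph.

```go
package main

import "fmt"

// extraRetainedMB bounds the extra memory kept alive by holding pb.Message
// references until processing finishes: only messages still in flight when a
// GC runs survive one extra cycle.
func extraRetainedMB(processingSec, gcPerMinute, messageMemMB float64) (fraction, extraMB float64) {
	gcIntervalSec := 60 / gcPerMinute
	fraction = processingSec / gcIntervalSec
	return fraction, fraction * messageMemMB
}

func main() {
	f, mb := extraRetainedMB(0.1, 2, 16*1024)
	fmt.Printf("retained fraction: %.2f%%, extra memory: %.0f MB\n", f*100, mb)
	// retained fraction: 0.33%, extra memory: 55 MB
}
```

The result (about 0.33% and roughly 55 MB) is consistent with the ~50 MB ceiling above, which rounds the fraction down to 0.3%.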

kc1116 commented 1 year ago

I support the "GossipSub Message Forensic (GMF)" approach, articulating my views as follows.

tarakby commented 1 year ago

I do not support the "Enforced Flow-level Signing Policy For All Messages". I support the idea of "GossipSub Message Forensic (GMF)", though some changes may be necessary.

First, I wanted to clarify two concepts or cryptographic services, since they will be used twice in my comment:

These definitions may have social/legal aspects, so for simplicity we can assume that we are attributing messages not to Bob as a party, but to a party controlling the private key that corresponds to some public key shared with the public. Non-repudiation is a stronger concept than authentication: it prevents Bob from denying that he is the origin of a message. I believe @AlexHentschel has mentioned the same concepts and called them attributability (for authentication) and provability (for non-repudiation). As an example among cryptographic primitives, signature schemes offer both properties, while a message authentication code (MAC) shared between two parties offers only authentication. The purpose of this FLIP is to provide protocol-level non-repudiation, since authentication seems to be already implemented at the network layer.

I do not support "Enforced Flow-level Signing Policy For All Messages" because:

  1. It seems to me that it does not provide non-repudiation (I believe @AlexHentschel also pointed this out). If a message has a valid network authentication, but invalid protocol level signature, there is no way the protocol can attribute the message to the original sender, because the protocol does not recognize the network level authentication. I believe this is the issue we wanted to solve in the first place.
  2. It is a redundant level of authentication IMO, as @yhassanzadeh13 pointed out. The networking layer already authenticates messages, and some protocol messages are already authenticated using the staking key (consensus votes, for instance). We could disable the extra Flow-level signature for those "already-signed" messages, but that may compromise engine modularity. Note that the protocol signatures do not always sign the message envelope (some signed messages are even omitted from the payload itself).
  3. A less important point concerns the chosen signature scheme. Protocol-level authentication currently uses BLS. BLS is only relevant when multi-signature features are needed (for instance aggregation and batch verification), but it is not optimized for basic signatures (on my laptop, one BLS verification is 13x slower than ECDSA and 16.5x slower than EdDSA, even with our new fast BLS implementation).

While I support "GossipSub Message Forensic (GMF)", I wanted to clarify a few points, in particular about our current implementation using libp2p:

  1. I think it makes sense to differentiate the protocol specs from its implementation. We may want to base our specs on libp2p (since it is used in the current and only implementation of Flow nodes), but we should be able to describe the attribution data in a spec. The attribution data is part of a challenge that would eventually be posted in blocks and the protocol state, and should therefore be described regardless of libp2p or any other implementation.
  2. We are using libp2p unicast (for 1:1) and PubSub (for 1:many). I am going to consider only unicast messages (I didn't dig deep into libp2p's PubSub):
    1. libp2p isn't simply signing every network payload with the sender's networking private key and then verifying it on the receiving end with the sender's public key. This is the complexity I believe @yhassanzadeh13 mentioned (also answering your question, @AlexHentschel). The networking key isn't used to sign any network message; it only signs a second-level key called a "static key", thereby delegating authentication to the static key. libp2p made this choice to allow multiple signature schemes for the networking key (Flow doesn't need this flexibility). The static keys are then used in a Noise handshake, together with ephemeral keys, to perform a key agreement. The resulting symmetric keys are used for authenticated encryption (Flow doesn't need encryption here, but libp2p enables it by default). It is the symmetric keys that provide authentication, not the networking keys nodes share when staking. I discussed this a long time ago when we switched from SECIO to Noise (low-level security protocols used by libp2p). OK, all this sounds complicated, but can we still make libp2p export all the data needed for attribution and trace it back to the networking key?
    2. The key agreement above is a Diffie-Hellman (DH) exchange involving both static and ephemeral keys. Proving the correctness of the shared key to third parties would require exporting the node's static private key ⚠️
    3. The shared key, as its name says, is shared between both ends of the communication. Authentication is provided by authenticated encryption (AEAD). You can think of this as encryption combined with a MAC (if you're not familiar with a MAC, it is the symmetric version of a signature). As I mentioned at the beginning of my message, a MAC does not provide non-repudiation. Evil Bob can always claim he never signed the payload (even though honest Alice knows he did), because the signing key is shared ⚠️

I didn't look at PubSub in detail; it seems that the networking key (not the libp2p static key) is used to sign original payloads (which is great news), but we still need to confirm it.

If we want to implement GMF with libp2p we would need to make some changes to libp2p and its use of Noise for 1-1 (maybe through a fork). I will stop my reply here and we can get into ideas on how to update libp2p later.

AlexHentschel commented 1 year ago

Thanks @tarakby for your detailed comment. Let's try to use the established terminology of authentication and non-repudiation going forward. Thanks, Tarak, for clearly explaining the concepts.

KshitijChaudhary666 commented 7 months ago

Hi @yhassanzadeh13 - this FLIP is not reflected on the FLIP project tracker. Did you follow the process outlined in https://github.com/onflow/flips? Specifically, please remember to do the following, without which the FLIP won't get visibility on the project tracker:

  1. Create an issue using one of the FLIP issue templates based on the type of the FLIP: application, governance, cadence or protocol. The title of the issue should be the title of your FLIP, e.g., "Dynamic Inclusion fees".
  2. Submit the issue. Note the issue number that gets assigned.
  3. Create your FLIP as a pull request to this repository (onflow/flips). Use the issue number generated in step 2 as the FLIP number, and mention the FLIP issue by copying the GitHub URL of the issue in the comment section.

Thank you!

KshitijChaudhary666 commented 5 months ago

Hi @yhassanzadeh13 - following up on my message above.

This FLIP is not reflected on FLIP project tracker. Can you pls follow the process outlined in https://github.com/onflow/flips? Specifically the following without which the FLIP won't get visibility on the project tracker-

AlexHentschel commented 5 months ago

@KshitijChaudhary666 Yahya has moved on to a new professional opportunity. This FLIP is currently iceboxed (at least from the perspective of the Flow Foundation resourcing it, which does not preclude community contributors from picking it up, though they would need to align the technical details with us). In summary, it is now up to us to do with this FLIP whatever we do with FLIPs currently on ice (please let me know).

vishalchangrani commented 5 months ago

I am wondering what state we should be targeting for this PR. There is a lot of important and valuable discussion here in the PR, which might still lead to changes in the proposed FLIP.

hey @AlexHentschel - we will not close the PR.

Just to be clear, the issue to track this FLIP is https://github.com/onflow/flips/issues/259. The FLIP ID is the same as the issue ID (as per the new process): 259. This PR can remain open until there is agreement on the FLIP.