yhassanzadeh13 opened 1 year ago
I favor the "Enforced Flow-level Signing Policy For All Messages" (I have suggested renaming this to FLS), and here are my thoughts:
My biggest concern in adopting FLS is the backward-compatibility impact, as it will be a major breaking change. But from a design and maintainability perspective, I prefer this approach over GMF.
I support the "GossipSub Message Forensic (GMF)" approach, articulating my views as follows:
- Network layer is clearly separated from internal messaging logic.
- By carrying the GossipSub envelope without data we can prevent all the overhead (without data, the GossipSub envelope should be pretty small).
- I think slashing / protocol violation is a very rare case, and signing in GossipSub is pretty basic. Duplicate GossipSub verification is only needed when a protocol violation is raised, and only by the parts that are responsible for deciding on protocol violations (otherwise we have 2 signature verifications: one on the network layer, one on the application layer).
I raised this before (on Discord) about moving all communication onto GossipSub (unicast messages too), but it turned out there are some messages assumed to be sent on a direct 1:1 connection. I think solving that is also important. (I don't have much information about the current topology, but I think it may be possible to have direct peering between components that require 1:1 connections.) Eventually, I think the objective is to allow anyone to join the network (let's say any AN), and making it work over GossipSub's security and scalability guarantees would benefit us in the long run.
@bluesign, it is imperative in GMF for an engine to retain the full integrity of the GossipSub envelope; the function hinges fundamentally on the wholeness of the data present. This is not merely a procedural stipulation but a requisite grounded in the necessity of verifying the signature, which encapsulates the entire envelope, thereby making it infeasible to relay the envelope devoid of its data contents.
> By carrying the GossipSub envelope without data we can prevent all the overhead (without data, the GossipSub envelope should be pretty small).
⚠️ The perspective you depicted seems to bypass a critical phase of the verification process.
This is not about entrusting the networking layer blindly; it is about endorsing a methodologically sound practice that employs cryptographic primitives to furnish verifiable proofs, fostering trust among Flow nodes that rests not on assumptions but on cryptographically verifiable primitives. Otherwise, we may not even need a forensic mechanism in the first place.
Reducing the envelope to a fragmentary state impedes the engine's self-sufficient capacity for verification and essentially undermines the core objective of establishing irrefutable proofs. We should not overlook the cardinal principle governing the GMF: it pivots on the complete and unaltered state of the data encompassed in the envelope. Thus, the GMF solution must treat the intact envelope as integral to the very foundation of the system we are discussing, ensuring the accurate verification of both the event's association with the envelope and the envelope's signature.
What I was proposing is something like this:
Some struct like message.Message as the Flow message (enriched with seqno, topic, signature, and key from pb.Message, plus the decoded event as interface{}) passed to the engine. So technically the engine will have everything it needs to prove a claim (with little overhead). But as GossipSub is already defending against impersonation etc., I don't think the engine needs to do a signature check here.
In case of conflict, this new struct is self-sufficient to raise a claim. Then whoever is responsible for checking this violation can get this struct from the node, reconstruct pb.Message, and do the signature verification. If signature verification succeeds, it can punish the offender (if not, it can punish the claimer).
> I don't think the engine needs to do a signature check here.
@bluesign yes, technically the engine doesn't have to check the signature itself and can rely on the data from its networking layer. But, the core idea behind the FLIP is to make sure that any proof it gives when reporting a rule-breaking move is fully self-standing. This means that if node A is saying that node B did something wrong, using evidence E, then anyone else should be able to see that node B was indeed in the wrong just by looking at E.
For this to work, node A needs to share the original message as it was, signature and all. Even a tiny change to the message stops others from being able to confirm the signature is real. The signature covers everything in the entire envelope, not just parts of it. Below is a copy of the entire `pb.Message`, and based on the signing code snippet, everything except the `XXX_sizecache` and `XXX_NoUnkeyedLiteral` fields is required for signature verification. The `XXX_sizecache` and `XXX_NoUnkeyedLiteral` fields are currently unused, so skimming them off will not save any substantial overhead. Moreover, including the `Seqno`, `From`, `Data`, `Topic`, `Signature`, `Key`, and `XXX_unrecognized` fields in the forensic data that is passed to the engine is literally the same as sharing the `pb.Message` itself with the engine. Withholding any of these fields from the engine means it cannot build self-standing evidence. Notably, we must share the `Data` with the engine, hence "without data GossipSub envelope should be pretty small" will not stand valid.
```go
type Message struct {
	From                 []byte   `protobuf:"bytes,1,opt,name=from" json:"from,omitempty"`
	Data                 []byte   `protobuf:"bytes,2,opt,name=data" json:"data,omitempty"`
	Seqno                []byte   `protobuf:"bytes,3,opt,name=seqno" json:"seqno,omitempty"`
	Topic                *string  `protobuf:"bytes,4,opt,name=topic" json:"topic,omitempty"`
	Signature            []byte   `protobuf:"bytes,5,opt,name=signature" json:"signature,omitempty"`
	Key                  []byte   `protobuf:"bytes,6,opt,name=key" json:"key,omitempty"`
	XXX_NoUnkeyedLiteral struct{} `json:"-"`
	XXX_unrecognized     []byte   `json:"-"`
	XXX_sizecache        int32    `json:"-"`
}
```
In conclusion, when we talk about “reducing overhead,” creating a new structure with only some details from the `pb.Message` isn't going to help. The proof needs the full `pb.Message` to be reliable. So, it's not really about more or less "overhead"; it's about keeping the proof verifiable and trustworthy.
> Then whoever is responsible for checking this violation can get this struct from the node, reconstruct pb.Message and do the signature verification.
It appears you are advocating for the alteration of the current one-step process into a more interactive two-step procedure, though the advantages of this aren't entirely clear. Let's take a closer look:
As outlined in the original GMF proposal, when the networking layer sends an event to an engine, it should include all the necessary forensic data (i.e., the `pb.Message`). This approach empowers the engine to craft evidence that is fully self-sufficient, meaning other nodes can validate it independently without further input or clarification from the originating engine.
Your approach seems to suggest that not all forensic data is essential in building evidence. It implies we should select and use only certain parts of the data to form what might essentially be partial evidence. Subsequently, nodes wishing to authenticate this evidence would need to reach out for additional details. This fundamentally changes the protocol to one where `pb.Message` details must be stored and retrieved, creating an environment ripe for increased complexity and engineering challenges (as previously described here) without an obvious enhancement to the current system.
This is going to be a bit of a longer reply, sorry 😅. I think there are quite a few nuances and application patterns that need to be considered.
We design for a scenario where, over longer periods of time, 99.9% of messages (or more) are honest. In the ideal case, none of the code paths leading to slashing of a node are ever executed. Resource consumption and runtime impact should be optimized for the happy-path scenario. We are willing to accept significant performance deterioration in case of attackers committing slashable protocol violations, as long as the network slashes and ejects the offending nodes within the order of some minutes. By virtue of being implemented, slashing will be enough of a deterrent.
For the overwhelming majority of messages, the additional information needed for message forensics is ephemeral. More concretely, for nearly all messages, their validity and protocol compliance will be confirmed within milliseconds. At that point, the additional message-forensics information can be discarded.
Under normal operational scenarios (overwhelmingly dominant), there are zero to few unchecked messages in the engine's inbound queues. It is completely tractable to keep the additional information needed for message forensics temporarily in memory and let the garbage collector clean it up once we know the message is honest and can drop the respective in-memory reference to the forensics data. My gut feeling is that this might cost us less than 200MB of extra RAM per node for almost all messages combined. There might be 1 or 2 exceptions (most prominently ChunkDataResponse messages), but there we can add special-purpose optimizations. For the large majority of messages, we can temporarily keep both the deserialized event in RAM as well as the `pb.Message` (which includes the event again in serialized form). We can always optimize later.
It is generally highly desired to support the rotation of keys. We don't have to implement it now, but we also want to avoid designs making key rotation harder or impossible.
runtime impact: BLS signatures (the staking key is a BLS key) are computationally much more costly to generate and verify than ECDSA signatures (networking key).
With the FSP proposal, we would consistently incur the additional cost of an added BLS signature on the happy path of the protocol.
strength of protection: It is important to note that the FSP reduces the surface for attributable but not provable protocol violations, but does not eliminate it. Essentially, we are still (with a very small surface) violating Moxie Marlinspike's Cryptographic Doom Principle [1] -- we are leaving some surface, where a node expends resources but cannot prove protocol violations.
I would argue that with the peer ranking system, we already have a good foundational defense. But it leaves that gap of attributable but not provable protocol violations. FSP makes this gap smaller, but doesn't close it entirely. Therefore, I question whether FSP is impactful enough compared to the required engineering time and the runtime cost, especially since I believe there are ways to close this gap entirely.
complexity: I agree that the engineering work is probably not too big of a lift.
Generally I think this is the direction to go. Let's look at our ranking criteria:
runtime impact:
If we keep the `pb.Message` in memory for the short time we need it and then just garbage-collect it, we would only expend a bit of RAM. No latency or significant computational cost.
strength of protection:
No gap in the security surface. Every attributable protocol violation is conceptually provable. Very strong security guarantee!
complexity:
While I understand the concerns about Implementation Complexities and Disadvantages, I think there are quite pragmatic solutions for those concerns. Overall, I am optimistic that we can achieve a more comprehensive solution without significantly extending the engineering cost compared to FSP.
In my mind, the responsibility to adjudicate slashing challenges lies completely at the protocol layer. Slashing challenges are separate messages that are exchanged within the protocol. They are self-contained and should either reference other messages which the protocol has already embedded in blocks (e.g. execution receipts), or contain the offending message itself (the entire message, incl. all envelopes).
I am of the opinion that adjudicating slashing evidence is entirely out of the scope of the networking layer or the engines. They are focused on processing the newest messages necessary for extending the chain. The networking layer and the core protocol logic underlying the engines must be modular enough to provide the primitives for validating slashing evidence outside of the happy-path logic (engines). In other words, the networking layer and core protocol layer should expose very low-level functions that can be called by the adjudication logic when processing slashing challenges and the evidence contained therein.
I think the networking layer should continue what it is doing already: authenticating inbound messages before handing them to the engines (protocol layer).
Separate `VerifyGossipSubMessage` from `EngineRegistry`. Reasoning:
The adjudication logic should provide the networking key (maybe better: `Identity`?) for each message it requests to be verified. This is necessary to allow key rotation (e.g. at epoch boundaries) while still allowing verification of the authenticity of a message from a past epoch.
Therefore, the API for interfacing with the engines is very different from the API for interfacing with the adjudication logic, and I think both should be separate interfaces.
Encapsulate all the auxiliary forensics information, and the sub-cases for unicast vs multicast, into a `ForensicsContext`. The following sketch compares the current implementation to my flavour of the GMF proposal. The `ForensicsContext` could look something like this:
```go
type ForensicsContext interface {
	Channel() channels.Channel
	OriginID() flow.Identifier

	// BinaryMessage returns the binary representation of the message from the perspective of the networking layer.
	// This representation should contain the necessary information to:
	//  * deserialize the Protocol-Layer message (e.g. for proving a protocol violation to another staked node)
	//  * cryptographically confirm authenticity and integrity of the Protocol-Layer message
	//    via the origin's networking key.
	//  * The origin's public networking key is _not_ contained in the BinaryMessage.
	// In a nutshell, the returned value should present the necessary evidence to prove to a third party that the
	// message was really sent by the origin. The protocol guarantees that the party inspecting this evidence knows
	// the origin's public networking key. We cannot include the origin's public networking key here,
	// as this would _not_ be BFT.
	BinaryMessage() []byte // TBD: suitable return type here. Not sure whether []byte is the best.
}
```
We need to extend the signature verification mechanism to account for the translation of the origin ID from `flow.Identifier` to the networking key and `peer.ID` (i.e., the LibP2P-level identifier), as the engines operate based on the `flow.Identifier` while the GossipSub signatures are generated using the node's networking key.
I think verifying networking signatures is not the responsibility of engines. The adjudication logic might need to bridge `flow.Identifier` and `peer.ID`. Though, the adjudication logic presumably already knows which node is being accused of a protocol violation (that information would be part of the slashing challenge) and has looked up its networking key.
Not really sure I follow why this is outside of the implementation that we already have. Potentially our existing code needs to be refactored to be more modular, but it is hard for me to see why this would be super complex.
The first step is to ensure the event is wrapped in a GossipSub envelope; if not, the verification fails. For this we would need to replicate the entire encoding path down to the GossipSub level, as wrapping the Flow message in the GossipSub envelope is done internally by GossipSub and is not exposed to the Flow codebase. The replication may also introduce another layer of coupling that causes breaking changes in future upgrades of GossipSub.
I don't understand why we need to "replicate the entire encoding path down to the GossipSub level". The way I understood the description, we have the raw GossipSub message, which contains all information. We can decode the protocol-layer event from that raw GossipSub message, can't we?
In a nutshell, this is all that Charlie needs to do when he wants to adjudicate a slashing challenge raised by Alice (see my picture above for context):
1. Verify that the `pb.Message` (contained in Alice's slashing challenge) is in fact originating from Edward. From what I understood, I think that is possible.
2. Take the `pb.Message`, decode the event, and verify that it violates the protocol in accordance with Alice's complaint.

I support the "GossipSub Message Forensic (GMF)" approach, with some of the extensions suggested by Peter & Alex.
To accommodate a `ForensicsContext` accompanying messages, we need to modify the `MessageProcessor` interface. The FLIP proposes this interface, where the envelope is added as a separate parameter:

```go
Process(channel channels.Channel, originID flow.Identifier, event interface{}, envelope *pb.Message) error
```
I agree with Peter's suggestion of instead including the event itself and forensic data within a single higher-level structure:
Have a high level envelope interface with methods to get the underlying event and forensic data.
There are benefits to passing a `Message` or envelope interface type rather than an `any` type, for example:
- deriving the label from the `Message` type, rather than manually selecting the label at each callsite, which is much more susceptible to human error (example);
- giving `Message`s identifiers to be able to trace their progress through different components.

Since we need to change the interface for this proposal, we should change it in a way that opens up some of these options in the future without further API changes.
```go
Process(channel channels.Channel, originID flow.Identifier, message flow.Message) error

type Message interface {
	Event() any
	ForensicsCtx() ForensicsContext
	// ...
}
```
Side Note: As the `ForensicsContext` will include the origin ID, we could remove this parameter from `Process` as well.
To explicitly state what Alex touched on in his comment: To enable linking messages to their corresponding network signature and envelope, we need to retain a reference to already-allocated memory for (slightly) longer. We do not need to allocate additional memory for each message.
What is the memory cost of retaining `pb.Message` references for the complete processing duration of a message? Here are some back-of-the-envelope calculations:
GC cycles on Mainnet average around 0.5–2 per minute, depending on the node role (metrics link).
Suppose it takes 100ms on average to process a message. Then the memory impact would be upper-bounded by the proportion of messages we receive within 100ms of a GC cycle. If we assume the larger 2 GCs/min, then about 0.3% of the memory associated with `pb.Message` allocations would be retained for an extra GC cycle as a result of the longer retention. Even if all the suggested 16GB of RAM were used for `pb.Message`s, we would incur only a maximum of ~50MB additional cost by retaining the message reference as suggested. In practice, this should be much lower (likely <10MB).
I support the "GossipSub Message Forensic (GMF)" approach, articulating my views as follows:
- The enforced Flow-level signing policy would add non-negligible overhead to message processing.

I do not support the "Enforced Flow-level Signing Policy For All Messages". I support the idea of "GossipSub Message Forensic (GMF)", but some changes may be necessary.
First, I wanted to clarify two concepts or cryptographic services, since they will be used twice in my comment:
These definitions may have social/legal aspects so we can assume for simplicity that we are not attributing messages to Bob as a party, but we are attributing messages to a party controlling a private key corresponding to some public key shared with the public. Non-repudiation is a stronger concept than authentication, and it prevents Bob from denying being the origin of some message. I believe @AlexHentschel has mentioned the same concepts and called them attributability (for authentication) and provability (for non-repudiation). As an example in cryptographic primitives, signature schemes offer both properties while message authentication code (MAC) between 2 parties only offers authentication. The purpose of this FLIP is to provide protocol-level non-repudiation, since authentication seems to be already implemented on the network layer.
I do not support "Enforced Flow-level Signing Policy For All Messages" because:
While I support "GossipSub Message Forensic (GMF)", I wanted to clarify a few points, in particular about our current implementation using libp2p:
I didn't look at PubSub in detail; it seems that the networking key (not the libp2p static key) is used to sign original payloads (which is great news), but we still need to confirm it.
If we want to implement GMF with libp2p we would need to make some changes to libp2p and its use of Noise for 1-1 (maybe through a fork). I will stop my reply here and we can get into ideas on how to update libp2p later.
Thanks @tarakby for your detailed comment. Let's try to use the established terminology of authentication and non-repudiation going forward. Thanks Tarak for clearly explaining the concepts.
Hi @yhassanzadeh13 - this FLIP is not reflected on the FLIP project tracker. Did you follow the process outlined in https://github.com/onflow/flips? Specifically, please remember to do the following, without which the FLIP won't get visibility on the project tracker:
1. Create an issue by using one of the FLIP issue templates based on the type of the FLIP: application, governance, cadence, or protocol. The title of the issue should be the title of your FLIP, e.g., "Dynamic Inclusion fees".
2. Submit the issue. Note the issue number that gets assigned.
3. Create your FLIP as a pull request to this repository (onflow/flips). Use the issue number generated in step 2 as the FLIP number, and mention the FLIP issue by copying the GitHub URL of the issue in the comment section.
Thank you!
Hi @yhassanzadeh13 - following up on my message above.
This FLIP is not reflected on the FLIP project tracker. Can you please follow the process outlined in https://github.com/onflow/flips? Specifically the following, without which the FLIP won't get visibility on the project tracker:
@KshitijChaudhary666 Yahya has moved on to a new professional opportunity. This FLIP is currently iceboxed (at least from the perspective of Flow Foundation resourcing it, which does not preclude community contributors from picking it up, though they would need to align tech details with us). In summary, it is up to us now to do with this FLIP whatever we do with FLIPs currently on ice (please let me know).
I am wondering what state we should be targeting for this PR. There is a lot of important and valuable discussion here in the PR, which might still lead to changes in the proposed FLIP.
hey @AlexHentschel - we will not close the PR.
Just to be clear, The issue to track this FLIP is: https://github.com/onflow/flips/issues/259 The FLIP ID is the same as the issue ID (as per the new process): 259. This PR can remain open till there is agreement on the FLIP.
Summary
This FLIP discusses and compares two potential solutions for the Message Forensic (MF) system in the Flow protocol — a system that identifies and attributes protocol violations to the original malicious sender. The two solutions under consideration are: (1) GossipSub Message Forensic (GMF), and (2) Enforced Flow-level Signing Policy For All Messages. We delve into both, listing their pros and cons, to determine which would be more feasible given the considerations of ease of implementation, performance efficiency, and security guarantees.
Our analysis finds the "Enforced Flow-level Signing Policy For All Messages" to be the more promising option, offering a generalized solution that doesn’t hinge on the protocol utilized to send the message, steering clear of the complexities tied to maintaining GossipSub envelopes and dodging the necessity of duplicating GossipSub router’s signature verification procedure at the engine level. Furthermore, it meshes well with the Flow protocol’s existing state.
Review Guide
This FLIP is presented as a Pull Request (PR) in the `flow-go` repository. We welcome reviewers to express their opinions and share feedback directly on the PR page, aiming for a structured and productive discussion. To aid this, please adhere to one of the following response frameworks: