Open lrettig opened 2 years ago
It does not explain how version information should be encoded into data structures.
Why not also address this in a section of this smip once the how is figured out?
Regarding genesis
- what we want to avoid is data from another network which pretends to be a different network, for example another mainnet that is using the same genesis time but has a different genesis net config. To clarify, the genesis net config should include all genesis accounts and network params - all the immutable params used by a node. Therefore, it is better to hash the whole genesis net config data and not just the genesis start time, as the goal is to basically exclude any fork which changes some of the network config but keeps the network id and genesis time. For example, say I fork a mainnet, change any genesis config param such as genesis accounts, and launch it. Blocks and messages from nodes on the fork would be accepted by nodes in the original mainnet. Genesis time is just one of these params but is insufficient to mitigate situations / attacks such as this.
The performance penalty should be negligible as each node only has to hash the genesis data once: on the first node session, it keeps this hash locally and uses it to compare against the genesis param in blocks, txs, and messages, and to add it to any block, tx, or message it creates and shares.
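To make the "hash once, compare everywhere" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the config field names, the canonical JSON encoding, the choice of SHA-256, and the `Node` class are hypothetical and not taken from the proposal or the node codebase.

```python
import hashlib
import json


def genesis_config_hash(config: dict) -> bytes:
    """Hash the whole genesis config (hypothetical fields) deterministically."""
    # sort_keys plus fixed separators give a canonical byte encoding, so two
    # nodes holding equal configs compute the same digest.
    encoded = json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).digest()


class Node:
    """Hypothetical node that computes the genesis hash once and reuses it."""

    def __init__(self, config: dict) -> None:
        # Computed a single time, on first start; kept locally afterwards.
        self.genesis_hash = genesis_config_hash(config)

    def accepts(self, message_genesis_hash: bytes) -> bool:
        # Incoming blocks, txs, and messages are checked against the cached value.
        return message_genesis_hash == self.genesis_hash
```

With this shape, a fork that copies the genesis time but edits the genesis accounts yields a different hash, so honest nodes on the original network would not accept its messages by default.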
Regarding the p2p protocol versioning section - why not use exactly the same versioning scheme proposed here? It seems satisfactory and sufficient. A p2p protocol change can just cause an update to the 'protocol' field. In other words, the p2p protocol is part of the overall protocol and does not need to be identified as just a p2p protocol. If we want to support an update to the p2p protocol then maybe it is better to add a p2p protocol part to the version so it has 4 parts and so we only have one concept of 'version' instead of two?
It does not explain how version information should be encoded into data structures.
Why not also address this in a section of this smip once the how is figured out?
This is a lie, actually I did end up addressing this. Will update.
it is better to hash the whole genesis net config data and not just the genesis start time, as the goal is to basically exclude any fork which changes some of network config but keeps the network id and genesis time
I disagree. Using a hash of genesis time alone is sufficient. The goal here is to prevent conflict, by default, in the case of distinct networks. There is effectively zero likelihood of two networks coincidentally having exactly the same genesis time.
Network A could "fork" (i.e., copy) network B's genesis time in order to have an identical message/hash format, but that's a form of attack, and adding more params to the hash couldn't prevent this attack. An attacker could always just modify the hash function.
Regarding the p2p protocol versioning section - why not use exactly the same versioning scheme proposed here?
Because the p2p protocol and the protocol used to generate the canonical mesh are two totally different protocols. They needn't be coupled. In theory, you could use a totally different p2p protocol to construct the same mesh, or use the same p2p protocol to construct a totally different mesh (by, e.g., changing the consensus rules). If the p2p protocol changes, this should have no direct impact on the mesh protocol: it doesn't render blocks, transactions, hare messages, etc. obsolete. (In practice, a client running version n+1 of the p2p protocol could negotiate to communicate with an older client using version n of the p2p protocol--and from the perspective of the mesh, nothing would change.) And the opposite is also true: if we change the mesh protocol this doesn't necessarily mean we need to make any changes to the p2p protocol.
To put things in the context of Ethereum, it currently runs using devp2p but could also run over libp2p (as, indeed, eth2 does!).
I disagree. Using a hash of genesis time alone is sufficient. The goal here is to prevent conflict, by default, in the case of distinct networks. There is effectively zero likelihood of two networks coincidentally having exactly the same genesis time.
I disagree with your disagreement or I guess we agree to disagree on this point :-)
I believe that the goal of this hash should be defined as follows: any two honest nodes that use the same hash are assured that they both use exactly the same network and genesis configs. This should be the goal of this hash - to give these honest nodes this guarantee. If we follow your logic then we don't need the hash at all, as it can be faked by dishonest nodes. It makes no sense to define this hash as network config that honest nodes must agree on and exclude from it most config params.
Another way to think about it - I think it is important to address the case where another network decides to run with a config that has exactly the same genesis time as the first network, but different genesis params such as accounts, i.e. we need to do what we can to protect the first network against blocks, messages and txs from the other network and vice versa. The more we do to protect these networks from each other the better, especially since this hash only needs to be computed once per node instance.
So, there is no zero likelihood of 2 networks having the exact same genesis time as you argue above, because the 2nd network just copied the first network's config file and changed a few things in it, and we want to protect the first network as much as we reasonably can from the 2nd one.
Another way to think about this - I don't think that the goal of this hash is just protection from incidental use. We want to provide strong guarantees wherever we can that intentional dishonest behavior of bad actors is explicit in code changes. If you operate a rogue network and you just copy the genesis hash of an honest 1st network and use a different network config file, then we can't do much about bad messages, blocks and txs on the 1st network that originated on the 2nd network, besides relying on the honest majority of nodes in the 1st network.
This proposal has been updated to factor in the most recent R&D call on this topic. It's still open for comment. CC @noamnelke @tal-m @iddo333
This proposal has been updated per today's R&D conversation, is complete, and is still open for comment.
New hash IDs need to be gossiped and shared via P2P messages. A node needs to be able to request the set of all known IDs, or recent IDs, from peers, in order to be able to verify signatures.
I remember we discussed this, but don't remember why... To be able to process a message with a certain version, the node must possess the code for that version, meaning that it's aware, at the hard-coded code level, of all previous versions. Why would we need to be able to validate the signature/hash of messages that we can't process? I remember Tal saying something about wanting to gossip those messages if we determine that the majority supports their version - but this doesn't make sense to me. We don't blindly gossip messages without validating them and we want to blacklist neighbors that send us invalid messages, so there's some accountability.
What if we want the protocol to support a range of versions, e.g., "the past 5 VM versions" in transactions, rather than just the current one? Can this be supported using hash-and-sign? A: not supported in hash-based scheme, and not needed.
Even though the ID itself is a hash, there's still a concept of "X previous versions" since each hash is a hash of the previous version concatenated with the new extradata. "Not needed" is a different story - we don't need to implement this right away, but remembering/documenting that it's possible, if needed in the future, is important IMO.
Unclear what happens if an ATX has a bad PROTOCOLID version.
Saying "unclear" sounds like we can find the answer to this and just haven't done this yet. I would say something like: "This value is for future use - to give us flexibility when implementing upgrades - so, by definition, we don't know if we'll need it or what for. For this reason, it's impossible to say what the required behavior will be."
at least 20 bytes, no more than 32
When I hear this, I actually hear "20 bytes" - so let's just put that in the SMIP.
We don't have to solve this for this SMIP, but I think it should at least appear as an open question for future research: How do we prevent re-use of Smesher IDs across networks that share a genesis? We want to allow a smesher to choose between two forks without allowing them to use their proof-of-space data for both forks at the same time. The solution must allow smeshers that choose either side of the fork to do no additional initialization.
How do we prevent re-use of Smesher IDs across networks that share a genesis? We want to allow a smesher to choose between two forks without allowing them to use their proof-of-space data for both forks at the same time. The solution must allow smeshers that choose either side of the fork to do no additional initialization.
The only solution I can think of is fraud proofs: if the same smesher submits an ATX on two forks in the same epoch, the ATX from one fork could be submitted to the other fork as a fraud proof that would invalidate the smesher's ID on that fork. Of course, the two forks would have to have different VMIDs and/or PROTOCOLIDs, and each fork would have to know the other's IDs to verify the fraud proofs.
New hash IDs need to be gossiped and shared via P2P messages. A node needs to be able to request the set of all known IDs, or recent IDs, from peers, in order to be able to verify signatures.
I remember we discussed this, but don't remember why... To be able to process a message with a certain version, the node must possess the code for that version, meaning that it's aware, at the hard-coded code level, of all previous versions. Why would we need to be able to validate the signature/hash of messages that we can't process? I remember Tal saying something about wanting to gossip those messages if we determine that the majority supports their version - but this doesn't make sense to me. We don't blindly gossip messages without validating them and we want to blacklist neighbors that send us invalid messages, so there's some accountability.
Transactions with the wrong VMID may be discarded, not gossiped, etc. (and we may even be able to ban peers that send them to us), but ATXs with the wrong VMID need to be retained and their signatures still need to be validated. It's true that a node would know "at the code level" about older IDs, but it would not know about newer ones. That's why I suggested we'll need to gossip them, and to store them in memory in a table.
Why does the node need to get new ids from network, what it will do with them?
I remember that Tal was advocating to include some ID in ATX, purely for telemetry, it would make sense, but gossiping ids separately just doesn't. We cannot accept any random data from the network, data needs to be verifiably useful, otherwise, this will be just another ddos vector.
@lrettig
The only solution I can think of is fraud proofs
I think there's a way to solve some ickiness around fraud proofs if we do it in advance: we introduce a RetireID message that a smesher can sign to make their ID invalid starting at some epoch. The forked network can require smeshers converting from the old network to publish this message (referencing the old network) to be eligible on the forked one (new smeshers on the forked network can include a reference to the fork in their PoST data).
This is the same principle as a fraud proof (it invalidates the smesher) but it's tied to an epoch. With a fraud proof, there's a question about when has the fraud occurred and when does the ID stop being valid.
ATXs with the wrong VMID need to be retained and their signatures still need to be validated
ATXs explicitly include the PROTOCOLID, so validating them doesn't require any previous knowledge, even for versions the node doesn't know about (assuming that ATXs with an unknown PROTOCOLID should be considered valid).
ATXs explicitly include the PROTOCOLID, so validating them doesn't require any previous knowledge, even for versions the node doesn't know about
Good point. A node receiving an ATX with a PROTOCOLID it doesn't recognize wouldn't know whether that's because the PROTOCOLID is too new (i.e., from a later app update that the node hasn't installed yet), or invalid, or some other reason, but it wouldn't matter as long as they retain all ATXs.
@dshulyak @noamnelke how do you guys feel about this part?
New hash IDs need to be gossiped and shared via P2P messages. A node needs to be able to request the set of all known IDs, or recent IDs, from peers, in order to be able to verify signatures.
Would you prefer that ATXs include them? The current design is that the ATX just includes the PROTOCOLID which is:
PROTOCOLID := hash(VMID || PROTOCOLID_prev || protocol-extradata)
If we don't want to gossip/request the VMID separately, it would have to be separately and explicitly included in the ATX.
Posting the feedback I sent to @lrettig here:
I think this is more complex than it needs to be and wish we could avoid this. Specifically, it seems like all of this could be implemented the first time we upgrade something rather than in advance. This will both allow us to defer some complexity and, far more importantly, will allow us to pick the best and simplest solution, given the actual upgrade(s) we do (rather than trying to predict what we’ll need/do).
Moving the conversation here. @iddo333 pointed out the following as reason to implement this pre-genesis:
In theory, it’s debatable whether the signaling is needed at genesis (because we don’t have any update to perform yet, and an update implementation+voting are inseparable/atomic), but both Tal and I think that it’s required at genesis, for different reasons. Tal thinks that it’s needed so that honest miners will shut down (so that they won’t be used by an adversary somehow) if an update (that they didn’t vote for) succeeded. The other reason for signaling at genesis is that it’s crucial that the signaling would already be operational and verified on testnet, otherwise it’d be shaky to implement it when actually needed.
But I disagree. If we need old miners to stop mining on the new chain, a better way to do this is make their mining invalid (by making a small change to the ATX format). This does not rely on anything they do locally.
We're also NOT going to implement an upgrade voting mechanism before genesis. For the first year at least we're going to rely on social consensus for updates (in less fancy terms: the Spacemesh company will issue an update and ask the community to adopt it).
Agree with @noamnelke that we won't have an upgrade voting mechanism, or signaling, before genesis. These can be added later as needed. Regarding whether we need any of this at genesis, even if we decide to hold off, we still need to figure out what to do with our hash functions. From https://github.com/spacemeshos/pm/issues/117:
@noamnelke do you think we can get away with just saying that the initial "version tuple" included in the hash function is nil?
Also, realistically we will probably need to upgrade things shortly after genesis, and we want to avoid a situation where such an upgrade would be delayed because the code wasn't ready, or tested. So this still feels high priority, even if it isn't strictly genesis critical.
do you think we can get away with just saying that the initial "version tuple" included in the hash function is nil?
Yes.
Also, realistically we will probably need to upgrade things shortly after genesis, and we want to avoid a situation where such an upgrade would be delayed because the code wasn't ready, or tested. So this still feels high priority, even if it isn't strictly genesis critical.
Yeah, we'll definitely need to upgrade the node shortly after genesis and we'll most likely need to upgrade the protocol (though hopefully not that quickly). Will such upgrade require having a version encoded into messages? Even if we need to change messages, I'm not convinced that the version needs to be embedded in them.
I strongly prefer to not add this mechanism (ever) unless it's strictly needed. We'll know best if it's avoidable when we have a specific upgrade in mind. If we end up needing to add this mechanism we will be much better positioned to implement it in a minimal way and test the specific case that we end up needing, rather than implement something generic now that would have to work in a variety of use-cases and then think of all the possible cases that need to be tested.
Reviewing the motivation from above:
Establishing a terminology and a common, global way to refer to different networks/chains, while preventing version conflicts across chains/forks
I think we do need a way to make sure a node that synced, say, a testnet, doesn't connect to and try to sync mainnet. This suggests to me that, at genesis, at the very least, we do need some sort of chainID or genesisID. (Alternatively, we could try to address this at the P2P layer, but I think that would be a mistake, it could be spoofed, etc.)
Make it as easy as possible for downstream tooling (e.g., block explorers, deployment tools, wallets) to specify and connect to multiple Spacemesh-compatible networks/chains, and to differentiate data structures (e.g., blocks) from these different chains
Same--we expect our tooling will interface with both testnet and mainnet at genesis.
Cross-chain replay protection; clean network bifurcation (in case of a contentious hard fork)
Should not be an issue at genesis, but we will need to add this to prevent certain classes of attack
Make sure that transactions (targeting an older protocol/VM) aren't included in blocks where they would be invalid, or would behave otherwise than anticipated by the user or application that created them
Definitely not needed at genesis, and not until we perform a VM upgrade with breaking changes (e.g., changing gas prices). Even then this feels like a nice to have, as an old tx landing in a block post-hard fork is a corner case. This is mostly a UX nicety.
Have an on-chain record of which protocol version each block producer is running, and a way to zero the voting weight of those running old versions of the protocol (as an incentive to upgrade)
Definitely not needed at genesis, as discussed. We can add it later when we need it.
Per my previous comment here, I think we'll need GENESISID by genesis to support multiple networks, including testnets, prevent data contamination/replay attacks, etc. We could leave out VMID and PROTOCOLID for now, and add them later, or we could implement them alongside GENESISID and set them to null for now. I'm not sure it's that much easier to implement just GENESISID than it would be to implement the full scheme now. Most of the epic laid out in https://github.com/spacemeshos/pm/issues/117 still applies.
Notes from today's chat with @noamnelke:
- GENESISID for genesis. This is relatively straightforward since it never changes, and we can always assume someone who broadcasts messages with a bad GENESISID is acting Byzantine
- PROTOCOLID/VMID (and we can do this later, post-genesis)
- GENESISID in post data (which ATXs, blocks, ballots, etc. can all be tied back to) and transactions (for cross-chain replay protection; txs cannot be traced directly back to post data); ATXs can be traced back to the golden ATX (which should differ based on GENESISID)
- GENESISID to prevent cross-chain replay attacks; from the perspective of the node and protocol, all that matters is whether verify() returns true; the default template will check that the signature signs a hash of the raw tx and GENESISID
- GENESISID) or hare certificate (same); therefore, theoretically you don't need to include GENESISID in block hash but we probably will anyway
- GENESISID in the P2P handshake (but an adversary could still broadcast bad messages post-handshake so it doesn't obviate the need for any of the other checks)
- GENESISID doesn't match?
- GENESISID. in any case blocks are not gossiped so it's a moot point.
- GENESISID
- GENESISID in tx hash + sign. will handle nonce question in #104
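The note about the default template checking a signature over a hash of the raw tx and GENESISID can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 is an arbitrary choice, the function names are hypothetical, and HMAC with a shared key is used purely as a stand-in for a real public-key signature scheme (the Python standard library has none), so it does not reflect how the node actually signs transactions.

```python
import hashlib
import hmac


def tx_digest(genesis_id: bytes, raw_tx: bytes) -> bytes:
    # GENESISID is fixed-length, so prefixing it to the raw tx is unambiguous.
    return hashlib.sha256(genesis_id + raw_tx).digest()


def sign(key: bytes, genesis_id: bytes, raw_tx: bytes) -> bytes:
    # Stand-in "signature": an HMAC over the digest (assumption, see lead-in).
    return hmac.new(key, tx_digest(genesis_id, raw_tx), hashlib.sha256).digest()


def verify(key: bytes, genesis_id: bytes, raw_tx: bytes, signature: bytes) -> bool:
    # The verifier recomputes the digest with its OWN GENESISID, so a tx signed
    # for another network fails verification even if the raw bytes are identical.
    expected = sign(key, genesis_id, raw_tx)
    return hmac.compare_digest(expected, signature)
```

Because the tx itself carries no explicit GENESISID, replay protection comes entirely from the verifier mixing its own ID into the preimage.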
Overview
It's desirable that we be able to attach version information metadata to certain data structures (such as blocks, transactions, ballots, proposals, and eligibility proofs), explicitly or implicitly (via, e.g., including a version ID in the preimage of a hash that's signed). This metadata will allow nodes to verify whether or not a given data structure is valid in a given context, and it will provide information about which version of the protocol other block producers are running.
Scope
The scope of this SMIP includes the design of the versioning system: how versions should be set and interpreted. It also includes how version information should be encoded into data structures. Versioning of other, independent protocols (P2P, Hare, beacons, etc.) is explicitly out of scope.
Goals and motivation
High-level design
We introduce a hierarchical three-part versioning system, akin to semantic versioning (semver) but with some important distinctions. Rather than adopting semver's MAJOR.MINOR.PATCH, we adopt the following convention: GENESISID.VMID.PROTOCOLID.
Prior art
Ethereum
Ethereum introduced a CHAIN_ID to transaction hashing and signing in EIP-155 for the purpose of cross-chain replay protection. The ID is an integer that is manually configured per network. A list of IDs is published at https://chainlist.org/ and maintained at https://github.com/ethereum-lists/chains/tree/master/_data/chains. We are not aware of versioning of any other data structures in Ethereum, including transactions, blocks, or smart contracts.
Algorand
Algorand implements a genesis hash which is explicitly included in every transaction.
Spacemesh
26
Proposed design
Definition
Each element of the version triad is a 20-byte value defined as follows:
- GENESISID: contains a unique value for each chain that shares a common genesis that's known at genesis and never changes. Included in all data structures (hashed into signatures), including blocks and transactions. In ballots, it's combined with the state hash. Data structures with a different GENESISID should not be gossiped. If a data structure with a different GENESISID is received, it may be assumed that the node that gossiped the message is acting Byzantine.
- VMID: refers to the version of the virtual machine used to process transactions. This should change infrequently, and only when a change that breaks backwards compatibility is made to the VM. Included in transactions (implicitly, via signature) and ATXs (implicitly, as part of PROTOCOLID); included implicitly in ballot via its reference to an ATX and in proposal via its reference to ballot. Transactions with the wrong VMID should be discarded: they should be dropped from the mempool after an upgrade, they should not be gossiped, and they should be dropped if received. ATXs with the wrong VMID should be kept to establish eligibility but their voting weight should be set to zero.
- PROTOCOLID: a general-purpose, catch-all version that may be updated any time a significant protocol change is made. This is expected to change relatively often, including as part of most or all network upgrade hard forks. Included only in ATXs, where it's included explicitly. In general, it's intended to be informational, but in theory it could be used in future to, e.g., incentivize block producers to upgrade to the latest protocol version by having later versions vote against proposals and ballots produced by miners running older code.
Construction
Here's a proposed scheme for calculating these values for a live network:
- GENESISID: a hash of the network genesis time combined with a config field called genesis-extradata (that is set just prior to genesis and never changes), i.e.:
  GENESISID := hash(genesis-time || genesis-extradata)
- VMID: a hash of the previous VMID, the current GENESISID, and an arbitrary vm-extradata value. Every time a significant change is made to the VM, which would cause transactions to be interpreted differently, vm-extradata should be updated, causing VMID to be recalculated. In practice, the latest git commit hash should be sufficient. For illustrative purposes, a change in gas price tables for existing operations should trigger a VMID change, whereas adding a new opcode to the VM is not considered a breaking change and would not require changing VMID.
  VMID := hash(GENESISID || VMID_prev || vm-extradata)
- PROTOCOLID: a hash of the previous PROTOCOLID, the current VMID, and an arbitrary protocol-extradata value. Updated every time a protocol change is made. Informational.
  PROTOCOLID := hash(VMID || PROTOCOLID_prev || protocol-extradata)
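The three formulas above can be sketched in a few lines. The following is an illustration only, under several assumptions that the proposal does not fix: SHA-256 truncated to 20 bytes as the hash, an 8-byte big-endian Unix timestamp as the encoding of genesis-time, empty bytes as the initial VMID_prev and PROTOCOLID_prev, and hypothetical function names.

```python
import hashlib

ID_LEN = 20  # each element of the version triad is a 20-byte value


def _hash(*parts: bytes) -> bytes:
    # Plain concatenation mirrors the `||` in the formulas; truncation to
    # 20 bytes is an assumption made for this sketch.
    return hashlib.sha256(b"".join(parts)).digest()[:ID_LEN]


def genesis_id(genesis_time: int, genesis_extradata: bytes) -> bytes:
    # GENESISID := hash(genesis-time || genesis-extradata)
    return _hash(genesis_time.to_bytes(8, "big"), genesis_extradata)


def vm_id(genesis: bytes, vm_prev: bytes, vm_extradata: bytes) -> bytes:
    # VMID := hash(GENESISID || VMID_prev || vm-extradata)
    return _hash(genesis, vm_prev, vm_extradata)


def protocol_id(vm: bytes, protocol_prev: bytes, protocol_extradata: bytes) -> bytes:
    # PROTOCOLID := hash(VMID || PROTOCOLID_prev || protocol-extradata)
    return _hash(vm, protocol_prev, protocol_extradata)
```

Because each ID feeds into the next, changing genesis-extradata changes all three values, and changing vm-extradata changes both VMID and PROTOCOLID. Note that bare concatenation of variable-length extradata is ambiguous in general; a real encoding would likely need length prefixes or fixed-width fields.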
Implementation plan
Version values may be implicitly or explicitly included in data structures (see notes above). To include a value implicitly, the object does not include an explicit GENESISID or VMID. Rather, the value is included implicitly via hash-and-sign: the data structure to be signed is first prefixed with the version blob and hashed, then the resulting hash is signed and the signature is appended to the data structure (i.e., the ID is included in the hash preimage). This can be accomplished in a straightforward fashion by wrapping all hash function calls.
In addition to changing how the node calculates object hashes, in order to interpret these hash ID values, nodes will require two additional pieces of infrastructure:
Questions
- PROTOCOLID be reset to 0 every time VMID is incremented (as in semver)? A: doesn't matter in hash-based scheme
- PROTOCOLID included? What's the effect when it's changed? A: only in ATXs. Unclear what happens if an ATX has a bad PROTOCOLID version.
- GENESISID only. IDs should not be portable across networks that don't share a genesis.
Dependencies and interactions
- VMIDs, including when an upgrade happens (might require special handling in case of upgrades)
- PROTOCOLID
Stakeholders and reviewers
Testing and performance
While performance implications are expected to be minor, benchmarking of the proposed implementation should be performed to ensure that prepending version information to serialized data structures before hashing does not materially impact performance.
A thorough suite of unit tests should be specified to ensure that messages containing valid versions are accepted, and those containing invalid versions or no version are dropped (as are the peers that gossiped them).