Clients Update (discussion and ideas, not a spec)

spacemeshos / SMIPS

Spacemesh Improvement Proposals

https://spacemesh.io

Creative Commons Zero v1.0 Universal

Clients Update (discussion and ideas, not a spec) #31

Open avive opened 4 years ago

avive commented 4 years ago

Clients Update (Proposal 1)

Requirements

Provide a solution for updating clients on a sm network when new releases include backward-compatible and/or non-backward compatible full-node component changes for a specific spacemesh network.
Provide a sensible way for users of go-sm and smapp to update their software promptly when a new version of go-sm and/or smapp are released for a spacemesh network.

Design 1

Overview

Auto updates for go-sm terminal users, one-click update for smapp users.
p2p consensus among full nodes on starting to use an updated component version.
Users may opt-out from updating - Opted-out users will need to manually update their software when a new release is available.

Tasks

Enable automatic updates to users who run go-sm without smapp when a new release is available.
Enable smapp users to easily update smapp when they wish to do so and a new release is available.
Enable go-sm nodes to reach consensus before starting to use an update to a consensus protocol or internal component.

Design Overview - Task 1 and 2

We will need to deploy and maintain a web service which provides the latest release of go-sm and/or smapp per a spacemesh network id. When a new release of smapp or go-sm is available for a given network, we'll need to update the web service results to indicate that release.
Go-sm and smapp will have code to periodically get data from the web service to determine if an update is available.
When running smapp (using a specific network id) and a new version of go-sm or smapp is available for that network, smapp needs to prompt the user that an update is available, because automatically restarting a GUI app is problematic (the user might be interacting with smapp). Once the user decides to update, smapp should stop the current node, update go-sm (and itself if needed) and restart. Note that when running a go-sm node via smapp, smapp should pass a cli flag to the node telling it NOT to auto-update silently by itself, because smapp drives the update process and a user prompt should be involved.
When running go-sm without smapp, go-sm should periodically check whether a new version is available for the network id it is running on. When an update is available (and the user didn't opt-out from auto updates), go-sm should update itself and restart. Note that in this scenario, the newly started node should be started with exactly the same runtime flags that the previous version was started with, to preserve user-provided configuration. There might be issues around updating a CLI process started from the terminal: the update kills the current process and starts a new one. We need to figure out an elegant way to do this which fully supports Terminal execution.
We should provide a way for both smapp and go-sm users to opt-out from updates via config.
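The update check described in this list can be sketched roughly as follows. This is only an illustration under stated assumptions: the JSON field names, the `UpdateConfig` fields and the `major.minor.patch` version format are hypothetical and not part of the proposal, and the HTTP call to the web service is left out so that the decision logic can be read on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// ReleaseInfo is a hypothetical response of the release web service for one
// network id. The field names are assumptions made for this sketch.
type ReleaseInfo struct {
	NetworkID int    `json:"network_id"`
	GoSM      string `json:"go_sm_version"`
	Smapp     string `json:"smapp_version"`
}

// UpdateConfig mirrors the opt-out and smapp-managed settings discussed above.
type UpdateConfig struct {
	NetworkID      int
	AutoUpdate     bool // false when the user opted out via config
	ManagedBySmapp bool // true when smapp started the node and drives updates
}

// parseVersion splits "major.minor.patch" into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("bad version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// newer reports whether candidate is strictly newer than current.
func newer(candidate, current string) (bool, error) {
	c, err := parseVersion(candidate)
	if err != nil {
		return false, err
	}
	r, err := parseVersion(current)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if c[i] != r[i] {
			return c[i] > r[i], nil
		}
	}
	return false, nil
}

// ShouldSelfUpdate decides whether a go-sm node should update itself, given
// the raw web service response and the version it is currently running.
func ShouldSelfUpdate(cfg UpdateConfig, running string, body []byte) (bool, error) {
	var info ReleaseInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return false, err
	}
	// Never self-update for another network, after an opt-out, or under smapp.
	if info.NetworkID != cfg.NetworkID || !cfg.AutoUpdate || cfg.ManagedBySmapp {
		return false, nil
	}
	return newer(info.GoSM, running)
}

func main() {
	body := []byte(`{"network_id":7,"go_sm_version":"0.2.1","smapp_version":"0.1.4"}`)
	cfg := UpdateConfig{NetworkID: 7, AutoUpdate: true}
	ok, err := ShouldSelfUpdate(cfg, "0.2.0", body)
	fmt.Println(ok, err)
}
```

Keeping the opt-out and the smapp-managed case in one predicate means a node started by smapp never restarts behind the user's back, whatever the web service reports.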

Design Overview - Task 3

Research should provide the algorithm for updates via consensus. We need to come up with a good design to facilitate this, and we'll also need to add a notion of versioning to internal sm components.

Pros

TODO: Please add here pros over design 2.

Cons

Complex to implement and get right. Failure in agreement to update consensus protocol by consensus can lead to network death - this is a highly risky design.
Because we need to support opted-out users, there will be go-sm full nodes running an older release on a sm network, and we need to handle these in a good way.

Design 2

Users are responsible for updating go-sm (when running it directly via terminal) or smapp when a new release is available. Terminal users just stop go-sm, download the new release and start it with the same cli flags. Smapp users just quit it and run the smapp installer, which updates smapp and potentially the managed go-sm shipped in the new smapp release.
Non-backward compatible updates: When a non-backward compatible change to a component is introduced to go-sm, the release should specify the layer from which nodes should start using the new component. Once that layer is reached, updated nodes should either stop communicating with nodes that did not update to this release or drop their component-related messages. I think this is how eth updates are designed.
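As a rough illustration of the layer-based switch in Design 2, the sketch below picks the active component version by layer and drops messages tagged with any other version. The `ComponentRelease` type and its fields are hypothetical names introduced here, and it assumes that component messages carry a version tag, which the proposal does not specify.

```go
package main

import "fmt"

// ComponentRelease is a hypothetical record shipped with a release: from
// ActivationLayer onward, nodes use NewVersion of the component.
type ComponentRelease struct {
	OldVersion      uint32
	NewVersion      uint32
	ActivationLayer uint64
}

// ActiveVersion returns the component version a node should use at a layer.
func (r ComponentRelease) ActiveVersion(layer uint64) uint32 {
	if layer >= r.ActivationLayer {
		return r.NewVersion
	}
	return r.OldVersion
}

// AcceptMessage reports whether a component message from a peer should be
// processed at the given layer. Messages tagged with any other version are
// dropped, which is the "drop their component-related messages" option.
func (r ComponentRelease) AcceptMessage(layer uint64, peerVersion uint32) bool {
	return peerVersion == r.ActiveVersion(layer)
}

func main() {
	r := ComponentRelease{OldVersion: 1, NewVersion: 2, ActivationLayer: 1000}
	fmt.Println(r.AcceptMessage(999, 1))  // old version, before the switch
	fmt.Println(r.AcceptMessage(1000, 1)) // old version, at the switch layer
	fmt.Println(r.AcceptMessage(1000, 2)) // new version, at the switch layer
}
```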

Pros

Much simpler to design, implement, test and maintain.
We have an example of how to do this in a good way from other projects.
Tal: Please add here.

Cons

???? It is not clear to me that this design direction has any meaningful cons to meet the requirements.

Tal: please add any issue with meeting the smip goal with this design alternative.

Issues / Challenges / Considerations

Since we all agree we need to allow users to opt-out from auto updates (if we choose design option 1), we need to take care of the case where some full nodes on a network are running an old version of the node that doesn't have the updated components and consensus mechanism code.

Some nodes will keep running an older release and some nodes will run a release with updated components which are not backward compatible with the older release.

Regardless of the design above for the update mechanics, we need a good design to handle this situation in go-sm code. This is an issue whether or not we choose to have nodes vote on updating to a new version of a component, because some nodes just won't have the new version that was agreed upon.

One way to solve this is to have nodes which updated to a new release which is not backward compatible with an older release to drop all p2p traffic from older nodes but this can lead to a network fork. Maybe such a fork is unavoidable in these settings.

tal-m commented 4 years ago

The Goal

Let's start with the goal of opt-out automatic update: In the initial period after mainnet launch (and definitely during the testnet), we expect there will be fairly frequent updates --- both to add features and to fix bugs.

Especially in the case of bug fixes, it's important for the majority of the network to update quickly. Any update that requires manual intervention of a highly decentralized network (as we expect ours to be) is unlikely to be quick. This is why we want updates, by default, to be completely automated during this initial period.

Since we expect a significant proportion of the spacetime to be controlled by nodes running go-spacemesh directly, in order to achieve our goal we need go-spacemesh to auto-update without help from the smapp.

I think we need to consider separately three types of update mechanisms:

Smapp Updates

These are updates to the GUI. Updating the GUI requires user intervention (at least for restart), so it can't be done completely automatically. However, the GUI updates aren't critical to the functioning of the network, so I think it's ok if these are delayed for some users. If a node update requires a GUI update (e.g., the new node communicates with the GUI over a different API), I think it's ok to have the GUI simply "turn off" and show a message that it can't continue until it's been restarted.

If the API hasn't changed, the Smapp should be able to handle the go-spacemesh node stopping and restarting in the background without having to restart the GUI (e.g., if go-spacemesh has been automatically updated). Good to have (but not critical) is a GUI notification that go-spacemesh was automatically updated (this should appear in any case in the go-spacemesh logs).

Node updates

There should be an "external" mechanism for updating a node that doesn't care about what the update actually does. This mechanism will check periodically for updates (e.g., by polling our web service), download the update if it exists, verify signatures, stop the old node and start the new one. This update mechanism is completely agnostic to the payload (i.e., it doesn't need to know anything about the spacemesh protocol or any go-spacemesh APIs except for the API that gracefully shuts down the node pending an update).

We do want to allow an opt-out mechanism for users who want to control their update process manually -- I think this can be a configuration file option.

Node updates may or may not involve updating the protocol. For example, if we fix a memory leak, this will require a node update but not a protocol update. For an update that does not change the protocol, there's nothing more to be done; the external update will be sufficient.

Spacemesh Protocol Updates

For node updates that do change the protocol, we need an "internal" update mechanism. This is the mechanism that will ensure that updates don't fork the network and that history is safely preserved across the update.

Here's an idea for a very basic version:

We define an "update notification" as a well-defined data structure containing:
- a new protocol id,
- a layer number $i$ at which nodes will start using this new protocol
- a minimum weight $W$ of updating nodes which is required to make the switch
- a "cutoff" layer number $j \ll i$
The update-id is the hash of the update notification
Nodes that intend to update will publish the update-id in a special field of their blocks.
A node will perform the update at layer $i$ if the total weight of nodes announcing this update up to layer $j$ is at least $W$.
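The update notification and the weight check could look roughly like this. It is a sketch under assumptions: the fixed-width big-endian encoding fed to SHA-256, the `Announcement` type and the rule that each node is counted once are choices made here for illustration, not something the idea above pins down.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// UpdateNotification carries the four fields listed above.
type UpdateNotification struct {
	ProtocolID  uint32
	SwitchLayer uint64 // layer i at which the new protocol starts
	MinWeight   uint64 // minimum weight W of updating nodes
	CutoffLayer uint64 // layer j, well before i, where counting stops
}

// ID returns the update-id: a hash of the notification. The fixed-width
// big-endian encoding used here is an assumption of this sketch.
func (n UpdateNotification) ID() [32]byte {
	var buf [28]byte
	binary.BigEndian.PutUint32(buf[0:4], n.ProtocolID)
	binary.BigEndian.PutUint64(buf[4:12], n.SwitchLayer)
	binary.BigEndian.PutUint64(buf[12:20], n.MinWeight)
	binary.BigEndian.PutUint64(buf[20:28], n.CutoffLayer)
	return sha256.Sum256(buf[:])
}

// Announcement is a hypothetical view of the update-id field in a block.
type Announcement struct {
	NodeID   string
	Layer    uint64
	Weight   uint64
	UpdateID [32]byte
}

// ShouldSwitch reports whether the weight announcing this update up to the
// cutoff layer reaches MinWeight. Each node is counted at most once.
func ShouldSwitch(n UpdateNotification, anns []Announcement) bool {
	id := n.ID()
	seen := make(map[string]bool)
	var total uint64
	for _, a := range anns {
		if a.UpdateID != id || a.Layer > n.CutoffLayer || seen[a.NodeID] {
			continue
		}
		seen[a.NodeID] = true
		total += a.Weight
	}
	return total >= n.MinWeight
}

func main() {
	n := UpdateNotification{ProtocolID: 2, SwitchLayer: 5000, MinWeight: 70, CutoffLayer: 3000}
	id := n.ID()
	anns := []Announcement{
		{NodeID: "a", Layer: 2900, Weight: 40, UpdateID: id},
		{NodeID: "b", Layer: 2950, Weight: 35, UpdateID: id},
		{NodeID: "c", Layer: 3100, Weight: 50, UpdateID: id}, // after the cutoff, ignored
	}
	fmt.Println(ShouldSwitch(n, anns))
}
```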

In order to perform a protocol update, we create a new go-spacemesh node version that internally contains the update and the code for the internal-update mechanism to that update. We then use the external mechanism to update to the new go-spacemesh node.

Sync Protocol Updates

Updates to the sync protocol don't have to be in consensus (although of course they can be). So we have a "softer" option of writing a new go-spacemesh node that negotiates the sync version with its neighbors, and uses the latest one they both support.
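The "softer" negotiation could be as simple as picking the highest version both peers list. A small sketch, assuming each node advertises the sync versions it supports as a list of integers (the function name and the version numbering are hypothetical):

```go
package main

import "fmt"

// Negotiate returns the highest sync protocol version supported by both
// peers. The boolean is false when the peers share no version at all.
func Negotiate(ours, theirs []uint32) (uint32, bool) {
	supported := make(map[uint32]bool, len(theirs))
	for _, v := range theirs {
		supported[v] = true
	}
	var best uint32
	found := false
	for _, v := range ours {
		if supported[v] && (!found || v > best) {
			best, found = v, true
		}
	}
	return best, found
}

func main() {
	v, ok := Negotiate([]uint32{1, 2, 3}, []uint32{2, 3, 4})
	fmt.Println(v, ok)
}
```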

avive commented 4 years ago

@tal-m - we still need to handle the case that old nodes attempt to communicate with nodes which updated their protocol. I think that in that case there's no way around having the new nodes block all p2p comm with older nodes. We need to think about what happens when, for example, 1/3 of the network is running an old protocol (didn't update yet) and 2/3 is running a new one, with the 1/3 upgrading to the new protocol over time. Even if updates are automated, some nodes will opt-out of these, so there's no way around handling this case. Correct?

tal-m commented 4 years ago

@avive Yes, blocking all communication is reasonable. I think our own code should also contain an automated "kill switch" as part of the internal update mechanism -- if a node sees that an update is imminent (i.e., a large enough majority has announced intent to update) and it hasn't updated itself, it should warn the user, and should automatically shut itself down when the update layer occurs.

Nodes that don't accept the majority update are effectively starting their own forked network --- we should force users to make this a conscious decision --- it shouldn't happen just because someone set their node to manual updates and then went on vacation.

noamnelke commented 4 years ago

Some comments on what @tal-m wrote (I agree with most of it):

The SMApp updates are not critical to the network and I think @avive should have "artistic freedom" to handle them as he sees fit. What you described makes a lot of sense, but I see it as a suggestion and nothing more.

While I think that having the ability to upgrade the majority of nodes would make our lives easier, I don't agree with you that it's critical, even in the beginning. I think that baking in this ability is a slippery slope towards centralization, and may already be far enough down the slope to make the protocol unappealing for some people. I'm an insider here, but if I wasn't - this feature would turn me off, even if for my own node I can opt out. Having this turned on by default gives the company unlimited power as they can basically change the code to anything they want, whenever they want - on a majority of the network.

I also disagree that the network will be highly decentralized in the early days, when being able to update quickly is important. Most participants will be people who trust us and are on our Discord / Telegram / mailing list / etc., and those people will likely be technologically above average and highly responsive. So chances are that most of them will upgrade within hours of when we ask them to.

To be clear - I don't object to implementing an auto-update mechanism - only to making it opt-out. If a majority of users trust us enough to turn auto-updates on explicitly - this is their right and I have no objection to that.

I also think that it's not critical to implement. If it's easy - fine, but otherwise we can add this feature later.

Protocol updates are really the most interesting and critical part of this discussion. I think that what you suggested is reasonable for some kinds of updates, but I think we should bake in more aggressive options and also prepare for hard forks.

We may not always want to make an update depend on miner votes. We may want to push some protocol updates that take effect on some layer, regardless of miner opinion. In that case miners who don't update in time will stay behind on a fork of the network (technically, the new version is the fork, but that's a different discussion).

Even with 90% agreement - we still have to deal with miners who are left behind, and I don't think that it's that big a difference if the majority chose the new or old side of the fork. Correct me if I'm wrong.

In case we don't want an update to depend on adoption - there's no need to cast votes.

I actually think that it's not the end of the world if we start the network when it only supports those kinds of updates (forced, regardless of votes) and add support for miner voting later, if needed.

Supporting hard forks, on the other hand, is critical regardless of having this voting feature.

What you said about the sync is true, more generally I think there are several components that could be updated in a backwards compatible way and we should always strive to do this whenever at all possible. See segwit for an extreme example of a backwards compatible protocol update.

tal-m commented 4 years ago

@noamnelke I agree that the mechanism I described is more than halfway to centralized control. However, I think we will need this level of centralization in the beginning. The reason is that I expect we will have critical security / bugfix updates of the sort that will cause network failure if they're not adopted by a majority of nodes (this isn't a comment on our code quality -- this is consistent with the historical record for basically every new application).

If we leave updates to the whims of users, I think there's a high risk that we'll have network failure due to "indifferent" users (who haven't bothered to turn on auto-updates), rather than conscious decisions on the part of users to veto an update. This risk is much greater for us than for e.g., Bitcoin or Ethereum, precisely because we aim for much higher decentralization, and make the barrier to entry so low.

Of course, the long-term plan is to avoid having the security rely on a trusted entity. So once the network is sufficiently stable (e.g., sometime in the first year) I think it's a good idea to transition to an opt-in mechanism instead (we can do this by automatic update, of course :-) )

Protocol updates (at least those that would cause hard forks) have to depend on sufficient adoption. The reason is that our security depends on an honest majority of spacetime --- if the nodes that are updating control say, 10% of the spacetime, then our security assumption is clearly false. (Protocol updates that are backwards-compatible --- i.e., can coexist with some nodes running the previous version --- can use the external update mechanism without any voting)

On the other hand, there's no need to "support" hard forks. As I wrote above, we should include a "kill switch" that would prevent unintentional hard forks --- nodes should just shut down if a new protocol version is adopted, and they aren't running it.

avive commented 4 years ago

@noamnelke @tal-m - at this point we need to agree on what's critical for genesis in terms of requirements.

lrettig commented 4 years ago

Provide a sensible way for users of go-sm and smapp to update their software promptly when a new version of go-sm and/or smapp are released for a spacemesh network

Updating smapp and updating go-spacemesh are two completely different things in my mind, and we should approach them separately. I think we should discuss them in separate smips. I think I'm saying the same thing that Tal and Noam said, in different words. Updating smapp is much less important to the health of the network than updating the node software. In particular, updating smapp should be very, very straightforward since it's built using electron and javascript, which means it should support over the air (OTA) updates, which are seamless and quick: see https://www.electronjs.org/docs/tutorial/updates, and lots more on google

I think it's important to note that we have two distinct but related protocols here that we're talking about upgrading: the p2p/sync protocol, and the spacemesh protocol itself. These, again, are two different questions (although in practice they may be bundled together into a single node release).

If a node fails to upgrade the p2p/sync protocol, it should still be able to participate fully in the network. As Tal pointed out, clients should be able to negotiate a sync protocol and use the latest protocol both support. (In practice, some nodes may choose not to sync with nodes running older network code, or may gradually phase out support for older network protocols over time. This is fine.)

On the contrary, if a node fails to upgrade the spacemesh protocol before a hard fork, then that node is effectively running on a fork of the network, immediately as of the hard fork layer height. In particular, to Tal's point:

nodes should just shut down if a new protocol version is adopted, and they aren't running it.

I actually strongly disagree with this for reasons of governance. This implies that "a new protocol version" is authoritative, but there is no such thing (I think Noam will agree with me here :) - in particular, just because we (core team) release or sign an update, that shouldn't make it any more "special" or authoritative than any other release. Nodes that choose not to upgrade when a hard fork occurs should just continue on the original network. Choosing not to upgrade is a legitimate choice and a form of voting and we shouldn't do anything to take that away from node operators.

we need to agree on what's critical for genesis in terms of requirements

I don't think we need miner voting for now. I see this as a nice to have. If we do have it, I think it should just be a signal that's interpreted at the social layer (as in Bitcoin and Ethereum).

Ethereum has many thousands of node operators all around the world that speak many different languages and operate in many different timezones, and the network has successfully coordinated emergency hard forks in as little as 36 hrs. It's not that hard and I don't see a reason that we need to do something differently, at least not an urgent reason to do so right now when we're much smaller. Put everyone in a single chat group/email list/whatever. Those who care, and those who are paying attention, will upgrade on time. Those who aren't/don't won't, and will end up on another fork, and it'll be a headache for them to roll back and update, but they'll do it if they care to.

(Okay, yes, we are targeting less technical users on average. For these users, a simple, all-in-one auto-update mechanism might make sense. Still, the idea irks me for the same reasons Noam pointed out. I really, really, really don't want to put in place anything that takes away the agency of a node operator in deciding which fork to follow as of a hard fork protocol upgrade.)

The technical question of how to get the node to update itself automatically is an interesting one. Openethereum does it, the code is here: https://github.com/openethereum/openethereum/tree/master/updater. Other blockchain nodes probably also do it. We can take a closer look at what they do. There's no need to reinvent the wheel here.

tal-m commented 4 years ago

On the contrary, if a node fails to upgrade the spacemesh protocol before a hard fork, then that node is effectively running on a fork of the network, immediately as of the hard fork layer height. In particular, to Tal's point:

nodes should just shut down if a new protocol version is adopted, and they aren't running it.

I actually strongly disagree with this for reasons of governance. This implies that "a new protocol version" is authoritative, but there is no such thing (I think Noam will agree with me here :) - in particular, just because we (core team) release or sign an update, that shouldn't make it any more "special" or authoritative than any other release. Nodes that choose not to upgrade when a hard fork occurs should just continue on the original network. Choosing not to upgrade is a legitimate choice and a form of voting and we shouldn't do anything to take that away from node operators.

I think you misunderstood my suggestion. The new protocol version becomes authoritative only after voting on the mesh, not just because you downloaded it. The order of events is:

Update is published at layer $k$ with hard-coded parameters $j \gg k$ and $i \gg j$.
Auto-updating Nodes download and install protocol update using "external" update mechanism.
The new code announces on the mesh the intent to update at layer $i$ if a quorum is reached before layer $j$.
If a quorum is not reached by layer $j$ (i.e., the total weight of nodes that announced the intent to update is not a large enough majority), no update occurs --- all nodes continue running the old protocol version.
If a quorum is reached by layer $j$, then at layer $i$:
- All nodes running the new code will update their protocol version
- All nodes running our old code will shut down.
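The per-node outcome of this sequence, together with the warning from the earlier "kill switch" comment, can be written as a small decision function. This is an illustrative sketch: the `Action` names and the `ActionAt` signature are made up here, and whether a quorum was reached by layer $j$ is taken as an input rather than computed.

```go
package main

import "fmt"

// Action is what a node does at a given layer with respect to one update.
type Action int

const (
	Continue       Action = iota // keep running the current protocol
	WarnUser                     // quorum reached, this node has not updated yet
	SwitchProtocol               // start using the new protocol version
	Shutdown                     // old code stops instead of forking by inaction
)

func (a Action) String() string {
	return [...]string{"continue", "warn user", "switch protocol", "shut down"}[a]
}

// ActionAt decides the action at a layer. quorumReached says whether enough
// weight announced the update by the cutoff layer j; switchLayer is layer i;
// hasNewCode says whether this node is running the release with the update.
func ActionAt(layer, switchLayer uint64, quorumReached, hasNewCode bool) Action {
	if !quorumReached {
		return Continue // no update occurs, everyone stays on the old protocol
	}
	if layer < switchLayer {
		if hasNewCode {
			return Continue
		}
		return WarnUser
	}
	if hasNewCode {
		return SwitchProtocol
	}
	return Shutdown
}

func main() {
	fmt.Println(ActionAt(4000, 5000, false, false))
	fmt.Println(ActionAt(4000, 5000, true, false))
	fmt.Println(ActionAt(5000, 5000, true, true))
	fmt.Println(ActionAt(5000, 5000, true, false))
}
```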

The "authority" that causes a node to shutdown if not updated isn't our signature, it's a consensus on the mesh that a large majority of the nodes have updated. The reason to do this is to prevent forks "by inaction". In order to join a forked network, a user will have to manually download a different version of the software (or compile their own), rather than just do nothing.

we need to agree on what's critical for genesis in terms of requirements

I don't think we need miner voting for now. I see this as a nice to have. If we do have it, I think it should just be a signal that's interpreted at the social layer (as in Bitcoin and Ethereum).

The miner voting feature is required to implement the fork-protection I described above. I think it's a lot more important than "nice-to-have" --- launching without it increases our risk of network failure --- but perhaps it's not critical on day 1.

Ethereum has many thousands of node operators all around the world that speak many different languages and operate in many different timezones, and the network has successfully coordinated emergency hard forks in as little as 36 hrs. It's not that hard and I don't see a reason that we need to do something differently, at least not an urgent reason to do so right now when we're much smaller. Put everyone in a single chat group/email list/whatever. Those who care, and those who are paying attention, will upgrade on time. Those who aren't/don't won't, and will end up on another fork, and it'll be a headache for them to roll back and update, but they'll do it if they care to.

(Okay, yes, we are targeting less technical users on average. For these users, a simple, all-in-one auto-update mechanism might make sense. Still, the idea irks me for the same reasons Noam pointed out. I really, really, really don't want to put in place anything that takes away the agency of a node operator in deciding which fork to follow as of a hard fork protocol upgrade.)

Our goals are not just less technical users, but also a much higher degree of decentralization -- meaning it's much harder to "get everyone on a chat".

We're not taking away anyone's agency. We're just making sure the defaults promote network health and, in the initial period, fast updates.

The technical question of how to get the node to update itself automatically is an interesting one. Openethereum does it, the code is here: https://github.com/openethereum/openethereum/tree/master/updater. Other blockchain nodes probably also do it. We can take a closer look at what they do. There's no need to reinvent the wheel here.

I agree --- if there's reasonable-quality code that handles this, there's no need to write our own.

lrettig commented 4 years ago

I think you misunderstood my suggestion. The new protocol version becomes authoritative only after voting on the mesh, not just because you downloaded it.

The "authority" that causes a node to shutdown if not updated isn't our signature, it's a consensus on the mesh that a large majority of the nodes have updated.

Everything you propose sounds technically sound, but I don't think it's socially sound. It doesn't matter how many nodes, or what percentage of the network by hashpower/disk space/whatever, votes in support of a fork/upgrade. Defaults are powerful, and to change the default from "do nothing" to "just do whatever most other people are doing" sets a very dangerous precedent. It's not a perfect metaphor, but it's a bit like saying, if you choose not to vote for president, then your vote automatically counts for the incumbent (or for whomever wins the most votes). This benefits the majority and established powers at the expense of the minority.

The choice to do nothing and remain on the existing, pre-upgrade network is a perfectly legitimate choice on the part of a node operator. To take this away is to take away agency from the node operator.

Ideally we'd remove the default option entirely like Ethereum did with the DAO hard fork, where node operators were required to add a command line flag to choose a fork branch. In practice this is quite hard, and I think the second best thing we can do is for the default to always be "do nothing."
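One way to picture "no default" is a node that refuses to start across a hard fork unless the operator names a branch explicitly. The sketch below uses Go's standard `flag` package; the `-fork-choice` flag and its two values are hypothetical names chosen for this example and are not the flag Ethereum clients used.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// ParseForkChoice reads a hypothetical -fork-choice flag. There is no
// default on purpose: an operator who names no branch gets an error.
func ParseForkChoice(args []string) (string, error) {
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the sketch's output
	choice := fs.String("fork-choice", "", "branch to follow: upgrade or original")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *choice {
	case "upgrade", "original":
		return *choice, nil
	case "":
		return "", errors.New("no fork chosen: refusing to pick a default")
	default:
		return "", fmt.Errorf("unknown fork choice %q", *choice)
	}
}

func main() {
	fmt.Println(ParseForkChoice([]string{"-fork-choice=original"}))
	fmt.Println(ParseForkChoice(nil))
}
```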

The reason to do this is to prevent forks "by inaction". In order to join a forked network, a user will have to manually download a different version of the software (or compile their own), rather than just do nothing.

The forked network is the network that results from the upgrade, not the other way around. The nodes that don't upgrade are still on the old network. As you say, to join the "upgrade fork," nodes should need to download the new software or otherwise opt in to the upgrade.

Hard forks are hard to coordinate by design. This is the thing that differentiates blockchain from a centralized topology. I concede that, in the very earliest phases of the network, it might be nice from a network stability perspective to make upgrades easier to coordinate, but it's a slippery slope and a dangerous precedent and I'm not sure it's worth the cost.

tal-m commented 4 years ago

Everything you propose sounds technically sound, but I don't think it's socially sound. It doesn't matter how many nodes, or what percentage of the network by hashpower/disk space/whatever, votes in support of a fork/upgrade. Defaults are powerful, and to change the default from "do nothing" to "just do whatever most other people are doing" sets a very dangerous precedent. It's not a perfect metaphor, but it's a bit like saying, if you choose not to vote for president, then your vote automatically counts for the incumbent (or for whomever wins the most votes). This benefits the majority and established powers at the expense of the minority.

Yes, the defaults I propose for the initial period will give spacemesh developers a lot of power --- in effect, the system will have a "do what spacemesh developers say unless there's a sufficiently strong conscious veto". This is the purpose of these defaults!

If we're successful in what we're trying to do with our decentralization advantages (the very low barrier to entry and the emphasis on supporting node operators that are not hard-core techies), a very significant fraction of the mining weight will be controlled by home users. Hopefully, a far larger fraction than in existing systems such as Bitcoin and Ethereum.

I'm worried that in this situation, the update mechanisms that work for Bitcoin and Ethereum ("let's just ask people to update on discord") will not work for a significant fraction of users. This is especially critical in the initial period, where we know for sure that we're going to do protocol updates (e.g., feature additions such as full smart-contract support).

The choice to do nothing and remain on the existing, pre-upgrade network is a perfectly legitimate choice on the part of a node operator. To take this away is to take away agency from the node operator.

I agree that it's a legitimate choice. I strongly disagree that it's a legitimate default. The difference between our suggestions is what should happen when a user does "nothing" (e.g., is on vacation when the update occurs). My strategy boils down to "If I'm on vacation, and 2/3 of the network switch to a new protocol version, I want my node to switch to the majority fork". Your strategy is "If I'm on vacation, and 2/3 of the network switch to a new protocol version, I want my node to switch to the minority fork"

Note that the semantics of what is a "fork" and what is the "original" don't really matter here; the thing that matters is where the majority of the weight is --- not least because the honest majority assumption for a network with 1/3 of the weight is pretty clearly false. (Note that compared to PoW blockchains, a fork is a bigger security problem, since the resources can potentially be reused on both sides of the fork --- there are mitigations, but it's not as simple as the PoW case).

The reason that the "switch to minority" isn't a good default is that if a significant fraction of the users are "slow responders" (i.e., might miss the manual-choice deadline for an update), you are risking the stability of the entire network: nodes that continue running on the old fork are, in terms of system security for the majority fork, "adversarial"; this is clear in the case where the update was to fix an exploitable bug, but could be a serious issue even in more benign updates since the adversary might have a spacetime advantage compared to a minority fork (so our regular security analysis no longer guarantees anything).

On the other hand, under the "switch to majority" strategy, at worst you delay a hard-fork choice until users manually update to the minority hard fork.

Even past the initial period, where automatic updates become opt-in (or even fully manual), I don't see any reason for a user to rationally choose a "switch to minority fork" strategy as their default.

And just to reiterate --- we're not taking away a choice. The node operator can always decide to do something different, we're just changing the default for inaction.

Hard forks are hard to coordinate by design. This is the thing that differentiates blockchain from a centralized topology. I concede that, in the very earliest phases of the network, it might be nice from a network stability perspective to make upgrades easier to coordinate, but it's a slippery slope and a dangerous precedent and I'm not sure it's worth the cost.

In the initial period, the cost of failing to upgrade the network is quite likely the failure of the entire network. So I think this is more than "nice" in this phase. Moreover, as I wrote above, I think the risk of failing to coordinate an upgrade (that requires manual intervention of a majority of the users) is particularly high for our system, given its design and goals.

Once we're past the initial phase and switch to opt-in, protocol changes will require a conscious choice, and inaction will lead to the majority of the network not announcing intent to update, in which case inaction by nodes that did announce intent to update will be in favor of the previous protocol version (but still with the majority).

Of course I could be wrong. As we know, "Prediction is very difficult, especially when the future is concerned"... it's possible we won't have any bugs, and all our updates will be completely smooth --- but I'd rather not bet the system on this.

lrettig commented 4 years ago

Thanks for making your stance super clear. I think we are mostly on the same page. The main point I'm trying to make is that, from a social/governance perspective, defaults are extraordinarily important and powerful and we should be very cautious in choosing, or changing, a default. One could argue "he who controls the default controls the network." As I think about it more, I suspect the right thing to do here in the long term is to remove the default entirely. I'd support the sort of voting and upgrade mechanism you propose as long as the default would be for the node to halt or shut down if the operator did not explicitly choose a fork (rather than just going along with the majority).

One thing I'd like to clarify:

the cost of failing to upgrade the network is quite likely the failure of the entire network

To be clear, if self-healing works, the network would not fail, right? In fact, both branches of such a fork would continue to be viable, wouldn't they?

tal-m commented 4 years ago

Thanks for making your stance super clear. I think we are mostly on the same page. The main point I'm trying to make is that, from a social/governance perspective, defaults are extraordinarily important and powerful and we should be very cautious in choosing, or changing, a default. One could argue "he who controls the default controls the network." As I think about it more, I suspect the right thing to do here in the long term is to remove the default entirely. I'd support the sort of voting and upgrade mechanism you propose as long as the default would be for the node to halt or shut down if the operator did not explicitly choose a fork (rather than just going along with the majority).

Yes, sounds like we are mostly on the same page. I think changing the updates to opt-in or manual would have exactly the effect you describe given the voting scheme I suggest: a node will not update without an active action of the operator. If the operator did choose to update, that's a clear choice to join the majority fork if it happens. If the operator did not choose to update, their node will shut down automatically if the majority do decide to update (unless the operator makes an active choice to remain on the old fork, by updating to different code that doesn't shut down).
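
To make the opt-in behavior described above concrete, here is a minimal sketch of the per-node rule. The names `Action` and `node_action` are hypothetical and are not part of go-sm; the sketch assumes the vote outcome is already known to the node and ignores the case where an operator installs different code to stay on the old fork.

```python
from enum import Enum


class Action(Enum):
    SWITCH_TO_NEW = "switch to the new protocol version"
    STAY_ON_OLD = "stay on the previous protocol version"
    SHUT_DOWN = "shut down"


def node_action(operator_opted_in: bool, majority_updates: bool) -> Action:
    """Hypothetical rule for the opt-in phase: inaction never leaves a node on a minority fork."""
    if majority_updates:
        # The majority moves to the new version: follow it only if the
        # operator explicitly chose to update, otherwise halt.
        return Action.SWITCH_TO_NEW if operator_opted_in else Action.SHUT_DOWN
    # The majority did not announce intent to update: everyone stays put,
    # including nodes whose operators opted in.
    return Action.STAY_ON_OLD
```

In all four combinations the node either ends up with the majority or stops running, which is the property the default is meant to give.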

One thing I'd like to clarify:

the cost of failing to upgrade the network is quite likely the failure of the entire network

To be clear, if self-healing works, the network would not fail, right? In fact, both branches of such a fork would continue to be viable, wouldn't they?

If everyone is totally honest, then yes. However, the security of a fork depends on the adversary having less than 1/3 of the weight of that fork, so splitting the network reduces its security. This is true even in a PoW-based blockchain like Bitcoin, but is actually worse in non-PoW blockchains, since the resources aren't bound to the particular fork. This means an adversary can potentially "co-opt" one fork against the other. (This is probably not trivial to do, but our security analysis doesn't rule it out.)
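
A small worked example may help show why a split lowers the bar for an adversary. This is a simplified model with made-up numbers: weight is treated as a single fraction of the pre-fork total, and the adversary is assumed to be able to apply its full weight on both forks (the resource reuse mentioned above). The function name `adversary_share` is hypothetical.

```python
from fractions import Fraction


def adversary_share(adversary: Fraction, honest_on_fork: Fraction) -> Fraction:
    """Fraction of a fork's weight controlled by the adversary."""
    return adversary / (adversary + honest_on_fork)


THRESHOLD = Fraction(1, 3)  # security requires the adversary to stay below this

adversary = Fraction(1, 5)         # 20% of the total weight before the split
honest_majority = Fraction(1, 2)   # honest weight that follows the majority fork
honest_minority = Fraction(3, 10)  # honest weight left on the minority fork

before_split = adversary_share(adversary, honest_majority + honest_minority)
on_majority = adversary_share(adversary, honest_majority)
on_minority = adversary_share(adversary, honest_minority)

print(before_split, before_split < THRESHOLD)  # 1/5 True
print(on_majority, on_majority < THRESHOLD)    # 2/7 True
print(on_minority, on_minority < THRESHOLD)    # 2/5 False
```

With these numbers an adversary that was well under 1/3 of the unified network exceeds 1/3 of the minority fork without acquiring any new resources.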

lrettig commented 4 years ago

If the operator did not choose to update, their node will shut down automatically if the majority do decide to update (unless the operator makes an active choice to remain on the old fork, by updating to different code that doesn't shut down).

Right, with the caveat that choosing to remain on the old fork should a.) always be an option, and b.) not be any harder, in practice, than choosing to join the new fork.

tal-m commented 4 years ago

Choosing the minority fork is always an option, whether or not we allow it --- anyone who wants to can always compile and run a different version of our code, since it's open source.

However, I don't think the "easier vs. harder" line should coincide with old vs. new --- instead, it should coincide with majority vs. minority. That is, once a fork has happened, choosing a minority fork should be harder than choosing the majority, regardless of which is the new protocol version and which is the old --- this is a default that encourages system stability and security.

Deciding whether a fork happens (by voting) would have different defaults that do coincide with old vs. new: in the initial period, it will be easier to vote in favor of an "officially sanctioned" new version (since that's the default when no explicit action is taken). Once the initial period is over, it will be easier to vote in favor of the status quo.
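
The phase-dependent voting default can be sketched as follows. This is only an illustration: `effective_vote` and `update_passes` are hypothetical names, the node weights are made up, and the "more than half of the weight" threshold is an assumption, since the thread does not fix a threshold.

```python
from typing import Optional, Sequence, Tuple


def effective_vote(explicit_vote: Optional[bool], initial_period: bool) -> bool:
    """True means a vote for the new version, False a vote for the status quo."""
    if explicit_vote is not None:
        return explicit_vote  # an explicit operator choice always wins
    # Inaction: favors the sanctioned update early on, the status quo later.
    return initial_period


def update_passes(nodes: Sequence[Tuple[int, Optional[bool]]], initial_period: bool) -> bool:
    """nodes is a list of (weight, explicit_vote) pairs; explicit_vote is None for inaction."""
    total = sum(weight for weight, _ in nodes)
    in_favor = sum(weight for weight, vote in nodes if effective_vote(vote, initial_period))
    return 2 * in_favor > total  # assumed threshold: strict majority of weight


# 40% of the weight takes no action, 35% votes for the update, 25% votes against.
nodes = [(40, None), (35, True), (25, False)]
print(update_passes(nodes, initial_period=True))   # True
print(update_passes(nodes, initial_period=False))  # False
```

The same set of explicit votes produces opposite outcomes in the two phases, which is exactly how much weight the default carries.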

avive commented 4 years ago

How do we take this discussion and its arguments, consider all points, and produce a new draft which excludes smapp updates and focuses only on full node updates? The several 'we are mostly on the same page' comments make me think that is doable. Which points are in strong disagreement and need further discussion? Does anyone want to have a go at it? @tal-m @noamnelke @lrettig ?

lrettig commented 4 years ago

I posted a new proposal for the node update portion of this proposal to #32; please have a look there.