spacemeshos / pm

Project management. Meta-tasks related to research, dev, and specs for the Spacemesh protocol and infrastructure.
http://spacemesh.io/

Resilient PoET Architecture #133

Closed by antonlerner 6 months ago

antonlerner commented 4 years ago

Motivation

The current implementation of the system uses a single PoET server, connected to a single node for broadcasting proofs. This setup is riddled with single points of failure, any one of which can effectively halt the network.

Miners could point their nodes at a different PoET server, and we could run multiple servers with a mechanism that selects one at random, but this would be a band-aid, and a long-term solution isn't much harder to implement.

Architecture of a Reliable PoET Service

Public Address

A PoET Service should be identified by a DNS name rather than an IP address. Round-robin DNS should be employed for fault tolerance, pointing to PoET Gateways on different clouds.

To prevent DNS spoofing, it's important to identify the PoET service by a public key in addition to the DNS name: the service signs its response to each registration and the node validates the signature.
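
As a minimal sketch of the node-side check, assuming an Ed25519 service key distributed out of band (the signature scheme and wire format are not specified in this proposal):

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// verifyRegistrationResponse checks that a registration response really came
// from the PoET service identified by servicePubKey, regardless of which IP
// the DNS name resolved to.
func verifyRegistrationResponse(servicePubKey ed25519.PublicKey, response, signature []byte) bool {
	return ed25519.Verify(servicePubKey, response, signature)
}

func main() {
	// Throwaway key pair for the demo; a real node would only hold the
	// public key, obtained from config rather than from DNS.
	pub, priv, _ := ed25519.GenerateKey(nil)
	response := []byte(`{"roundId":"42"}`) // hypothetical response payload
	sig := ed25519.Sign(priv, response)

	fmt.Println("signature valid:", verifyRegistrationResponse(pub, response, sig))
}
```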

PoET Gateway

This is a stateless public API gateway (reverse proxy). It exposes the PoET API to miners and forwards requests to the PoET Controller.

This stateless component enables easy horizontal scaling and fault-tolerant redundancy of the public-facing API. It also enables sending requests to multiple PoET Controllers concurrently to allow hot backup and fast failover of the PoET Controller.

PoET Controller

The PoET Controller is in charge of the PoET service's lifecycle. It keeps a round open for membership registration, accepts and records registrations, closes the round when the time comes and calculates the membership root. It can be associated with multiple PoET Core instances: when a round needs to start processing, it orders an available PoET Core to start working on it, providing it with the membership root. It then listens for the round's completion, takes the finished proof, packages it together with the membership proof, signs it and sends it to the Gossip Gateway.
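
A hypothetical Go interface sketch of these responsibilities (the names and signatures below are illustrative, not an actual API):

```go
package poet

import "context"

// Core abstracts a PoET Core instance the controller can drive.
type Core interface {
	// Execute runs the sequential work on the membership root and blocks
	// until the proof is ready (or the context is cancelled).
	Execute(ctx context.Context, membershipRoot []byte) (proof []byte, err error)
}

// Controller drives a round's lifecycle end to end.
type Controller interface {
	// Register records a membership request in the open round and returns
	// a signed acknowledgement containing the round ID.
	Register(ctx context.Context, challenge []byte) (signedAck []byte, err error)
	// CloseRound stops accepting registrations and returns the membership root.
	CloseRound(ctx context.Context, roundID string) (membershipRoot []byte, err error)
	// PublishProof packages the finished proof with the membership proof,
	// signs the bundle and hands it to the Gossip Gateway.
	PublishProof(ctx context.Context, roundID string, proof []byte) error
}
```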

When a miner requests to register, the PoET Controller should persist the membership request and only then respond with a signed round ID.
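
A minimal sketch of that ordering, assuming a durable store and an Ed25519 service key (all names hypothetical):

```go
package poet

import (
	"context"
	"crypto/ed25519"
	"fmt"
)

// membershipStore is assumed to make the write durable (fsync) before returning.
type membershipStore interface {
	AppendMembership(roundID string, challenge []byte) error
}

type controller struct {
	openRoundID string
	store       membershipStore
	serviceKey  ed25519.PrivateKey
}

// Register persists the membership request before signing the acknowledgement,
// so a crash after the ack can never lose a registration the miner relies on.
func (c *controller) Register(ctx context.Context, challenge []byte) ([]byte, error) {
	if err := c.store.AppendMembership(c.openRoundID, challenge); err != nil {
		return nil, fmt.Errorf("persist registration: %w", err)
	}
	ack := []byte(c.openRoundID)
	return append(ack, ed25519.Sign(c.serviceKey, ack)...), nil
}
```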

Backups

Since the PoET controller is stateful, it's critical to have backups of its state.

There should be at least two instances of the controller, running on different clouds in a hot-backup failover formation. Each membership registration should only be acknowledged by the PoET Gateway once at least two PoET Controllers have acknowledged it, and the controllers' state should be kept synchronized (this means ensuring that registrations are processed in the same order).
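
A sketch of that rule on the gateway side, fanning the request out concurrently and acknowledging only once a quorum of controllers has durably acknowledged it (the function names are hypothetical):

```go
package gateway

import (
	"context"
	"errors"
	"sync"
)

// forwardFn sends the registration to one controller and returns its error.
type forwardFn func(ctx context.Context, req []byte) error

// registerWithQuorum fans the request out to all controllers concurrently and
// succeeds once `quorum` of them have acknowledged it.
func registerWithQuorum(ctx context.Context, req []byte, controllers []forwardFn, quorum int) error {
	acks := make(chan error, len(controllers)) // buffered: no goroutine blocks
	var wg sync.WaitGroup
	for _, fwd := range controllers {
		wg.Add(1)
		go func(f forwardFn) {
			defer wg.Done()
			acks <- f(ctx, req)
		}(fwd)
	}
	go func() { wg.Wait(); close(acks) }()

	got := 0
	for err := range acks {
		if err == nil {
			if got++; got >= quorum {
				return nil // enough controllers persisted the registration
			}
		}
	}
	return errors.New("quorum not reached")
}
```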

Each instance of the controller can be connected to multiple PoET Cores. We can implement a passive mode, where a controller knows it's a hot backup and doesn't actually control the core instances. It's then connected to the same instances that the master controller is connected to and it only starts actually controlling them if the master fails.

Alternatively, each controller can be connected to its own core instances and then those also run in parallel, providing hot-backup failover redundancy for that component as well.

There's no risk in broadcasting the same proofs from two controller instances, as the generated proofs should be identical.

In addition to the hot backup, it's a good idea to keep frequent snapshots of the persistent storage for the rare occasion when both hot backups fail and we need to bring up a new controller. This minimizes the lost registrations in such an unlikely event. A snapshot needs to be taken whenever work on a PoET round starts, to avoid losing the list of members before the round completes. The snapshots only need to be kept until the proof is broadcast.

PoET Core

This component is what performs the actual sequential work. Several PoET Core processes may eventually run on a single host machine, each locked down to a specific CPU core. To maximize performance and consistency we should strive to keep this component's scope minimal.

It should run a gRPC server, accepting a challenge from the controller to perform the sequential work on.

The PoET Core will persist its state to disk so it can be recovered in case the process is restarted.

The controller will be in charge of polling the state of the PoET round and, once it's complete, requesting the proof so it can proceed with the publication process.
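
The real proof construction is more involved, but here's a stripped-down sketch of the core loop, with iterated hashing standing in for the sequential work and the state checkpointed to disk so a restarted process can resume (the file format and checkpoint interval are assumptions):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"os"
)

const checkpointEvery = 1 << 20 // persist progress roughly every million steps

// run hashes sequentially from the challenge, resuming from the last
// checkpoint if the process was restarted mid-round.
func run(challenge []byte, iterations uint64, stateFile string) []byte {
	state := sha256.Sum256(challenge)
	var next uint64

	if raw, err := os.ReadFile(stateFile); err == nil && len(raw) == 8+sha256.Size {
		next = binary.LittleEndian.Uint64(raw[:8]) // first iteration still to do
		copy(state[:], raw[8:])
	}

	for i := next; i < iterations; i++ {
		state = sha256.Sum256(state[:])
		if (i+1)%checkpointEvery == 0 {
			buf := make([]byte, 8+sha256.Size)
			binary.LittleEndian.PutUint64(buf[:8], i+1) // resume after this step
			copy(buf[8:], state[:])
			os.WriteFile(stateFile, buf, 0o600)
		}
	}
	return state[:]
}

func main() {
	fmt.Printf("%x\n", run([]byte("membership-root"), 1<<22, "core.state"))
}
```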

Future Optimization

To maximize performance and stability, the PoET Core process can be stripped down even further by making it run ad-hoc with specific params (incl. the membership root). In that setup there's no gRPC server: the process is started by the controller with the membership root as a command-line argument, performs the sequential work, stores the proof on disk and exits. This would make the entire process single-threaded and very simple.

It would, however, offload most of the complexity to the controller, adding complexity to that component, which is why this shouldn't be implemented immediately.
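
For concreteness, a sketch of how the controller might launch such a one-shot process (the binary name, flags and proof path are all hypothetical):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"os"
	"os/exec"
)

// runAdHocCore starts a one-shot core process and reads the finished proof
// from disk after the process exits.
func runAdHocCore(membershipRoot []byte) ([]byte, error) {
	cmd := exec.Command("./poet-core",
		"--challenge", hex.EncodeToString(membershipRoot),
		"--out", "proof.bin",
	)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("core process failed: %w", err)
	}
	return os.ReadFile("proof.bin") // proof persisted by the core before exiting
}

func main() {
	proof, err := runAdHocCore([]byte("membership-root"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("proof: %x\n", proof)
}
```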

Gossip Gateway

The PoET server publishes proofs via the gossip network. To do this it calls an endpoint on a node's gRPC server.

There should be a component that ensures this works consistently. There should be multiple nodes used for the broadcast to minimize reliance on the stability and network reachability of a single node. The Gossip Gateway Service should perform health checks on the nodes it uses for broadcast and be able to replace bad nodes.

To ensure the reliable dissemination of the PoET proof, which is critical to the system, the Gossip Gateway should keep, in addition to the "broadcast nodes" list, a list of passive "verification nodes" on which it listens for the messages it broadcasts, to ensure they are indeed received on the network.

To minimize the critical parts of the system, and to avoid the need for backups of the Gossip Gateway, it should only respond to a broadcast request with success once it has closed the loop and received the message back via the verification nodes. This could take considerable time due to real-world propagation, but it leaves the responsibility for retrying on failure with the PoET Controller, where it's easier to handle safely.
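
A sketch of that closed loop, with hypothetical stand-ins for the node gRPC API on both the broadcast and verification sides:

```go
package gossip

import (
	"bytes"
	"context"
	"errors"
)

type broadcastNode interface {
	Broadcast(ctx context.Context, msg []byte) error
}

type verificationNode interface {
	// Messages yields gossip messages as the passive node receives them.
	Messages(ctx context.Context) <-chan []byte
}

// broadcastVerified only returns nil once the proof has been seen coming back
// from the network, leaving retries on failure to the PoET Controller.
func broadcastVerified(ctx context.Context, msg []byte, b broadcastNode, v verificationNode) error {
	seen := v.Messages(ctx) // subscribe before broadcasting to avoid a race
	if err := b.Broadcast(ctx, msg); err != nil {
		return err
	}
	for {
		select {
		case m, ok := <-seen:
			if !ok {
				return errors.New("verification stream closed")
			}
			if bytes.Equal(m, msg) {
				return nil // loop closed: the network has the proof
			}
		case <-ctx.Done():
			return ctx.Err() // propagation timeout; controller will retry
		}
	}
}
```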

Letting nodes select a PoET service

Having a resilient PoET service doesn't conflict with allowing nodes to select an entirely different PoET service, whether manually or via some automatic fallback mechanism. We must support this in order to eliminate PoET service centralization. At the time of writing, this is premature: we don't yet have the PoET incentive structure in place, and even if we did, nobody would incentivize PoETs on the testnet.

Eventually we'll want miners to select their own PoET service from a list of options. While Spacemesh will run a PoET service, we imagine others will run competing services that may have higher performance and possibly cost more to use (earning the operators money).

noamnelke commented 4 years ago

To make commenting on this easier, I've copied this to a Google Doc (it should be open for comments to anyone with the link); please review there.

noamnelke commented 4 years ago

[image]

noamnelke commented 4 years ago

Alternative: Smart-Contract-Based Registration

PoET registration happens via a smart contract. A miner sends a transaction to the contract, paying for the registration. The payment is unspendable until the PoET service produces a proof that includes the miner.

The PoET could lock up funds in the contract that would be released to registrants if the proof isn't posted in time.
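
Since the contract platform doesn't exist yet, here is only a toy sketch of the escrow logic, written in Go for illustration; the two release conditions follow the description above, but all fields and rules are assumptions:

```go
package escrow

import "errors"

type Round struct {
	Deadline    uint64            // height by which the proof must be posted
	ProofPosted bool              // set when the proof is published on time
	ServiceBond uint64            // locked up by the PoET service up front
	Fees        map[string]uint64 // miner -> registration fee, still locked
	Proved      map[string]bool   // miner -> included in the published proof
}

// ClaimFee releases a miner's fee to the service only after a proof that
// includes that miner has been posted.
func (r *Round) ClaimFee(miner string) (uint64, error) {
	if !r.Proved[miner] || r.Fees[miner] == 0 {
		return 0, errors.New("fee still locked")
	}
	fee := r.Fees[miner]
	r.Fees[miner] = 0
	return fee, nil
}

// Refund returns the fee plus an equal share of the service's bond to a
// registrant if the deadline passed without a proof.
func (r *Round) Refund(miner string, height uint64) (uint64, error) {
	if r.ProofPosted || height <= r.Deadline || r.Fees[miner] == 0 {
		return 0, errors.New("round not defaulted, or nothing to refund")
	}
	refund := r.Fees[miner] + r.ServiceBond/uint64(len(r.Fees))
	r.Fees[miner] = 0
	return refund, nil
}
```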

This approach has several advantages:

Possible disadvantages:

avive commented 4 years ago

Is this really for TNFF (code-complete due date by the end of this week)? @antonlerner @noamnelke

noamnelke commented 4 years ago

I think @moshababo broke out all the urgent pieces that are for TNFF.

Can we move this issue (epic) to TN1 now? (we'll break out more issues later)

avive commented 4 years ago

This is very cool, but from a product planning perspective smart contracts for PoET are overambitious even for 0.3, and should be scheduled for the next update of the mainnet, not for the 0.3 mainnet. There are many higher priorities at the basic protocol level.

moshababo commented 1 year ago

@pigmej @lrettig @noamnelke need to decide what should be implemented from this proposal.

lrettig commented 1 year ago

This won't make it into genesis. There's no pending dev task here (yet) so I'm moving this to the pm repo.

pigmej commented 6 months ago

This one is deprecated and replaced by https://github.com/spacemeshos/pm/issues/257