Post as a service - GitHub Issues

Summary

Provide POST:

initialization
proving
verification (?)

as a service that could live outside of the node and be online only when required. The post-service would be the owner of the identity (a pair of private/public keys), aka "node ID"/"smesher ID". It would create POST proofs on demand for the activation process running in the trusted node with which it registered itself (and its identity).

Motivation

To be able to run a lighter go-sm node that:

doesn't need storage for POST
doesn't need to prove POST
doesn't require OpenCL to run (bonus)
later on pure smesher UI

Status quo

:house: Home miner

A home miner must run a node that is online 24/7 to stay in sync and participate in consensus. The POST part of the algorithm puts some requirements on the machine:

it needs a lot of storage for the data,
it must be powerful enough to be able to create the POST proof in a "short" time.

Because of these constraints, home miners usually need to dedicate their home computers to run the node (i.e. their desktop PC or laptop). It's unfortunate because these machines become hardly usable for anything else (for example, it's difficult to take the laptop running a node out of home and not lose rewards in the process). Ideally, the miner could run a "light node" that participates in the protocol on something like:

cloud
energy efficient device like a NUC or Raspberry Pi.

And run the POST on a different, powerful machine that must be online only for the short time of the cycle gap and in the remaining time can be offline or reused for something else.

:whale: Whale miner

A whale miner must orchestrate multiple nodes to participate with a bigger POST space commitment (a big identity must be split to be able to generate POST proofs in a reasonable time), and a node can run for only 1 identity. This has a few drawbacks:

duplicated databases that occupy a lot of additional space
duplicated network traffic for the sync
each node requires a powerful machine for the POST, which must be online (and hence incurs costs) 24/7.

Similarly to the home miner case, a whale would ideally run a light (and low-cost) node participating in the protocol and spin up the powerful (and costly) POST machines (running post-service) on demand.

:construction: Proposed solution :construction:

The proposal is to create a new entity, a post service which runs as a separate process and communicates with a node over the network. The post-service is the only owner of:

identity (node ID, the keys),
the POST data associated with the identity.

The node it connects to merely acts on its behalf, running the protocol for it.

There is a 1 to 1 association of identity, service, and disk. There is no intent to support multiple identities per service. Having many identities is achieved by spawning multiple services. There is also no intent of supporting multiple disks per service because it is more risky (potential HW failures where 1 disk failing disables the ID for a long time) without clear benefits - it's assumed that nobody would do it anyway on a large scale as the risk is too high.

There is a 1 to N relationship between a node and post-services. It has a few benefits:

less network traffic as sync is done once for all identities,
less storage for the database,
the node can run on a lighter machine,
it's possible to orchestrate post-services smartly by turning them on only when they are needed, which cuts costs or frees the machines for other purposes.
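
The relationships described above (one identity per post-service, many post-services per node) could be tracked on the node side roughly as follows. This is only a minimal sketch: `Registry`, `NodeID`, and `ErrAlreadyRegistered` are hypothetical names made up for illustration, not part of go-spacemesh.

```go
package main

import (
	"errors"
	"sync"
)

// NodeID is a hypothetical stand-in for the smesher's public key.
type NodeID string

// ErrAlreadyRegistered is returned when a second post-service claims an
// identity that already has a post-service attached.
var ErrAlreadyRegistered = errors.New("identity already registered")

// Registry tracks which identities currently have a post-service attached
// to this node (1 node to N post-services).
type Registry struct {
	mu       sync.Mutex
	services map[NodeID]string // identity -> address of its post-service
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{services: make(map[NodeID]string)}
}

// Register attaches a post-service for the given identity. It enforces the
// 1 to 1 association of identity and service by rejecting duplicates.
func (r *Registry) Register(id NodeID, addr string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.services[id]; ok {
		return ErrAlreadyRegistered
	}
	r.services[id] = addr
	return nil
}

// Unregister detaches the post-service of the given identity, for example
// when the machine is switched off between proving windows.
func (r *Registry) Unregister(id NodeID) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.services, id)
}

// Count returns the number of identities currently attached.
func (r *Registry) Count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.services)
}
```

With such a registry the node syncs once and keeps a single database, while post-services come and go through `Register` and `Unregister`.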

Home miner standalone mode

This is a simple mode of operation for users who just want to run the node and forget. They dedicate a single machine and run everything in a single app (be it Smapp or go-spacemesh node).

In this mode, the app handles everything automatically and no extra steps from the users are required (everything works as it used to work before).

The proposal is to create a post-service manager entity that is responsible for supervising the external post-service process. Supervising means spawning the process and monitoring it, restarting it if needed.

Home miner split mode

In this mode, the post-service runs as a separate process (possibly on a different machine) and thus is not supervised by the node anymore. It is supposed to connect to the node and register its identity in it.

:question: Why does the post-service connect to the node and not the other way around? :bulb: There are a few reasons for this:

the post-service might not be easily addressable, especially if it is a user's home computer (sometimes there isn't even a static IP to begin with). On the other hand, a node (running in a local network or cloud) should be easily addressable.
it should be possible to connect additional post services at any point in time without any configuration changes on the node side (a dynamic post discovery).

When the post-service registers the identity in the node, it can inform it about its POST and share the private keys with it. The post service manager registers the identity in each "protocol process" (activation, hare etc.) and can start running the network protocols for this identity.

:bulb: The node is considered a trusted node and hence it is fine to share the keys (they are required to sign the data).
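
The registration flow described above, where the manager hands a newly connected identity to each "protocol process", might be sketched as follows. All types here (`Identity`, `Protocol`, `Manager`, `recorder`) are hypothetical and exist only to illustrate the fan-out; the real interfaces may differ.

```go
package main

import "fmt"

// Identity is what a post-service shares with the trusted node on
// registration. The field layout is an assumption for illustration.
type Identity struct {
	NodeID     string
	PrivateKey []byte // shared because the node must sign data for this ID
}

// Protocol is implemented by every component that runs on behalf of an
// identity (activation, hare, etc.).
type Protocol interface {
	Name() string
	Register(id Identity) error
}

// Manager fans a newly registered identity out to all protocols.
type Manager struct {
	protocols []Protocol
}

// OnRegister is called when a post-service connects and registers itself.
// It stops at the first protocol that rejects the identity.
func (m *Manager) OnRegister(id Identity) error {
	for _, p := range m.protocols {
		if err := p.Register(id); err != nil {
			return fmt.Errorf("register %s with %s: %w", id.NodeID, p.Name(), err)
		}
	}
	return nil
}

// recorder is an in-memory Protocol used only to demonstrate the fan-out.
type recorder struct {
	name string
	ids  []string
}

func (r *recorder) Name() string { return r.name }

func (r *recorder) Register(id Identity) error {
	r.ids = append(r.ids, id.NodeID)
	return nil
}
```

Because registration is driven by the incoming connection, attaching another post-service requires no configuration change on the node side.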

Whale

This is mostly the same as the previous mode but supports multiple post-services connecting simultaneously. The node is supposed to run the network protocols for each of the registered identities in parallel.

:bulb: This mode is described and designed in more detail in #261.

Requirements

Simplicity and ease of use

POST service needs to be as simple as possible and require little to no configuration (preferably only the node address)
it should be possible to attach new post-services to a running node at runtime
it must be possible to run the go-spacemesh binary alone (the standalone mode):
- go-sm could spawn the post service itself and talk to it over gRPC etc.

Performance

POST proof generations running in parallel should reuse a single k2pow when it makes sense (delegated k2pow). :question: Not sure about the state of k2pow delegation. This requirement might need to be dropped.
it should be possible to switch off the external post services when they are not needed (not generating the POST proof)

Safety

post-service and node should authenticate each other to avoid malicious behaviour
ability to detach from the node and connect to a different one. ⚠️ be cautious to not have 2 nodes producing ATXs for the same ID ⚠️

Remarks

Even though this epic describes all 3 modes of operation, its main focus is on implementing the post service into the node. The support for multiple identities per node is covered in #261, for which this one is a prerequisite.

Tasks

[x] Finish POST service requirements
[x] spacemeshos/pm#260
[x] spacemeshos/api#269
[x] spacemeshos/post-rs#129
[x] spacemeshos/go-spacemesh#5042
[x] spacemeshos/go-spacemesh#5131
[x] spacemeshos/go-spacemesh#5149

A whale miner must orchestrate multiple nodes to participate with a POST bigger space commitment (a big identity must be split to be able to generate POST proofs in reasonable time)

is it really how whales do it now? they don't use postcli for initialization and copy data instead?

duplicated databases that occupy a lot of additional space duplicated network traffic for the sync

they will still need this if they want to run multiple identities AFTER this proposal is deployed. the effect of this proposal is to allow separation of post and consensus. the miner still need to do 1X database and 1X network traffic per identity. the saving on duplicate network traffic/data storage can only come from running multiple identities in one single node.

the proposal seems less relevant for whales. i'd imagine they need a post service to manage multiple disks instead, which is outside the purview of this design?

The post-service is the only owner of: - identity (node ID, the keys), The node it connects to merely acts on its behalf, running the protocol for it.

this part is strange. i can also say that the node that runs the protocol merely delegate the post service to generate post data. what is the definition of ownership here? can the post service revoke the private key from the node? it doesn't matter. private keys, once shared, are owned....

it's assumed that nobody would do it anyway on a large scale as the risk is too high.

why? there is already a research proposal to combine smaller atxs into 1 big voting atx. PostServiceManager seems the perfect place to realize that.

and which mode (standalone vs distributed) will be implemented? both? it's unclear to me why "everything in a single app" is important. can you explain more? why not just go for the distributed mode only?

is it really how whales do it now? they don't use postcli for initialization and copy data instead?

Unrelated, it's not about INIT but about proving time. They use postcli. And pretty much everyone copies data. It's just about proving, not about the init phase. Currently, you have to orchestrate 50 go-sm nodes if you have 50 data sets initialized, and you can't turn one off, stagger the proving between them, etc.

why? there is already a research proposal to combine smaller atxs into 1 big voting atx. PostServiceManager seems the perfect place to realize that.

exactly that's why service per HDD. Please do not mix up terms. It's a separate item. That's why it's proposed in that way.

they will still need this if they want to run multiple identities AFTER this proposal is deployed. the effect of this proposal is to allow separation of post and consensus. the miner still need to do 1X database and 1X network traffic per identity. the saving on duplicate network traffic/data storage can only come from running multiple identities in one single node.

Yes, it's only the first part. Later on it will be combined with https://github.com/orgs/spacemeshos/projects/39/views/15?filterQuery=repo%3Aspacemeshos%2Fgo-spacemesh%2C%22spacemeshos%2Fpoet%22%2C%22spacemeshos%2Fgo-scale%22%2C%22spacemeshos%2Fpost%22%2C%22spacemeshos%2Fpost-rs%22%2C%22spacemeshos%2Fpm%22%2C%22spacemeshos%2Fapi%22+-status%3A%22%E2%9C%85+Done%22+label%3A%22feat%2Fmulti+smeshers%22 which will enable 1x for multiple smeshers.

the proposal seems less relevant for whales. i'd imagine they need a post service to manage multiple disks instead, which is outside the purview of this design?

The idea of using multiple separate disks for one ATX is faulty at best. It does not yield any benefits (especially after combining is ready) but has multiple risks involved, and it will also be slower than separate services (because of nonces). Once it is possible to combine ATXs, separate services yield only benefits (you can gamble on nonces, you can prove in parallel or sequentially, you get ATX decrease/increase for free, etc.).

Adding to @pigmej comments

duplicated databases that occupy a lot of additional space duplicated network traffic for the sync

they will still need this if they want to run multiple identities AFTER this proposal is deployed. the effect of this proposal is to allow separation of post and consensus. the miner still need to do 1X database and 1X network traffic per identity. the saving on duplicate network traffic/data storage can only come from running multiple identities in one single node.

That's not true, this proposal covers the case of attaching many post-services to a node and running the consensus for multiple IDs as well. It's covered in the whale section (distributed mode). Quoting from the proposal:

🚧 Proposed solution 🚧
(...)
There is a 1 to N node to a post-service relationship. It has a few benefits:
- less network traffic as sync is done once for all identities,
- less storage for the database,

Separation of POST from the node is the first step to achieve this.

The post-service is the only owner of: - identity (node ID, the keys), The node it connects to merely acts on its behalf, running the protocol for it.

this part is strange. i can also say that the node that runs the protocol merely delegate the post service to generate post data. what is the definition of ownership here? can the post service revoke the private key from the node? it doesn't matter. private keys, once shared, are owned....

The POST data is inextricably linked with an identity (ID); the data is only valid for a given ID. The idea is that the node acts as a "protocol runner" for the identities that connect to it (register themselves).

can the post service revoke the private key from the node? it doesn't matter. private keys, once shared, are owned....

The node is considered trusted, a smesher is supposed to be the owner of both the node and all the post-services connecting to it. This is not a solution for running a "node operation" that other people can connect to and share their private keys.

it's assumed that nobody would do it anyway on a large scale as the risk is too high.

why? there is already a research proposal to combine smaller atxs into 1 big voting atx. PostServiceManager seems the perfect place to realize that.

That sentence was about supporting a single ID spread across multiple disks. It's not a good solution as it has a few drawbacks:

it comes with a higher risk of failure:
- when a disk fails, the ID cannot generate a proof;
- POST proof is in or out. It's better to have X out of Y proved than 0 out of 1;
it would be really complicated to produce a POST proof from many disks efficiently. It's much easier to orchestrate N proofs for N identities on N disks than it is to generate 1 proof for 1 identity on N disks
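
To illustrate the "X out of Y" point: with one identity per disk, proving can be orchestrated as N independent jobs, and a single failing disk costs only that identity's proof. The sketch below is hypothetical; `proveFunc` stands in for whatever actually generates a POST proof, and `errDiskFailed` is an invented error for the example.

```go
package main

import (
	"errors"
	"sync"
)

// errDiskFailed is an invented error used to simulate a broken disk.
var errDiskFailed = errors.New("disk failed")

// proveFunc generates the POST proof for one identity (hypothetical).
type proveFunc func(id string) error

// proveAll runs prove for every identity concurrently and reports which
// identities succeeded and which failed. One failure does not affect the
// other identities.
func proveAll(ids []string, prove proveFunc) (ok, failed []string) {
	errs := make([]error, len(ids))
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			errs[i] = prove(id) // each goroutine writes only its own slot
		}(i, id)
	}
	wg.Wait()
	for i, id := range ids {
		if errs[i] != nil {
			failed = append(failed, id)
		} else {
			ok = append(ok, id)
		}
	}
	return ok, failed
}
```

The jobs could just as well run sequentially or be staggered; the point is only that each identity succeeds or fails on its own.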

ATX merging has nothing to do with a single ID using many disks. On the contrary, ATX merging is meant for exactly the opposite case when many small IDs (each its own disk) are merged into 1 big ID.

and which mode (standalone vs distributed) will be implemented? both?

Eventually both. Quoting from the proposal:

Remarks Even though this epic describes all 3 modes of operation, it's most focus is on implementing the post service into the node. The support for multiple identities per node is covered in https://github.com/spacemeshos/pm/issues/261, for which this one is a prerequisite.

it's unclear to me why "everything in a single app" is important. can you explain more? why not just go for the distributed mode only?

The "everything in one app" mode, aka standalone mode, is meant for "regular" smeshers that just want to run smapp/go-spacemesh and have both consensus and POST proving run on the same machine. They just want to execute the application and don't want to care about running a post-service as well (the same way as they run smapp and don't need to run go-sm themselves).

In this mode the underlying logic is the same, the communication of node <-> post-service is the same. The only difference is that the go-sm node spawns and supervises the post-service on behalf of the user.

and which mode (standalone vs distributed) will be implemented? both? it's unclear to me why "everything in a single app" is important. can you explain more? why not just go for the distributed mode only?

Standalone is needed for the current smapp use case and for "home" users who don't need such specifics. In that case go-sm will spawn the service and nothing else needs to be changed anywhere.

thanks for the explanations. my confusion came from the fact that

this issue is titled "Post as a service"
the content of the issue speaks of end result / benefits of combination of this issue and #261

the content in this issue (that explained the rationale and plan) seems better suited to an umbrella issue that refers to this and #261 as the breakdown of the implementation plan. this allows people to see the overall picture before zooming into each stage of the implementation plan, in which this issue is the prerequisite.

The POST data is inextricably linked with an identity (ID), the data is only valid for a given ID. The idea is that the node acts as a "protocol runner" for the identities that connect to it (register themself).

i think we are arguing semantics here. there is no such thing as key "ownership" once it's shared. so the best place to assign ownership is to the node operator. for example, if post service A registered with node X and node Y, for whatever reason, all of A/X/Y have the key. but since it is assumed that the same operator, say Alice, runs all of A/X/Y, the ownership should be Alice's.

In this mode the underlying logic is the same, the communication of node <-> post-service is the same. The only difference is that the go-sm node spawns and supervises the post-service on behalf of the user.

how does a smesher tell smapp to connect to an external post service that resides in a separate and more powerful home PC than smapp is running on? i was thinking longer term supporting only split mode is simpler and more flexible. in the short-term, since smapp is already managing go-sm node, it doesn't seem too terrible to make it also manage post service.

Will each POST still require a unique k2pow?