spacemeshos / pm

Project management. Meta-tasks related to research, dev, and specs for the Spacemesh protocol and infrastructure.
http://spacemesh.io/
Creative Commons Zero v1.0 Universal

Post as a service #259

Closed (poszu closed this issue 1 month ago)

poszu commented 7 months ago

Summary

Provide POST as a service that can live outside of the node and be online only when required. The post-service would be the owner of the identity (a pair of private/public keys), aka "node ID"/"smesher ID". It would create POST proofs on demand for the activation process running in the trusted node with which it has registered itself (and its identity).

Motivation

To be able to run a lighter go-sm node that:

Status quo

:house: Home miner


A home miner must run a node that is online 24/7 to stay in sync and participate in consensus. The POST part of the algorithm puts some requirements on the machine:

Because of these constraints, home miners usually need to dedicate their home computers (i.e. their desktop PC or laptop) to running the node. This is unfortunate because these machines become hard to use for anything else (for example, it's difficult to take a laptop running a node away from home without losing rewards). Ideally, the miner could run a "light node" that participates in the protocol on something like:

And run the POST on a different, powerful machine that must be online only during the short cycle gap and can otherwise be offline or reused for something else.

:whale: Whale miner


A whale miner must orchestrate multiple nodes to participate with a bigger POST space commitment: a big identity must be split so that POST proofs can be generated in reasonable time, and a node can run for only 1 identity. This has a few drawbacks:

- duplicated databases that occupy a lot of additional space
- duplicated network traffic for the sync

Similarly to the home miner case, a whale would ideally run a light (and low-cost) node participating in the protocol and spin up the powerful (and costly) POST machines (running post-service) on demand.

:construction: Proposed solution :construction:

The proposal is to create a new entity, a post-service, which runs as a separate process and communicates with a node over the network. The post-service is the only owner of:

- identity (node ID, the keys),

The node it connects to merely acts on its behalf, running the protocol for it.

There is a 1 to 1 association of identity, service, and disk. There is no intent to support multiple identities per service. Having many identities is achieved by spawning multiple services. There is also no intent to support multiple disks per service because it is riskier (a single failing disk would disable the ID for a long time) without clear benefits. It's assumed that nobody would do it on a large scale anyway as the risk is too high.

The node to post-service relationship is 1 to N. It has a few benefits:

- less network traffic as sync is done once for all identities,
- less storage for the database,

Home miner standalone mode

This is a simple mode of operation for users who just want to run the node and forget. They dedicate a single machine and run everything in a single app (be it Smapp or go-spacemesh node).

In this mode, the app handles everything automatically and no extra steps from the users are required (everything works as it used to work before).

The proposal is to create a post-service manager entity that is responsible for supervising the external post-service process. Supervising means spawning the process and monitoring it, restarting it if needed.
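The supervision described above (spawn, monitor, restart) could look roughly like the sketch below. It is written in Python purely for illustration and is not go-spacemesh code: the `supervise` function, the `max_restarts` and `backoff_s` parameters, and the policy of giving up after a fixed number of crashes are all assumptions.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_s=0.0):
    """Spawn `cmd`, wait for it, and respawn it if it exits with an error.

    Returns the number of times the process was spawned. Supervision ends
    on a clean exit (return code 0) or once `max_restarts` is exhausted.
    All names and the restart policy are hypothetical.
    """
    spawns = 0
    while True:
        proc = subprocess.Popen(cmd)   # spawn the external process
        spawns += 1
        returncode = proc.wait()       # monitor: block until it exits
        if returncode == 0:
            return spawns              # clean shutdown, nothing to restart
        if spawns > max_restarts:
            return spawns              # give up after too many crashes
        time.sleep(backoff_s)          # short pause before restarting


if __name__ == "__main__":
    # Stand-in for a post-service binary that crashes immediately.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print("spawned", supervise(crashing, max_restarts=2), "times")
```

A real manager would likely also forward logs and stop the child when the node shuts down; those concerns are left out here.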


Home miner split mode

In this mode, the post-service runs as a separate process (possibly on a different machine) and thus is no longer supervised by the node. It is supposed to connect to the node and register its identity with it.

:question: Why does the post-service connect to the node and not the other way around? :bulb: There are a few reasons for this:

When the post-service registers its identity with the node, it can inform the node about its POST and share the private keys with it. The post-service manager registers the identity in each "protocol process" (activation, hare, etc.), so the node can start running the network protocols for this identity.
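As a rough illustration of this registration fan-out, here is a minimal Python sketch. The class names `Identity`, `ProtocolProcess`, and `PostServiceManager`, their fields, and the `register` methods are hypothetical and do not reflect the actual go-spacemesh API or wire protocol.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    node_id: str       # public key (hypothetical string encoding)
    private_key: str   # shared with the trusted node so it can sign data


@dataclass
class ProtocolProcess:
    """Stand-in for a protocol component such as activation or hare."""
    name: str
    identities: dict = field(default_factory=dict)

    def register(self, identity):
        self.identities[identity.node_id] = identity


@dataclass
class PostServiceManager:
    processes: list

    def register(self, identity):
        """Called when a post-service connects and announces its identity."""
        for process in self.processes:
            process.register(identity)
        # Report which protocol processes now know about the identity.
        return [p.name for p in self.processes]


if __name__ == "__main__":
    manager = PostServiceManager([ProtocolProcess("activation"), ProtocolProcess("hare")])
    print(manager.register(Identity(node_id="aa", private_key="secret")))
```

Keeping the fan-out in one place means a protocol process never needs to know how or from where a post-service connected.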

:bulb: The node is considered a trusted node and hence it is fine to share the keys (they are required to sign the data).


Whale

This is mostly the same as the previous mode but supports multiple post-services connecting simultaneously. The node is supposed to run the network protocols for each of the registered identities in parallel.

:bulb: This mode is described and designed in more detail in #261.
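To make "in parallel" concrete, the sketch below runs a placeholder protocol loop for several registered identities concurrently inside one process. It is a Python illustration under stated assumptions: `run_protocols` and `run_node` are hypothetical names, and the loop body stands in for the real network protocols.

```python
import asyncio


async def run_protocols(node_id, rounds):
    """Stand-in for running activation, hare, etc. for one identity."""
    done = []
    for r in range(rounds):
        await asyncio.sleep(0)   # yield so other identities make progress
        done.append((node_id, r))
    return done


async def run_node(node_ids, rounds=2):
    """Run the protocol loop for every registered identity concurrently."""
    results = await asyncio.gather(*(run_protocols(n, rounds) for n in node_ids))
    return dict(zip(node_ids, results))


if __name__ == "__main__":
    print(asyncio.run(run_node(["id-a", "id-b", "id-c"])))
```

Because all identities share one node process, anything that is not identity-specific (such as syncing) only has to happen once.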


Requirements

Simplicity and ease of use

Remarks

Even though this epic describes all 3 modes of operation, its main focus is on implementing the post service into the node. The support for multiple identities per node is covered in #261, for which this one is a prerequisite.

Tasks

countvonzero commented 7 months ago

> A whale miner must orchestrate multiple nodes to participate with a POST bigger space commitment (a big identity must be split to be able to generate POST proofs in reasonable time)

is it really how whales do it now? they don't use postcli for initialization and copy data instead?

> - duplicated databases that occupy a lot of additional space
> - duplicated network traffic for the sync

they will still need this if they want to run multiple identities AFTER this proposal is deployed. the effect of this proposal is to allow separation of post and consensus. the miner still need to do 1X database and 1X network traffic per identity. the saving on duplicate network traffic/data storage can only come from running multiple identities in one single node.

the proposal seems less relevant for whales. i'd imagine they need a post service to manage multiple disks instead, which is outside the purview of this design?

> The post-service is the only owner of:
> - identity (node ID, the keys),
>
> The node it connects to merely acts on its behalf, running the protocol for it.

this part is strange. i can also say that the node that runs the protocol merely delegate the post service to generate post data. what is the definition of ownership here? can the post service revoke the private key from the node? it doesn't matter. private keys, once shared, are owned....

> it's assumed that nobody would do it anyway on a large scale as the risk is too high.

why? there is already a research proposal to combine smaller atxs into 1 big voting atx. PostServiceManager seems the perfect place to realize that.

and which mode (standalone vs distributed) will be implemented? both? it's unclear to me why "everything in a single app" is important. can you explain more? why not just go for the distributed mode only?

pigmej commented 7 months ago

> is it really how whales do it now? they don't use postcli for initialization and copy data instead?

That's unrelated: it's not about init but about proving time. They use postcli, and pretty much everyone copies data. It's just about proving, not the init phase. Currently, you need to orchestrate 50 go-sm nodes if you have 50 data sets initialized, and you can't turn one off, add delays between their proving runs, etc.

> why? there is already a research proposal to combine smaller atxs into 1 big voting atx. PostServiceManager seems the perfect place to realize that.

Exactly, that's why it's one service per HDD. Please do not mix up the terms: that is a separate item, which is why it's proposed this way.

> they will still need this if they want to run multiple identities AFTER this proposal is deployed. the effect of this proposal is to allow separation of post and consensus. the miner still need to do 1X database and 1X network traffic per identity. the saving on duplicate network traffic/data storage can only come from running multiple identities in one single node.

Yes, this is only the first part. Later on it will be combined with https://github.com/orgs/spacemeshos/projects/39/views/15?filterQuery=repo%3Aspacemeshos%2Fgo-spacemesh%2C%22spacemeshos%2Fpoet%22%2C%22spacemeshos%2Fgo-scale%22%2C%22spacemeshos%2Fpost%22%2C%22spacemeshos%2Fpost-rs%22%2C%22spacemeshos%2Fpm%22%2C%22spacemeshos%2Fapi%22+-status%3A%22%E2%9C%85+Done%22+label%3A%22feat%2Fmulti+smeshers%22, which will enable 1x for multiple smeshers.

> the proposal seems less relevant for whales. i'd imagine they need a post service to manage multiple disks instead, which is outside the purview of this design?

The idea of using multiple separate disks for one ATX is faulty at best. It does not yield any benefits (especially once combining is ready) but carries multiple risks, and it will also be slower than separate services (because of nonces). Once it is possible to combine ATXs, separate services yield only benefits (you can gamble on nonces, you can prove in parallel or sequentially, you get ATX decrease/increase for free, etc.).

poszu commented 7 months ago

Adding to @pigmej's comments

> > - duplicated databases that occupy a lot of additional space
> > - duplicated network traffic for the sync
>
> they will still need this if they want to run multiple identities AFTER this proposal is deployed. the effect of this proposal is to allow separation of post and consensus. the miner still need to do 1X database and 1X network traffic per identity. the saving on duplicate network traffic/data storage can only come from running multiple identities in one single node.

That's not true, this proposal covers the case of attaching many post-services to a node and running the consensus for multiple IDs as well. It's covered in the whale section (distributed mode). Quoting from the proposal:

> 🚧 Proposed solution 🚧
> (...)
> There is a 1 to N node to a post-service relationship. It has a few benefits:
> - less network traffic as sync is done once for all identities,
> - less storage for the database,

Separation of POST from the node is the first step to achieve this.


> > The post-service is the only owner of:
> > - identity (node ID, the keys),
> >
> > The node it connects to merely acts on its behalf, running the protocol for it.
>
> this part is strange. i can also say that the node that runs the protocol merely delegate the post service to generate post data. what is the definition of ownership here? can the post service revoke the private key from the node? it doesn't matter. private keys, once shared, are owned....

The POST data is inextricably linked with an identity (ID): the data is only valid for a given ID. The idea is that the node acts as a "protocol runner" for the identities that connect to it (register themselves).

> can the post service revoke the private key from the node? it doesn't matter. private keys, once shared, are owned....

The node is considered trusted: a smesher is supposed to be the owner of both the node and all the post-services connecting to it. This is not a solution for running a "node operation" that other people can connect to and share their private keys with.


> > it's assumed that nobody would do it anyway on a large scale as the risk is too high.
>
> why? there is already a research proposal to combine smaller atxs into 1 big voting atx. PostServiceManager seems the perfect place to realize that.

That sentence was about supporting a single ID spread across multiple disks. It's not a good solution as it has a few drawbacks:

ATX merging has nothing to do with a single ID using many disks. On the contrary, ATX merging is meant for exactly the opposite case, where many small IDs (each on its own disk) are merged into 1 big ID.


> and which mode (standalone vs distributed) will be implemented? both?

Eventually both. Quoting from the proposal:

> Remarks
>
> Even though this epic describes all 3 modes of operation, its main focus is on implementing the post service into the node. The support for multiple identities per node is covered in https://github.com/spacemeshos/pm/issues/261, for which this one is a prerequisite.


> it's unclear to me why "everything in a single app" is important. can you explain more? why not just go for the distributed mode only?

The "everything in one app" mode, aka standalone mode, is meant for "regular" smeshers that just want to run smapp/go-spacemesh and have both consensus and POST proving run on the same machine. They just want to execute the application and don't want to care about running a post-service as well (the same way as they run smapp and don't need to run go-sm themselves).

In this mode the underlying logic is the same, the communication of node <-> post-service is the same. The only difference is that the go-sm node spawns and supervises the post-service on behalf of the user.

pigmej commented 7 months ago

> and which mode (standalone vs distributed) will be implemented? both? it's unclear to me why "everything in a single app" is important. can you explain more? why not just go for the distributed mode only?

Standalone is needed for the current smapp use case and for "home" users who don't need such specifics. In that case go-sm will spawn the service and nothing else needs to change anywhere.

countvonzero commented 7 months ago

thanks for the explanations. my confusion came from the fact that

the content in this issue (that explained the rationale and plan) seems better suited to an umbrella issue that refers to this and #261 as the breakdown of the implementation plan. this allows people to see the overall picture before zooming into each stage of the implementation plan, in which this issue is the prerequisite.

> The POST data is inextricably linked with an identity (ID), the data is only valid for a given ID. The idea is that the node acts as a "protocol runner" for the identities that connect to it (register themselves).

i think we are arguing semantics here. there is no such thing as key "ownership" once it's shared. so the best place to assign ownership is to the node operator. for example, if post service A registered with node X and node Y, for whatever reason, all of A/X/Y have the key. but since it is assumed that the same operator, say Alice, runs all of A/X/Y, the ownership should be Alice's.

> In this mode the underlying logic is the same, the communication of node <-> post-service is the same. The only difference is that the go-sm node spawns and supervises the post-service on behalf of the user.

how does a smesher tell smapp to connect to an external post service that resides in a separate and more powerful home PC than smapp is running on? i was thinking longer term supporting only split mode is simpler and more flexible. in the short-term, since smapp is already managing go-sm node, it doesn't seem too terrible to make it also manage post service.

reythia commented 6 months ago

Will each POST still require a unique k2pow?

poszu commented 6 months ago

> Will each POST still require a unique k2pow?

Hey, sorry for the late reply. Basically, the input to k2pow is [k2pow_nonce, nonce_group, post_challenge, nodeID], so every node ID (identity) requires its own k2pow.
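A small Python sketch of why the node ID in that input prevents sharing k2pow work between identities. Only the four field names come from the comment above; the byte layout in `k2pow_input` and the use of SHA-256 in `pow_digest` are assumptions made for illustration, not the real k2pow construction.

```python
import hashlib
import struct


def k2pow_input(k2pow_nonce, nonce_group, post_challenge, node_id):
    """Serialize the four fields into one byte string.

    The layout (little-endian u64 nonce, u8 group, then raw bytes) is a
    guess made for this example only.
    """
    return struct.pack("<QB", k2pow_nonce, nonce_group) + post_challenge + node_id


def pow_digest(k2pow_nonce, nonce_group, post_challenge, node_id):
    # SHA-256 is a placeholder hash, not the function k2pow actually uses.
    data = k2pow_input(k2pow_nonce, nonce_group, post_challenge, node_id)
    return hashlib.sha256(data).digest()


if __name__ == "__main__":
    challenge = b"\x01" * 32
    id_a, id_b = b"\xaa" * 32, b"\xbb" * 32
    # Same nonce, group, and challenge, but different identities:
    print(pow_digest(7, 0, challenge, id_a) != pow_digest(7, 0, challenge, id_b))
```

Since the digest depends on the node ID, a nonce found for one identity says nothing about whether it satisfies the difficulty target for another.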