pokt-network / pocket-core

Official implementation of the Pocket Network Protocol
http://www.pokt.network
MIT License

LeanPOKT Proposal & Design Specification #1437

Closed: nodiesBlade closed this 1 year ago

nodiesBlade commented 2 years ago

Name: ThunderStake (Pierre Spiegel & Addison Spiegel) | BaaS Pools LLC

Please explain your change in detail. LeanPocket is a large optimization to the Pocket Core Client (PCC) that allows multiple nodes to share one full node. The Servicers/Validators in a node set can now share the same state cache and blockchain data, and the set no longer has to validate each block's transactions once per node as it grows. This reduces the resources needed to run n nodes (memory, disk space, I/O, and networking) from O(n) to O(1).

Please provide a justification for your change. This is a pure software optimization that does not change consensus. It allows multiple full PCC nodes to be consolidated into a single PCC process, yielding large cost savings and better resource utilization.

Security Requirements

Functional Goals

Non-functional Goals

High-Level Implementation Strategy

This implementation is separated into two sections: Servicing and Validating.

Servicing: The servicer side stores the node set in a JSON file that is loaded when PCC boots. When a user calls the PCC RPC endpoint v1/client/relay, the payload contains a servicer public key, which is used to look up the matching servicer in the stored node set. Because the relay is then handled with that servicer's mapped private key, the PCC relay logic needs only minimal changes. The evidence and session storage (in memory and in the DB) must be modified either to include the node address in the key or to keep separate storage for each node in the set. Finally, the claim and proof transaction logic must be modified to submit a transaction for each node in the set, at the times defined by the original PCC logic.
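A minimal Go sketch of the servicer-side lookup, assuming a hypothetical node set file that is a JSON array of objects with `address`, `public_key`, and `private_key` fields (the spec does not define the format). `NodeSet`, `LoadNodeSet`, `ServicerFor`, and `EvidenceKey` are illustrative names, not existing pocket-core APIs. The sketch loads the set once at boot, indexes it by servicer public key, and shows one way to namespace evidence by servicer address:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ServicerNode is one entry of the node set file. The field names and JSON
// layout are hypothetical; the spec only says the node set is stored as JSON.
type ServicerNode struct {
	Address    string `json:"address"`
	PublicKey  string `json:"public_key"`
	PrivateKey string `json:"private_key"`
}

// NodeSet indexes servicers by public key so an incoming relay can be
// routed to the key that must handle it.
type NodeSet struct {
	byPubKey map[string]ServicerNode
}

// LoadNodeSet parses the node set JSON once, at boot. Keys are lowercased
// so hex-encoded public keys match regardless of case.
func LoadNodeSet(raw []byte) (*NodeSet, error) {
	var nodes []ServicerNode
	if err := json.Unmarshal(raw, &nodes); err != nil {
		return nil, fmt.Errorf("parse node set: %w", err)
	}
	ns := &NodeSet{byPubKey: make(map[string]ServicerNode, len(nodes))}
	for _, n := range nodes {
		key := strings.ToLower(n.PublicKey)
		if _, dup := ns.byPubKey[key]; dup {
			return nil, fmt.Errorf("duplicate servicer public key %s", n.PublicKey)
		}
		ns.byPubKey[key] = n
	}
	return ns, nil
}

// ServicerFor returns the node that should handle a relay whose payload
// names servicerPubKey, and whether such a node is in the set.
func (ns *NodeSet) ServicerFor(servicerPubKey string) (ServicerNode, bool) {
	n, ok := ns.byPubKey[strings.ToLower(servicerPubKey)]
	return n, ok
}

// EvidenceKey prefixes a session's evidence key with the servicer address so
// claims and proofs can later be built separately for each node in the set.
func EvidenceKey(servicerAddr, sessionHeader string) string {
	return servicerAddr + "/" + sessionHeader
}

func main() {
	raw := []byte(`[
	  {"address": "a1", "public_key": "PK1", "private_key": "sk1"},
	  {"address": "a2", "public_key": "pk2", "private_key": "sk2"}
	]`)
	ns, err := LoadNodeSet(raw)
	if err != nil {
		panic(err)
	}
	if n, ok := ns.ServicerFor("pk1"); ok {
		fmt.Println(n.Address, EvidenceKey(n.Address, "session-42"))
	}
}
```

Running it prints `a1 a1/session-42`. Prefixing keys with the servicer address is the first of the two storage options mentioned above; it keeps a single evidence store regardless of how large the node set grows.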

Validating: Validating works much like servicing: the Tendermint (TM) node is launched with the same node set JSON file. During each consensus round, the node set is filtered to find the nodes that are in the current state's validator set. For each match, that node's key is used to sign and broadcast its vote, whether prevote or precommit. The same logic applies when one of the validators is the block proposer. The TM node should also persist each validator's last sign state to a file (as the original PCC does), so that validation remains fault-tolerant and recoverable after a failure.

Design decisions

Risks

Success Criteria

Monitoring/Testing Strategy

Our strategy will include multiple stages. In order: code reviews, X amount of unit test coverage, local net testing, integration (QA) testing, testnet, mainnet beta, and finally a mainnet RC. Many of these tests will focus heavily on the changes made to the client, including simulations and testing of relay handling and the claim and proof lifecycle, plus benchmarks to find any bottlenecks in the relay lifecycle.

High-Level Architecture

[Diagram: LeanPocket_Architecture_v1 (draw.io)]

Forecasted Improvements

[Diagrams: forecasted improvements]

tingiris commented 2 years ago

@PoktBlade So if I understand this, the node set is essentially a set of keys that the servicer can match against the public key in the request payload. So all of the node hostnames would point to the same IP address, I assume. Am I understanding this correctly?

nodiesBlade commented 2 years ago


That sounds about right!

luyzdeleon commented 2 years ago

Good job putting this spec draft together in short order. Here are some suggestions to finalize this draft:

1. Design decisions

2. Risks

I would add to the Risks that, because Pocket Core uses file-system databases, the OS file descriptor limit may be reached: a single process continuously accessing the filesystem on behalf of an unbounded number of Servicers could become unsustainable (and the extra networking also adds to the file descriptor count).

3. Success Criteria

I would recommend the success criteria to be as follows:

  1. A detailed report of the QA executed (as you well detailed in Monitoring/Testing strategy)
  2. Demonstrable benchmarks of relay processing speed, cross-checked against resource usage to show that both stay at sustainable operational levels.
  3. End-To-End documentation: a finished version of this spec and guides on how to use the feature.
tingiris commented 2 years ago

Hey @luyzdeleon, on the risks associated with the file descriptor limit: I know you can see the kernel limit in /proc/sys/fs/file-max on most Linux versions. Is your concern that the OS/kernel limit would be exceeded, or the ulimit set for the user context the process runs under?

nodiesBlade commented 2 years ago

@tingiris From what I understand, the majority of open FDs are in the consensus/mempool layer. However, multiple servicers writing to the evidence cache, on top of the additional network request overhead, can also raise this count. I believe we can monitor the number of FDs as part of the benchmarks to make sure nothing regresses significantly.
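Tracking the descriptor count during benchmarks could be done with a small, Linux-only Go helper along these lines. It samples the process's open descriptors from `/proc/self/fd`, the per-process `RLIMIT_NOFILE` (the ulimit) via `syscall.Getrlimit`, and the kernel-wide `/proc/sys/fs/file-max`. The function names are hypothetical; this is a sketch, not part of pocket-core.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// openFDs counts this process's open file descriptors (Linux only). The count
// includes the descriptor ReadDir itself holds on /proc/self/fd while reading.
func openFDs() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// fdLimit returns the per-process soft and hard RLIMIT_NOFILE, the values
// that `ulimit -Sn` and `ulimit -Hn` report.
func fdLimit() (uint64, uint64, error) {
	var rl syscall.Rlimit
	if e := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); e != nil {
		return 0, 0, e
	}
	return rl.Cur, rl.Max, nil
}

// systemFileMax reads the kernel-wide limit from /proc/sys/fs/file-max.
func systemFileMax() (uint64, error) {
	b, err := os.ReadFile("/proc/sys/fs/file-max")
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	n, err := openFDs()
	if err != nil {
		panic(err)
	}
	soft, hard, err := fdLimit()
	if err != nil {
		panic(err)
	}
	sysMax, err := systemFileMax()
	if err != nil {
		panic(err)
	}
	fmt.Printf("open=%d soft=%d hard=%d file-max=%d (%.2f%% of soft limit)\n",
		n, soft, hard, sysMax, 100*float64(n)/float64(soft))
}
```

Sampling this periodically while a benchmark grows the node set would show whether FD usage scales with the number of servicers. When a Go program is built with Go 1.19 or later and imports `os`, it raises its soft limit to the hard limit at startup on Unix, so for such a process the hard limit is usually the figure that matters.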

Also, @luyzdeleon, I'm incorporating your changes. Thank you for the insights and the well-said callouts on every single point.

luyzdeleon commented 2 years ago

@tingiris It's both. Historically we've advised people to raise their limits at different growth stages of the network and as optimizations were made to the software, but since this could potentially create an unbounded situation, I wanted to surface it, as it has been a problem in the past.

@PoktBlade Glad to be of service!

POKT-Discourse commented 2 years ago

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/pep-35-the-v0-optimization-leanpocket/3042/33

POKT-Discourse commented 1 year ago

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/preproposal-poktfund-2022-leanpokt-and-security-vulnerability-reimbursement/3933/1

POKT-Discourse commented 1 year ago

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/preproposal-poktfund-2022-leanpokt-and-security-vulnerability-reimbursement/3933/4

POKT-Discourse commented 1 year ago

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/thunderhead-poktfund-leanpokt-proposal-reimbursement/4069/1

Olshansk commented 1 year ago

Done a long, long time ago.