LeanPOKT Proposal & Design Specification

nodiesBlade commented 2 years ago

Name: ThunderStake (Pierre Spiegel & Addison Spiegel) | BaaS Pools LLC

Please explain your change in detail. LeanPocket is a large optimization to the Pocket Core Client (PCC) that allows multiple nodes to share one full node. Servicers and validators (the node set) can now share the same state cache and blockchain data, and the number of transactions validated per block no longer grows with the node set. This reduces the resources needed for N nodes from O(N) to O(1) in memory, disk space, I/O, and networking.

Please provide a justification for your change. This is a pure software optimization that does not change consensus. It lets multiple full PCC nodes consolidate into one PCC, which yields large cost savings and better resource utilization.

Security Requirements

Support validator functionality under a PCC so that servicers can fall back to validator nodes whenever they need to.

Functional Goals

Add support for multiple servicers under a PCC to handle relays
Add support for multiple validators under a PCC to participate in voting rounds and block proposing
Add additional Prometheus metrics for the node set

Non-functional Goals

Improve the scalability of PCC in terms of memory, CPU, space & io, and networking
Maintain the same performance and logic of the original PCC client (tx submission, validation, challenge requests, etc)

High-Level Implementation Strategy

This implementation is separated into two sections: Servicing and Validating.

Servicing: The servicer side stores the node set in a JSON file and loads it on PCC boot. Whenever a user hits the PCC RPC endpoint v1/client/relay, the payload contains a servicer public key, which is used to look up the correct servicer in the stored node set. Because the relay is then handled with the matching private key, the PCC relay logic needs only minimal changes. The evidence and session storage (in memory and DB) will either need to be keyed by node address, or PCC can keep separate storage for each node in the set. Finally, the claim and proof tx logic will need to be modified to submit the tx for each node in the set at the time defined by the original PCC logic.
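The lookup described above can be sketched in Go as follows. This is a minimal illustration, not the PCC implementation: the file layout, the JSON field names (public_key, private_key), and the names NodeKey, NodeSet, LoadNodeSet, and ServicerFor are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeKey is a hypothetical node set entry; the real file layout may differ.
type NodeKey struct {
	PublicKey  string `json:"public_key"`
	PrivateKey string `json:"private_key"`
}

// NodeSet indexes the loaded keys by servicer public key.
type NodeSet map[string]NodeKey

// LoadNodeSet parses a JSON array of key entries and indexes it by public key.
// It rejects empty and duplicate public keys so that lookups are unambiguous.
func LoadNodeSet(raw []byte) (NodeSet, error) {
	var entries []NodeKey
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, fmt.Errorf("parse node set: %w", err)
	}
	set := make(NodeSet, len(entries))
	for _, e := range entries {
		if e.PublicKey == "" {
			return nil, fmt.Errorf("node set entry with empty public key")
		}
		if _, dup := set[e.PublicKey]; dup {
			return nil, fmt.Errorf("duplicate public key %q", e.PublicKey)
		}
		set[e.PublicKey] = e
	}
	return set, nil
}

// ServicerFor returns the key entry for the servicer named in a relay payload,
// or an error if that servicer is not part of this node set.
func (s NodeSet) ServicerFor(servicerPubKey string) (NodeKey, error) {
	key, ok := s[servicerPubKey]
	if !ok {
		return NodeKey{}, fmt.Errorf("servicer %q is not in the node set", servicerPubKey)
	}
	return key, nil
}
```

In this sketch the node set is loaded once at boot and only read afterwards, so relay handlers can share the map without locking.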

Validating: Validating is very similar to servicing. The Tendermint Node (TM) will be launched with the same node set JSON file. During consensus rounds, each node in the set is checked against the current state's validators. If a node is a validator, its key is used to sign and broadcast its vote, whether a prevote or a precommit. The same logic applies when one of the validators is the block proposer. The TM node should also persist each validator's last sign state to a file (as in the original PCC) so that the validator piece remains fault-tolerant and recoverable in the event of a failure.
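A rough Go sketch of the two ideas in this paragraph, filtering the node set against the current validators and tracking a last sign state per validator, might look like this. The types and names (HRS, LocalValidators, SignTracker, TrySign) are hypothetical, the check is deliberately simpler than Tendermint's own last-sign-state handling, and persistence to a file is left out.

```go
package main

// HRS identifies a consensus step by height, round, and step.
// Step is assumed to be ordered so that prevote < precommit.
type HRS struct {
	Height int64
	Round  int
	Step   int
}

// after reports whether a is strictly later than b.
func (a HRS) after(b HRS) bool {
	if a.Height != b.Height {
		return a.Height > b.Height
	}
	if a.Round != b.Round {
		return a.Round > b.Round
	}
	return a.Step > b.Step
}

// LocalValidators keeps the node set addresses that are in the current
// validator set, preserving node set order.
func LocalValidators(nodeSet []string, current map[string]bool) []string {
	var out []string
	for _, addr := range nodeSet {
		if current[addr] {
			out = append(out, addr)
		}
	}
	return out
}

// SignTracker holds the last signed step for each validator address.
type SignTracker struct {
	last map[string]HRS
}

// NewSignTracker returns an empty tracker.
func NewSignTracker() *SignTracker {
	return &SignTracker{last: make(map[string]HRS)}
}

// TrySign records step for addr and returns true only if step is strictly
// later than the last step signed by addr. Each address is tracked separately.
func (t *SignTracker) TrySign(addr string, step HRS) bool {
	if prev, ok := t.last[addr]; ok && !step.after(prev) {
		return false
	}
	t.last[addr] = step
	return true
}
```

Keeping one sign state per address means a restart can restore each validator independently, which is the recoverability property the paragraph asks for.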

Design decisions

This proposal aims to be backward compatible with the latest RC on mainnet, keeping exactly the same functionality for node runners who choose to keep it. Therefore, instead of modifying the relay and validator logic directly, we propose adding functionality to both validators and servicers that is controlled through a feature flag in Pocket Config called lean_pocket (bool).
We will keep a siloed in-memory session cache and evidence cache/DB for each node, unless bottlenecks found in benchmarks say otherwise.
All additional functionality will be prefixed or suffixed with lean and can later be removed or integrated completely in future RCs.
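As a hedged illustration of the first two decisions, the Go sketch below reads a lean_pocket flag that defaults to false when absent and gives each node address its own cache with its own lock. Only the flag name comes from this spec; the Config struct shape, the assumption that the config is JSON, and the names NodeCache and SiloedCaches are invented for the example.

```go
package main

import (
	"encoding/json"
	"sync"
)

// Config models only the flag named in this spec; all other fields are omitted.
type Config struct {
	LeanPocket bool `json:"lean_pocket"` // absent means false: original behavior
}

// ParseConfig decodes a JSON config document.
func ParseConfig(raw []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(raw, &c)
	return c, err
}

// NodeCache is a hypothetical per-node session/evidence cache with its own lock,
// so relays for one node never wait on another node's cache.
type NodeCache struct {
	mu      sync.Mutex
	entries map[string][]byte
}

// Put stores a value under key k.
func (c *NodeCache) Put(k string, v []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[k] = v
}

// Get returns the value stored under key k, if any.
func (c *NodeCache) Get(k string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.entries[k]
	return v, ok
}

// SiloedCaches maps each node address to its own cache. The map is built once
// at boot and only read afterwards, so it needs no lock of its own.
type SiloedCaches struct {
	caches map[string]*NodeCache
}

// NewSiloedCaches creates one empty cache per node address.
func NewSiloedCaches(addresses []string) *SiloedCaches {
	s := &SiloedCaches{caches: make(map[string]*NodeCache, len(addresses))}
	for _, a := range addresses {
		s.caches[a] = &NodeCache{entries: make(map[string][]byte)}
	}
	return s
}

// For returns the cache that belongs to the given node address.
func (s *SiloedCaches) For(address string) (*NodeCache, bool) {
	c, ok := s.caches[address]
	return c, ok
}
```

One lock per node keeps contention local to that node. A single shared cache keyed by address would be the alternative to try if benchmarks show the siloed layout to be a bottleneck.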

Risks

PCC can crash for various reasons (a memory leak, for example). While existing PCC crashes are out of scope for this project, a crash will still take the entire node set down at once. To minimize this, we should advise a reasonable (definable) number of nodes per PCC that balances the impact on sessions, validation, and relay rewards against the resource savings.
Lock contention and blocking time could lengthen the RTT seen by the user under a high concurrent relay load. We can run benchmarks and profiling (pprof) to identify the sweet spot and the limitations. Furthermore, we can release benchmarks to users as guidance on how to properly provision their infrastructure.
The OS file descriptor limit may be reached: continuous access by an unbounded number of servicers in a single process could make that process's filesystem access unsustainable (the extra networking also adds to the file descriptor count). This can be mitigated with proper guidance for node runners and through benchmarking trials.
LeanPOKT will cause a number of peers to drop off, since users will be consolidating nodes. The TM P2P module needs to be solid enough to withstand this so that each node's peer count stays healthy and nodes stay in sync.
DDoS attacks have more impact. Most users were already stacking full nodes onto one server, so this is an existing threat. Our release guidance on the number of nodes can be conservative to minimize this if it poses a large threat.
Things not caught in Q/A Testing
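
For the file descriptor risk above, a benchmark harness could sample the process's open descriptors against its limits. The Go sketch below is one way to do that, assuming Linux (it lists /proc/self/fd) and using hypothetical names (FDUsage, ReadFDUsage). It is not part of PCC.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// FDUsage is a snapshot of this process's file descriptor usage.
type FDUsage struct {
	Open      int    // descriptors currently open, including the one used to list them
	SoftLimit uint64 // RLIMIT_NOFILE soft limit (what `ulimit -n` reports)
	HardLimit uint64 // RLIMIT_NOFILE hard limit
}

// ReadFDUsage reads the per-process descriptor limits and counts the entries
// in /proc/self/fd. It works on Linux only.
func ReadFDUsage() (FDUsage, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return FDUsage{}, fmt.Errorf("getrlimit: %w", err)
	}
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return FDUsage{}, fmt.Errorf("list /proc/self/fd: %w", err)
	}
	return FDUsage{Open: len(entries), SoftLimit: rl.Cur, HardLimit: rl.Max}, nil
}

// Fraction returns the share of the soft limit that is currently in use.
func (u FDUsage) Fraction() float64 {
	if u.SoftLimit == 0 {
		return 1
	}
	return float64(u.Open) / float64(u.SoftLimit)
}
```

Sampling this during relay benchmarks at different node set sizes would show how descriptor usage grows with the number of servicers, which can inform the guidance for node runners.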

Success Criteria

A detailed report of the QA executed (as detailed in Monitoring/Testing strategy)
Demonstrable benchmarks in relay processing speed, cross-checked with resource management indicating sustainable operational levels on both.
End-To-End documentation: a finished version of this spec and guides on how to use the feature.

Monitoring/Testing strategy

Our strategy will include multiple stages. In order: code reviews, X amount of unit test coverage, localnet testing, integration testing (Q/A testing), testnet, mainnet beta, and finally a mainnet RC. Many of these tests will focus heavily on the changes made to the client, including simulations and testing around relay handling and the claim & proof lifecycle, and on running benchmarks to find bottlenecks in the relay lifecycle.

High Level Architecture

[Figure: LeanPocket_Architecture_v1]

Forecasted Improvements Diagrams

tingiris commented 2 years ago

@PoktBlade So if I understand this, essentially the node sets are simply keys that the servicer can associate with the public key in the request payload. So all of the node hostnames would point to the same IP address, I assume. Am I understanding this correctly?

nodiesBlade commented 2 years ago

@PoktBlade So if I understand this, essentially the node sets are simply keys that the servicer can associate with the public key in the request payload. So all of the node hostnames would point to the same IP address, I assume. Am I understanding this correctly?

That sounds about right!

luyzdeleon commented 2 years ago

Good job putting this spec draft together in short order. Here are some suggestions to finalize this draft:

1. Design decisions

I believe for backwards compatibility reasons, and to be able to conform to the interfaces that several node runners have built automation and tooling for, this should be added as an additional "opt-in" functionality, meaning adding a feature flag to enable the feature. We can ponder in a later release if this will become the standard way of how the software behaves and deprecate the existing functionality then.
Given my previous opinion, I believe any multi-PCC cache/db must be kept separate in order to align with existing functionality.
One thing I don't see mentioned is how the relay data will be fed to Pocket Core's Prometheus metrics system.

2. Risks

I would add to the Risks that, because Pocket Core uses file system databases, the OS file descriptor limit may be reached: continuous access by an unbounded number of servicers in a single process could make that process's filesystem access unsustainable (the extra networking also adds to the file descriptor count).

3. Success Criteria

I would recommend the success criteria to be as follows:

A detailed report of the QA executed (as you well detailed in Monitoring/Testing strategy)
Demonstrable benchmarks in relay processing speed, cross-checked with resource management indicating sustainable operational levels on both.
End-To-End documentation: a finished version of this spec and guides on how to use the feature.

tingiris commented 2 years ago

Hey @luyzdeleon, on the risks associated with the file descriptor limit: I know you can see the kernel limit in /proc/sys/fs/file-max on most Linux versions. Is the risk you're concerned about that the OS/kernel limit would be exceeded, or the ulimit set for the user context that the process is running in?

nodiesBlade commented 2 years ago

@tingiris From what I understand, the majority of the open FDs are on the consensus/mempool layer, although multiple servicers writing to the evidence cache, on top of the additional network request overhead, can also elevate this. I believe we can monitor the number of FDs as part of benchmarks to make sure nothing regresses too significantly.

Also @luyzdeleon incorporating your changes, thank you for the insights and well said callouts on every single point.

luyzdeleon commented 2 years ago

@tingiris It's both. Historically we've advised people to raise their limits based on different growth stages of the network and on optimizations made to the software, but since this could potentially create an unbounded situation, I wanted to surface it so we are aware of it, as it has been a problem in the past.

@PoktBlade Glad to be of service!

POKT-Discourse commented 2 years ago

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/pep-35-the-v0-optimization-leanpocket/3042/33

POKT-Discourse commented 1 year ago

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/preproposal-poktfund-2022-leanpokt-and-security-vulnerability-reimbursement/3933/1

POKT-Discourse commented 1 year ago

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/preproposal-poktfund-2022-leanpokt-and-security-vulnerability-reimbursement/3933/4

POKT-Discourse commented 1 year ago

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/thunderhead-poktfund-leanpokt-proposal-reimbursement/4069/1

Olshansk commented 1 year ago

Done a long, long time ago

pokt-network / pocket-core

LeanPOKT Proposal & Design Specification #1437