kwilteam / kwil-db

Kwil DB, the database for web3
https://www.kwil.com/

peer filtering and private network support #924

Closed jchappelow closed 2 weeks ago

jchappelow commented 3 weeks ago

A more native way of constructing a private network in which nodes are not promiscuous (i.e., do not accept p2p traffic from just any incoming host) is desirable compared to a managed VPC or a self-hosted VPN.

Currently @charithabandi is implementing peer filtering with ABCI's application query support. This issue is to track the feature and discuss any decision points that may arise.


The first thing that comes to mind is how the whitelist is constructed. A combination of:

  1. all current validators
  2. any manually allowed nodes, such as the sentry nodes that serve as RPC providers for a network
  3. other tricks or shortcuts?
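
As a rough illustration of how the two sources above might be combined, here is a minimal Go sketch. `PeerFilter` and all of its methods are hypothetical names invented for this example and are not kwil-db's actual types. The sketch assumes peers are identified by a node ID string.

```go
package main

import (
	"fmt"
	"sort"
)

// PeerFilter is a hypothetical type, not part of kwil-db's API.
// It keeps the two whitelist sources separate so that a validator
// leaving the set never evicts a manually allowed node.
type PeerFilter struct {
	validators map[string]struct{} // node IDs of current validators
	manual     map[string]struct{} // operator-allowed node IDs, e.g. sentries
}

func toSet(ids []string) map[string]struct{} {
	set := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		set[id] = struct{}{}
	}
	return set
}

func NewPeerFilter(validators, manual []string) *PeerFilter {
	return &PeerFilter{validators: toSet(validators), manual: toSet(manual)}
}

// SetValidators replaces the validator-derived entries, e.g. after a
// join or leave changes the validator set.
func (pf *PeerFilter) SetValidators(ids []string) {
	pf.validators = toSet(ids)
}

// Allowed reports whether a peer with the given node ID may connect.
func (pf *PeerFilter) Allowed(nodeID string) bool {
	if _, ok := pf.validators[nodeID]; ok {
		return true
	}
	_, ok := pf.manual[nodeID]
	return ok
}

// Whitelist returns the sorted union of both sources.
func (pf *PeerFilter) Whitelist() []string {
	union := make(map[string]struct{})
	for id := range pf.validators {
		union[id] = struct{}{}
	}
	for id := range pf.manual {
		union[id] = struct{}{}
	}
	out := make([]string, 0, len(union))
	for id := range union {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	pf := NewPeerFilter([]string{"val1", "val2"}, []string{"sentry1"})
	fmt.Println(pf.Whitelist())

	pf.SetValidators([]string{"val2"}) // val1 leaves the validator set
	fmt.Println(pf.Allowed("val1"), pf.Allowed("sentry1"))
}
```

Keeping the validator-derived and manual entries in separate sets is one possible way to handle validator removals: replacing the validator set cannot accidentally drop a sentry node.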

A consideration in the above is the handling of (a) validator joins and (b) validator removals/leaves. I propose the following for validator joins, but this is up for discussion:

As a reminder, the current tooling for joins involves kwil-admin commanding the prospective validator node to author+sign+broadcast the join request transaction. This means that the node needs to have peers on the network in order to broadcast. A previous version of this tooling had kwil-admin author+sign the transaction with direct access to the keys, and it would broadcast to any RPC provider. Middle ground? I can see a mode in the current design where, if the prospective validator node is not on the network and thus cannot broadcast, it could return/print the raw bytes of the transaction, which could then be broadcast to any RPC provider on the network. This would simply require adding a utils broadcast command to either kwil-admin or kwil-cli (or both), which is actually baseline functionality in virtually every blockchain's tooling.
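
The "middle ground" mode described above could look roughly like the following Go sketch. Everything here is an assumption made for illustration: the `Broadcaster` interface, `ErrNoPeers`, the base64 export format, and the function names are invented and do not reflect kwil-admin's or kwil-cli's real commands or RPC API.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// Broadcaster is a hypothetical stand-in for anything that can submit
// an already-signed raw transaction to the network.
type Broadcaster interface {
	BroadcastRaw(tx []byte) error
}

// ErrNoPeers is a hypothetical error for a node with no p2p connectivity.
var ErrNoPeers = errors.New("node has no peers")

// offlineNode models a prospective validator that is on no whitelist.
type offlineNode struct{}

func (offlineNode) BroadcastRaw([]byte) error { return ErrNoPeers }

// rpcProvider models a public RPC provider; it only records what it receives.
type rpcProvider struct{ received [][]byte }

func (r *rpcProvider) BroadcastRaw(tx []byte) error {
	r.received = append(r.received, tx)
	return nil
}

// SubmitOrExport tries to broadcast through the node itself. If the node
// has no peers, it returns the signed transaction as base64 text so the
// operator can relay it elsewhere.
func SubmitOrExport(node Broadcaster, signedTx []byte) (exported string, err error) {
	err = node.BroadcastRaw(signedTx)
	if errors.Is(err, ErrNoPeers) {
		return base64.StdEncoding.EncodeToString(signedTx), nil
	}
	return "", err
}

// RelayExported is roughly what a "utils broadcast" command would do:
// decode the exported bytes and hand them to any reachable RPC provider.
func RelayExported(rpc Broadcaster, exported string) error {
	raw, err := base64.StdEncoding.DecodeString(exported)
	if err != nil {
		return fmt.Errorf("decode exported tx: %w", err)
	}
	return rpc.BroadcastRaw(raw)
}

func main() {
	exported, err := SubmitOrExport(offlineNode{}, []byte("signed-join-request"))
	if err != nil {
		panic(err)
	}
	fmt.Println("exported tx:", exported)

	rpc := &rpcProvider{}
	if err := RelayExported(rpc, exported); err != nil {
		panic(err)
	}
	fmt.Println("relayed transactions:", len(rpc.received))
}
```

The signing keys never leave the node in this flow. Only the signed bytes are moved, which is the property that distinguishes it from the earlier design where kwil-admin held the keys.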

However, I think that any mode that allows adding a validator before that node is actually connected and synchronized to the network will have liveness implications. During the period after the node becomes a validator but before it reaches 100% sync, the network will hit the full precommit_timeout (or prevote_timeout, not sure) as it waits for every validator's signature in the consensus rounds. It won't stop blocks, but it will slow them down. Looking further out, if we introduce more complex validator economics like staking rewards or slashing, an unsynchronized validator will not be desirable.

brennanjl commented 3 weeks ago

The first thing that comes to mind is how the whitelist is constructed. A combination of:

  1. all current validators
  2. any manually allowed nodes, such as the sentry nodes that serve as RPC providers for a network
  3. other tricks or shortcuts?

This is what I had in mind.

  • key X wants to become a validator
  • existing validators that intend to approve a join request add that future node's ID to the peer whitelist, just as would be done for an RPC provider sentry node
  • new node with key X syncs as sentry node
  • node X issues a join request tx
  • validators approve, one at a time
  • at threshold, node X becomes validator and votes on the very next block with no liveness implications
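
The ordering in the steps above (whitelist, sync, request, approvals, threshold) can be sketched as a small state check in Go. The `Candidate` and `JoinRequest` types and the threshold value are hypothetical and chosen only for this example. The actual approval threshold is whatever the network defines.

```go
package main

import (
	"errors"
	"fmt"
)

// Candidate is a hypothetical model of a prospective validator node.
type Candidate struct {
	ID          string
	Whitelisted bool // at least one existing node accepts it as a peer
	Synced      bool // it has caught up with the chain as a sentry node
}

// JoinRequest tracks approvals for one candidate. Hypothetical type.
type JoinRequest struct {
	Candidate string
	threshold int
	approvals map[string]struct{}
}

// RequestJoin only succeeds once the node is whitelisted and synced, so
// that it can vote on the very next block after being approved.
func (c *Candidate) RequestJoin(threshold int) (*JoinRequest, error) {
	if !c.Whitelisted {
		return nil, errors.New("no peer has whitelisted this node, so it cannot sync")
	}
	if !c.Synced {
		return nil, errors.New("node is still syncing; join request deferred")
	}
	return &JoinRequest{
		Candidate: c.ID,
		threshold: threshold,
		approvals: make(map[string]struct{}),
	}, nil
}

// Approve records one validator's approval and reports whether the
// threshold has been reached. Repeat approvals are counted once.
func (j *JoinRequest) Approve(validatorID string) bool {
	j.approvals[validatorID] = struct{}{}
	return len(j.approvals) >= j.threshold
}

func main() {
	x := &Candidate{ID: "nodeX", Whitelisted: true}
	if _, err := x.RequestJoin(2); err != nil {
		fmt.Println("rejected:", err)
	}

	x.Synced = true
	req, err := x.RequestJoin(2)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Approve("val1"), req.Approve("val2"))
}
```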

This also sounds about right to me, except it seems like only 1 current validator would need to add them to their whitelist. Having more than 1 validator allow connection would ensure it continues syncing smoothly (and isn't subject to that one validator going down). @charithabandi, how hard would it be to have validators:

  1. automatically allow a node to become a peer (by adding them to its whitelist) when they approve a validator join request
  2. automatically remove a node from its whitelist if its validator join request fails.
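
One way the two automatic behaviors above might fit together is sketched below in Go. The `Validator` type and its methods are hypothetical and are not taken from kwil-db. The sketch assumes a validator tracks which whitelist entries it added automatically, so that a failed join never removes an entry an operator added by hand.

```go
package main

import "fmt"

// Validator is a hypothetical model of one validator's peer whitelist.
type Validator struct {
	whitelist map[string]struct{}
	autoAdded map[string]struct{} // entries added only because of a join approval
}

// NewValidator seeds the whitelist with manually allowed node IDs.
func NewValidator(manual ...string) *Validator {
	v := &Validator{
		whitelist: make(map[string]struct{}),
		autoAdded: make(map[string]struct{}),
	}
	for _, id := range manual {
		v.whitelist[id] = struct{}{}
	}
	return v
}

// Allows reports whether the node ID is currently whitelisted.
func (v *Validator) Allows(id string) bool {
	_, ok := v.whitelist[id]
	return ok
}

// ApproveJoin whitelists the candidate as a side effect of approving it.
// Entries that already exist are left untouched and not marked automatic.
func (v *Validator) ApproveJoin(candidate string) {
	if v.Allows(candidate) {
		return
	}
	v.whitelist[candidate] = struct{}{}
	v.autoAdded[candidate] = struct{}{}
}

// JoinFailed removes the candidate only if this validator added it
// automatically; manual entries are preserved.
func (v *Validator) JoinFailed(candidate string) {
	if _, ok := v.autoAdded[candidate]; ok {
		delete(v.whitelist, candidate)
		delete(v.autoAdded, candidate)
	}
}

// JoinSucceeded keeps the entry and stops treating it as provisional.
func (v *Validator) JoinSucceeded(candidate string) {
	delete(v.autoAdded, candidate)
}

func main() {
	v := NewValidator("sentry1")

	v.ApproveJoin("nodeX")
	fmt.Println("nodeX allowed after approval:", v.Allows("nodeX"))

	v.JoinFailed("nodeX")
	fmt.Println("nodeX allowed after failed join:", v.Allows("nodeX"))
}
```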

brennanjl commented 3 weeks ago

As a reminder, the current tooling for joins involves kwil-admin commanding the prospective validator node to author+sign+broadcast the join request transaction. This means that the node needs to have peers on the network in order to broadcast. A previous version of this tooling had kwil-admin author+sign the transaction with direct access to the keys, and it would broadcast to any RPC provider. Middle ground? I can see a mode in the current design where, if the prospective validator node is not on the network and thus cannot broadcast, it could return/print the raw bytes of the transaction, which could then be broadcast to any RPC provider on the network. This would simply require adding a utils broadcast command to either kwil-admin or kwil-cli (or both), which is actually baseline functionality in virtually every blockchain's tooling.

To make sure I understand correctly, you are essentially looking for a way to broadcast a validator join request if a node is not connected to any validator (does not exist in any of their whitelists)?

jchappelow commented 3 weeks ago

To make sure I understand correctly, you are essentially looking for a way to broadcast a validator join request if a node is not connected to any validator (does not exist in any of their whitelists)?

Yes, but more accurately, if a node is not connected to any other node on the network including sentry nodes.

For example, the prospective validator node could start up and have zero peers (it isn't on any whitelist), but its operator could still use kwil-admin to generate a join request transaction and then broadcast it via a public RPC provider as a way of sneaking the join request onto the network before having p2p connectivity.

This is still problematic for the reason I describe in my last paragraph, which is that if they do ultimately get made into a validator before their node is synchronized, they are immediately missing their block proposals and block approvals, thus holding up the network.

It sounds like @charithabandi is solving the issue by requiring at least one node to whitelist the new node so that it can sync and broadcast its join request. Then, as you've suggested, validators will automatically whitelist that node when they vote to approve its join request.