Open antonlerner opened 3 years ago
What do other protocols/implementations do? It would be good to understand this. There's likely existing code or libraries we can take advantage of.
Peer selection is not black-and-white. There are definitely scenarios where a peer should be blacklisted/banned, but in many cases, it might make more sense to assign scores to peers and prioritize picking/staying connected to peers with high scores. We may also want to have scores decay or mean-revert over time.
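To make the scoring idea concrete, here is a minimal sketch in Go of per-peer scores that mean-revert towards a baseline. Everything in it is an assumption for illustration, not an existing API: the names (`Scorer`, `PeerID`, `Tick`, `Best`), the per-tick reversion rule and the ban threshold.

```go
package main

import "sort"

// PeerID is a hypothetical peer identifier used only in this sketch.
type PeerID string

// Scorer tracks one score per peer. Unknown peers sit at the baseline.
type Scorer struct {
	baseline float64 // value every score reverts towards
	revert   float64 // fraction of the gap to baseline closed per Tick, in (0, 1]
	banBelow float64 // peers scoring below this are never selected
	scores   map[PeerID]float64
}

// NewScorer builds a Scorer; all three parameters are illustrative knobs.
func NewScorer(baseline, revert, banBelow float64) *Scorer {
	return &Scorer{baseline: baseline, revert: revert, banBelow: banBelow, scores: map[PeerID]float64{}}
}

// Score returns the current score of a peer, or the baseline if it is unknown.
func (s *Scorer) Score(id PeerID) float64 {
	if v, ok := s.scores[id]; ok {
		return v
	}
	return s.baseline
}

// Adjust rewards (delta > 0) or penalises (delta < 0) a peer.
func (s *Scorer) Adjust(id PeerID, delta float64) {
	s.scores[id] = s.Score(id) + delta
}

// Tick moves every score part of the way back to the baseline.
func (s *Scorer) Tick() {
	for id, v := range s.scores {
		s.scores[id] = v + (s.baseline-v)*s.revert
	}
}

// Banned reports whether a peer's score is currently below the ban threshold.
func (s *Scorer) Banned(id PeerID) bool {
	return s.Score(id) < s.banBelow
}

// Best returns up to n non-banned candidates, highest score first.
func (s *Scorer) Best(candidates []PeerID, n int) []PeerID {
	out := make([]PeerID, 0, len(candidates))
	for _, id := range candidates {
		if !s.Banned(id) {
			out = append(out, id)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return s.Score(out[i]) > s.Score(out[j]) })
	if n < 0 {
		n = 0
	}
	if len(out) > n {
		out = out[:n]
	}
	return out
}
```

With this shape, a peer that is penalised once drifts back to the baseline after enough ticks, so a ban is temporary unless the misbehaviour repeats. A permanent ban would need a separate list.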
We may want to expand the scope of this SMIP to talk about peer selection and management more generally: e.g., how often should the peerset be refreshed? What logic is used to do this?
Overview
In a permissionless decentralised network, some peers might misbehave and not follow the rules of the protocol. This can happen for benign reasons such as network latency, slow hardware and disconnections from the network, but it can also be done deliberately by malicious nodes that aim to attack other nodes or prevent them from being in consensus with the rest of the network. In this document we describe the different misbehaviours and deviations from the protocol that a miner might exhibit, and offer ways to mitigate them or to block malicious miners from affecting honest miners.
A node might misbehave in several different ways:
Remove locally / Ban from network
For each misbehaviour we must specify whether any action is required, be it removing the node from the local peer list, or banning it from the network and alerting other nodes about the misbehaviour.
Locally removing peers
Nodes with a slow internet connection can be removed to improve sync performance. Removing a node because of a slow connection does not mean it is banned from the network, nor that it is barred from connecting to the same node sometime in the future.
Local banning of nodes
Permanently blacklisting a peer's ID in order to never connect to it again. This is not gossiped and is local to the node.
Banning
Banning means disallowing the node from participating in network consensus and notifying all other parties of this node's malicious behaviour. Banning can also be accompanied by discarding the banned node's blocks and/or ATXs from the mesh. In the case of banning, we need to decide whether the ban is permanent or bound by some expiration date. Also, in order to alert other nodes, we must produce a banning broadcast message with some proof of the node's misbehaviour.
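As a starting point for discussion, the banning broadcast message could look roughly like the Go sketch below. The type, its field names, the use of a Unix timestamp for expiry and the "0 means permanent" convention are all assumptions; none of them is decided by this proposal, and the proof encoding is left opaque.

```go
package main

import "errors"

// ErrMissingProof is returned for a ban message that carries no evidence.
var ErrMissingProof = errors.New("ban message has no proof of misbehaviour")

// BanMessage is a hypothetical gossip payload announcing a ban.
type BanMessage struct {
	NodeID      string // identity of the banned node
	Proof       []byte // opaque evidence, e.g. the offending block or ATX
	ExpiresUnix int64  // assumed convention: 0 means the ban never expires
}

// Validate rejects messages that cannot be acted on by a receiver.
func (m BanMessage) Validate() error {
	if m.NodeID == "" {
		return errors.New("ban message has no node ID")
	}
	if len(m.Proof) == 0 {
		return ErrMissingProof
	}
	return nil
}

// ActiveAt reports whether the ban is still in force at the given Unix time.
func (m BanMessage) ActiveAt(nowUnix int64) bool {
	return m.ExpiresUnix == 0 || nowUnix < m.ExpiresUnix
}
```

A receiver would still have to verify the proof itself before honouring the ban; otherwise the message becomes a way to censor honest nodes.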
Identification of dishonest behaviour:
Here we describe the different effects of malicious network behaviour.
P2P
The P2P layer is the base for all communication between nodes. Malicious or dishonest behaviour could be:
Identification and handling: A node's response time to sync requests can be measured; non-responsive or late nodes can be removed from the requesting node's neighbours list. There is no need to report malicious activity; however, we must prevent this node from reconnecting, at least for a certain amount of time.
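The handling described above (drop slow peers, then refuse reconnection for a while) can be sketched in Go as follows. The type `Neighbours`, its methods and the two default durations are hypothetical and chosen only for illustration.

```go
package main

import "time"

// Assumed defaults; the proposal does not fix either value.
const (
	DefaultTimeout  = 2 * time.Second
	DefaultCooldown = 10 * time.Minute
)

// Neighbours is a hypothetical local peer list with a reconnection cooldown.
type Neighbours struct {
	timeout      time.Duration        // slowest acceptable sync response
	cooldown     time.Duration        // how long a dropped peer stays blocked
	active       map[string]bool      // currently connected peers
	blockedUntil map[string]time.Time // dropped peers and when they may return
}

// NewNeighbours creates an empty peer list.
func NewNeighbours(timeout, cooldown time.Duration) *Neighbours {
	return &Neighbours{
		timeout:      timeout,
		cooldown:     cooldown,
		active:       map[string]bool{},
		blockedUntil: map[string]time.Time{},
	}
}

// Add connects a peer unless it is still inside its cooldown window.
func (n *Neighbours) Add(id string, now time.Time) bool {
	if until, ok := n.blockedUntil[id]; ok {
		if now.Before(until) {
			return false
		}
		delete(n.blockedUntil, id)
	}
	n.active[id] = true
	return true
}

// Observe records one sync response time and drops the peer if it was too
// slow. It returns true when the peer was dropped. Nothing is gossiped.
func (n *Neighbours) Observe(id string, rtt time.Duration, now time.Time) bool {
	if !n.active[id] || rtt <= n.timeout {
		return false
	}
	delete(n.active, id)
	n.blockedUntil[id] = now.Add(n.cooldown)
	return true
}

// Connected reports whether the peer is currently in the neighbours list.
func (n *Neighbours) Connected(id string) bool {
	return n.active[id]
}
```

Dropping on a single slow response is deliberately simplistic. A real implementation would more likely act on an average or on repeated timeouts.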
POET
There are several cases of misbehaviour. The first is that of other nodes that try to contact the PoET server. In this case nodes can exhibit the following behaviours:
Since these behaviours are very dangerous and could potentially cause PoET not to function, we must ban the malicious peers whenever they present such behaviour.
There can also be misbehaviour on the PoET's behalf. In this case there needs to be proof of the PoET's misbehaviour: two signed PoET roots may serve as such proof. If a PoET doesn't return anything, we request from another PoET.
PoETs can share IP blacklists.
Protocol misbehaviour: A node is considered malicious if it generates some message that can prove its maliciousness. Blacklisting a node requires at least one proof of its misbehaviour.
Whenever a node's misbehaviour is proved, its weight is reduced to zero, including for historical votes. Sometimes re-running the tortoise will be necessary. To decide whether to do so, we must calculate the total weight cancelled and check whether it affects the vote margin; if it does (if it goes above the threshold), the tortoise must be run from genesis.
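The rerun check is described only loosely, so the Go sketch below encodes one possible reading of it, stated as an assumption: a past decision is safe if its absolute vote margin stays at or above the threshold even after all cancelled weight is subtracted, and a rerun from genesis is needed as soon as one decision is not safe. `Margin` and `NeedsRerun` are hypothetical names and are not part of the tortoise code.

```go
package main

// Margin is the signed vote margin of one block:
// weight voting for minus weight voting against.
type Margin int64

// abs returns the magnitude of a margin as an unsigned weight.
func abs(m Margin) uint64 {
	if m < 0 {
		return uint64(-m)
	}
	return uint64(m)
}

// NeedsRerun reports whether cancelling the given total weight could pull
// any previously decided margin below the threshold. Margins that were
// already below the threshold are treated as undecided and ignored.
// Assumption: the worst case is that all cancelled weight supported the
// winning side of every decision.
func NeedsRerun(margins []Margin, cancelled, threshold uint64) bool {
	for _, m := range margins {
		a := abs(m)
		if a < threshold {
			continue // not decided, so there is nothing to invalidate
		}
		if a-threshold < cancelled {
			return true
		}
	}
	return false
}
```

This worst-case test is conservative. Tracking how the banned node actually voted on each block would avoid some unnecessary reruns, at the cost of more bookkeeping.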
Banned node ATXs still need to be persisted. Voting blocks for banned miners can be pruned - unless they are used as reference blocks.
Banned nodes might still receive rewards
Blocks
Nodes that produce syntactically invalid blocks must be banned and added to the node blacklist. The bad blocks can serve as proof of misbehaviour. Late blocks may be caused by network latency, so banning shouldn't be the first option.
Eclipse detection? Peers who send malformed blocks should be blacklisted.
ATXs
Syntactically invalid ATXs should cause the producing node to be banned as well. A syntactically invalid ATX can still be propagated via gossip to provide proof of misbehaviour. Late ATXs should be considered valid.
Sync misbehaviour
Sync can be an effective way to DoS a miner, causing it to perform many requests for non-existent data. These are only some of the malicious behaviours a peer might present to syncing nodes.
There could be more benign situations where a node simply does not respond in time. In this case, as with P2P lag, we can remove the specific peer from the peer list and not report it as malicious.
Hare misbehaviour
High-level design
Blacklisting
Blacklisted nodes need to be persisted. Do we ignore nodes from blacklisted nodes?
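A persisted blacklist could be as simple as a map from node ID to proof that is serialised to disk. The Go sketch below assumes JSON as the storage format and opaque byte slices as proofs; both choices, and all names in it, are placeholders for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Blacklist is a hypothetical persisted set of banned node IDs, each stored
// together with the proof that justified the ban.
type Blacklist struct {
	proofs map[string][]byte
}

// NewBlacklist returns an empty blacklist.
func NewBlacklist() *Blacklist {
	return &Blacklist{proofs: map[string][]byte{}}
}

// Add records a node. An entry without proof is rejected, following the
// requirement that every blacklisted node comes with at least one proof.
func (b *Blacklist) Add(id string, proof []byte) error {
	if len(proof) == 0 {
		return errors.New("blacklist entry requires a proof")
	}
	b.proofs[id] = proof
	return nil
}

// Contains reports whether the node is blacklisted.
func (b *Blacklist) Contains(id string) bool {
	_, ok := b.proofs[id]
	return ok
}

// Marshal serialises the blacklist so it can be written to disk.
func (b *Blacklist) Marshal() ([]byte, error) {
	return json.Marshal(b.proofs)
}

// LoadBlacklist restores a blacklist from previously marshalled data.
func LoadBlacklist(data []byte) (*Blacklist, error) {
	proofs := map[string][]byte{}
	if err := json.Unmarshal(data, &proofs); err != nil {
		return nil, err
	}
	if proofs == nil {
		proofs = map[string][]byte{} // JSON "null" decodes to a nil map
	}
	return &Blacklist{proofs: proofs}, nil
}
```

Storing the proof next to the ID lets a restarted node re-gossip or re-verify a ban without asking its peers.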
Proposed implementation
Implementation plan
Questions
Dependencies and interactions
Stakeholders and reviewers
Testing and performance