spacemeshos / SMIPS

Spacemesh Improvement Proposals
https://spacemesh.io
Creative Commons Zero v1.0 Universal

P2P blacklisting #56

Open antonlerner opened 3 years ago

antonlerner commented 3 years ago

Overview

In a permissionless decentralised network, some peers might misbehave and not follow the rules of the protocol. This can happen for benign reasons such as network latency, slow hardware, or disconnections from the network, but it can also be done deliberately by malicious nodes that aim to attack other nodes or prevent them from being in consensus with the rest of the network. This document describes the different misbehaviours and deviations from the protocol that a miner might exhibit, and offers ways to mitigate them or to block malicious miners from affecting honest miners.

A node might misbehave in several different ways:

Remove locally / Ban from network

For each misbehaviour we must specify whether any action is required, be it removing the node from the local peer list, or banning it from the network and alerting other nodes about the misbehaviour.

Locally removing peers

Nodes with a slow internet connection can be removed to improve sync performance. Removing a node because of a slow connection does not mean it will be banned from the network, or barred from connecting to the same node sometime in the future.

Local banning of nodes

Permanently blacklisting a peer's ID so that the node never connects to it again. This is not gossiped and is local to the node.

Banning

Banning means disallowing the node from participating in the network consensus and notifying all other parties of this node's malicious behaviour. Banning can also be accompanied by discarding the banned node's blocks and/or ATXs from the mesh. In case of banning, we need to decide whether the ban is permanent or bound by some expiration date. Also, in order to alert other nodes, we must produce a banning broadcast message with some proof of the node's misbehaviour.
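To make the three responses concrete, here is a minimal Go sketch of how they and the ban broadcast might be represented. Every name here (`Action`, `BanMessage`, `ExpiryLayer`) is hypothetical and not taken from go-spacemesh, and expressing expiry as a layer number is an assumption; the proposal leaves open whether bans expire at all.

```go
package main

import (
	"errors"
	"fmt"
)

// Action is the response to a misbehaviour. The values mirror the three
// categories described above; the names are illustrative only.
type Action int

const (
	RemoveLocally Action = iota // drop from the local peer list; may reconnect later
	BanLocally                  // never connect again; not gossiped
	BanNetwork                  // gossiped to all peers; requires proof
)

// BanMessage is a hypothetical broadcast payload for a network-wide ban.
type BanMessage struct {
	NodeID      string
	Proof       []byte // evidence of misbehaviour, e.g. the offending message
	ExpiryLayer uint32 // assumption: 0 means the ban never expires
}

// Validate rejects ban messages that name no node or carry no proof.
func (m BanMessage) Validate() error {
	if m.NodeID == "" {
		return errors.New("ban message has no node ID")
	}
	if len(m.Proof) == 0 {
		return errors.New("ban message carries no proof")
	}
	return nil
}

// ActiveAt reports whether the ban is still in force at the given layer.
func (m BanMessage) ActiveAt(layer uint32) bool {
	return m.ExpiryLayer == 0 || layer < m.ExpiryLayer
}

func main() {
	m := BanMessage{NodeID: "peer-1", Proof: []byte("bad block"), ExpiryLayer: 100}
	fmt.Println(m.Validate(), m.ActiveAt(99), m.ActiveAt(100))
}
```

A receiving node would call `Validate` before acting on a gossiped ban, so that a ban without evidence is never honoured.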

Identification of dishonest behaviour:

Here we describe the different effects of malicious network behaviour.

P2P

The P2P layer is the base for all communications between nodes. Malicious or dishonest behaviour could be:

Identification and handling: A node's response time to sync requests can be measured, and non-responsive or late nodes can be removed from the requesting node's neighbour list. There is no need to report malicious activity; however, we must prevent this node from reconnecting, at least for a certain amount of time.
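The measurement and temporary reconnection block could look roughly like the following. This is a sketch under stated assumptions: `SlowPeerGuard`, the strike count, and the cooldown are all hypothetical, and times are plain milliseconds so the example is deterministic.

```go
package main

import "fmt"

// SlowPeerGuard is a hypothetical helper: it counts consecutive slow or
// missing sync responses per peer and imposes a reconnection cooldown.
type SlowPeerGuard struct {
	maxLatencyMs int64
	maxStrikes   int
	cooldownMs   int64
	strikes      map[string]int
	blockedUntil map[string]int64
}

func NewSlowPeerGuard(maxLatencyMs int64, maxStrikes int, cooldownMs int64) *SlowPeerGuard {
	return &SlowPeerGuard{
		maxLatencyMs: maxLatencyMs,
		maxStrikes:   maxStrikes,
		cooldownMs:   cooldownMs,
		strikes:      map[string]int{},
		blockedUntil: map[string]int64{},
	}
}

// Observe records one response latency and reports whether the peer should
// now be dropped from the neighbour list.
func (g *SlowPeerGuard) Observe(peer string, latencyMs, nowMs int64) bool {
	if latencyMs <= g.maxLatencyMs {
		g.strikes[peer] = 0 // a timely response clears earlier strikes
		return false
	}
	g.strikes[peer]++
	if g.strikes[peer] < g.maxStrikes {
		return false
	}
	delete(g.strikes, peer)
	g.blockedUntil[peer] = nowMs + g.cooldownMs
	return true
}

// MayReconnect reports whether the cooldown for peer has elapsed.
func (g *SlowPeerGuard) MayReconnect(peer string, nowMs int64) bool {
	until, blocked := g.blockedUntil[peer]
	return !blocked || nowMs >= until
}

func main() {
	g := NewSlowPeerGuard(500, 2, 60_000)
	fmt.Println(g.Observe("peer-1", 900, 0))      // first strike
	fmt.Println(g.Observe("peer-1", 1200, 1_000)) // second strike: drop
	fmt.Println(g.MayReconnect("peer-1", 30_000), g.MayReconnect("peer-1", 61_000))
}
```

Nothing is reported to other nodes here; the peer is only kept away locally until the cooldown passes.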

POET

There are several cases of misbehaviour. The first is that of other nodes that contact the PoET server. In this case nodes can exhibit the following behaviours:

  1. Multiple registration requests
  2. Sending malformed or incorrect messages to PoET
  3. Spamming the PoET server to cause DoS

Since these behaviours are very dangerous and could potentially cause PoET not to function, we must ban the malicious peers whenever they present such behaviour.
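As an illustration of how a PoET server might detect the first and third behaviours, the sketch below counts requests per peer within a round. `Gatekeeper`, its limits, and the idea of resetting per round are assumptions made for this example, not a description of the actual PoET service.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrDuplicate = errors.New("duplicate registration")
	ErrRateLimit = errors.New("too many requests")
)

// Gatekeeper is a hypothetical per-round admission check for a PoET server.
type Gatekeeper struct {
	maxRequests int
	requests    map[string]int
	registered  map[string]bool
}

func NewGatekeeper(maxRequests int) *Gatekeeper {
	g := &Gatekeeper{maxRequests: maxRequests}
	g.NextRound()
	return g
}

// NextRound clears all counters; call it when a new PoET round opens.
func (g *Gatekeeper) NextRound() {
	g.requests = map[string]int{}
	g.registered = map[string]bool{}
}

// Register admits at most one registration per peer per round and flags
// peers that keep sending requests beyond maxRequests.
func (g *Gatekeeper) Register(peer string) error {
	g.requests[peer]++
	if g.requests[peer] > g.maxRequests {
		return ErrRateLimit // candidate for banning
	}
	if g.registered[peer] {
		return ErrDuplicate
	}
	g.registered[peer] = true
	return nil
}

func main() {
	g := NewGatekeeper(3)
	for i := 0; i < 5; i++ {
		fmt.Println(g.Register("peer-1"))
	}
}
```

Distinguishing `ErrDuplicate` from `ErrRateLimit` lets the server tolerate an occasional retry while still treating sustained spam as grounds for a ban.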

There can also be misbehaviour on the PoET's part. In this case there needs to be proof of PoET misbehaviour: two signed PoET roots may be proof of misbehaviour. If a PoET doesn't return anything, we request from another PoET.
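The "two signed roots" idea can be sketched as an equivocation check: two validly signed statements for the same round with different roots. The sketch assumes that PoETs sign roots with Ed25519 and that a statement consists of a round number and a root; both are assumptions for illustration, as are the names `SignedRoot`, `Sign`, and `IsEquivocation`.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"encoding/binary"
	"fmt"
)

// SignedRoot is a hypothetical PoET statement: "the root for Round is Root".
type SignedRoot struct {
	Round uint32
	Root  []byte
	Sig   []byte
}

// payload is the byte string that gets signed: 4-byte round, then the root.
func (s SignedRoot) payload() []byte {
	buf := make([]byte, 4, 4+len(s.Root))
	binary.BigEndian.PutUint32(buf, s.Round)
	return append(buf, s.Root...)
}

// Sign produces a signed statement with the PoET's private key.
func Sign(priv ed25519.PrivateKey, round uint32, root []byte) SignedRoot {
	s := SignedRoot{Round: round, Root: root}
	s.Sig = ed25519.Sign(priv, s.payload())
	return s
}

// IsEquivocation reports whether a and b prove that the holder of pub
// signed two different roots for the same round.
func IsEquivocation(pub ed25519.PublicKey, a, b SignedRoot) bool {
	if a.Round != b.Round || bytes.Equal(a.Root, b.Root) {
		return false
	}
	return ed25519.Verify(pub, a.payload(), a.Sig) &&
		ed25519.Verify(pub, b.payload(), b.Sig)
}

// demoKey derives a fixed key pair so the example is reproducible.
func demoKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	priv := ed25519.NewKeyFromSeed(bytes.Repeat([]byte{7}, ed25519.SeedSize))
	return priv.Public().(ed25519.PublicKey), priv
}

func main() {
	pub, priv := demoKey()
	a := Sign(priv, 5, []byte("root-a"))
	b := Sign(priv, 5, []byte("root-b"))
	fmt.Println(IsEquivocation(pub, a, b), IsEquivocation(pub, a, a))
}
```

Because both signatures verify against the PoET's public key, any third party can check the pair without trusting the reporter.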

PoETs can share IP blacklists.

Protocol misbehaviour: A node is considered malicious if it generates some message that can prove its maliciousness. A blacklist entry for a node needs to include at least one proof of its misbehaviour.

For a misbehaving node, once the misbehaviour is proved, its weight is reduced to zero, including for historical votes. Sometimes re-running the tortoise will be necessary. To decide whether to do so, we must calculate the total weight cancelled and check whether it affects the vote margin: if it goes above the threshold, the tortoise must be run from genesis.
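One possible reading of this check is sketched below: for each past decision, subtract the banned node's signed contribution from the margin and see whether the decision would still clear the threshold with the same sign. The `Decision` record, the signed-margin convention, and `NeedsRerun` are all assumptions for illustration; the actual tortoise bookkeeping may differ.

```go
package main

import "fmt"

// Decision is a hypothetical record of one past tortoise decision.
type Decision struct {
	Layer      uint32
	Margin     int64 // signed: weight in favour minus weight against
	BannedVote int64 // signed weight the banned node contributed to Margin
}

func abs(x int64) int64 {
	if x < 0 {
		return -x
	}
	return x
}

// NeedsRerun reports whether zeroing the banned node's votes would push any
// previously decided margin below the threshold or flip its sign.
func NeedsRerun(decisions []Decision, threshold int64) bool {
	for _, d := range decisions {
		before := d.Margin
		after := d.Margin - d.BannedVote
		if abs(before) < threshold {
			continue // was never decided, nothing to invalidate
		}
		if abs(after) < threshold || (before > 0) != (after > 0) {
			return true
		}
	}
	return false
}

func main() {
	history := []Decision{
		{Layer: 10, Margin: 100, BannedVote: 10},
		{Layer: 11, Margin: 55, BannedVote: 10},
	}
	fmt.Println(NeedsRerun(history, 50))
}
```

If the function returns false, the cancelled weight cannot have changed any outcome, so the expensive run from genesis can be skipped.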

Banned nodes' ATXs still need to be persisted. Voting blocks of banned miners can be pruned, unless they are used as reference blocks.
Banned nodes might still receive rewards.

Blocks

Nodes that produce syntactically invalid blocks must be banned and added to the node blacklist. The bad blocks can serve as proof of misbehaviour. Late blocks may be caused by network latency, so banning shouldn't be the first option.
Eclipse detection? Peers who send malformed blocks should be blacklisted.

ATXs

Syntactically invalid ATXs should cause the producing node to be banned as well. A syntactically invalid ATX can still be propagated via gossip to provide proof of misbehaviour. Late ATXs should be considered valid.

Sync misbehaviour

Sync can be an effective way to DoS a miner by causing it to perform many requests for non-existent data. These are only some of the malicious behaviours a peer might exhibit towards syncing nodes:

  1. Sending a wrong layer hash without blocks that can be verified against the hash
  2. Sending a wrong aggregated layer hash
  3. Sending fewer or more blocks than match the layer hash, or blocks that are not found

These are considered adversarial behaviours, and the node must be banned. Q: what could be the proof of misbehaviour in this case?

There could be more benign situations where a node simply does not respond in time. In this case, as with P2P lag, we can remove the specific peer from the peer list and not report it as malicious.
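Detecting a mismatch between a claimed layer hash and the blocks actually delivered could look like the sketch below. The hash construction (SHA-256 over sorted, length-prefixed block IDs) is an assumption chosen for the example and is not the real Spacemesh layer hash. The check only shows local detection; it does not settle the open question of what a transferable proof would be.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// LayerHash computes a hypothetical layer hash from the block IDs in a
// layer. Sorting makes the result independent of delivery order.
func LayerHash(blockIDs []string) [32]byte {
	sorted := append([]string(nil), blockIDs...)
	sort.Strings(sorted)
	h := sha256.New()
	for _, id := range sorted {
		// Length-prefix each ID so the concatenation is unambiguous.
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(id)))
		h.Write(n[:])
		h.Write([]byte(id))
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// MatchesLayerHash reports whether the delivered blocks hash to the value
// the peer claimed for the layer.
func MatchesLayerHash(claimed [32]byte, delivered []string) bool {
	return LayerHash(delivered) == claimed
}

func main() {
	claimed := LayerHash([]string{"b1", "b2", "b3"})
	fmt.Println(MatchesLayerHash(claimed, []string{"b3", "b1", "b2"}))
	fmt.Println(MatchesLayerHash(claimed, []string{"b1", "b2"}))
}
```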

Hare misbehaviour

  1. A node that does not play by the rules of consensus (sends wrong round messages in hare). In this case, the node needs to be ignored. In case of malicious or malformed messages, the node needs to be removed.

High-level design

Blacklisting

Blacklisted nodes need to be persisted. Do we ignore nodes from blacklisted nodes?
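A minimal sketch of a blacklist that survives restarts, assuming JSON as the storage format and a hypothetical `Blacklist` type. Where the bytes are written (file, database) is left to the caller.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Blacklist maps a node ID to the reason it was blacklisted.
type Blacklist struct {
	Entries map[string]string `json:"entries"`
}

func NewBlacklist() *Blacklist {
	return &Blacklist{Entries: map[string]string{}}
}

// Add records nodeID together with a human-readable reason.
func (b *Blacklist) Add(nodeID, reason string) {
	b.Entries[nodeID] = reason
}

// Contains reports whether nodeID is blacklisted.
func (b *Blacklist) Contains(nodeID string) bool {
	_, ok := b.Entries[nodeID]
	return ok
}

// Marshal serialises the blacklist so it can be written to disk.
func (b *Blacklist) Marshal() ([]byte, error) {
	return json.Marshal(b)
}

// LoadBlacklist restores a blacklist from previously marshalled bytes.
func LoadBlacklist(data []byte) (*Blacklist, error) {
	b := NewBlacklist()
	if err := json.Unmarshal(data, b); err != nil {
		return nil, err
	}
	if b.Entries == nil { // e.g. the stored value was {"entries":null}
		b.Entries = map[string]string{}
	}
	return b, nil
}

func main() {
	bl := NewBlacklist()
	bl.Add("peer-1", "malformed block")
	data, err := bl.Marshal()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```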

Proposed implementation

Implementation plan

Questions

Dependencies and interactions

Stakeholders and reviewers

Testing and performance

lrettig commented 3 years ago

What do other protocols/implementations do? It would be good to understand this. There's likely existing code or libraries we can take advantage of.

Peer selection is not black-and-white. There are definitely scenarios where a peer should be blacklisted/banned, but in many cases, it might make more sense to assign scores to peers and prioritize picking/staying connected to peers with high scores. We may also want to have scores decay or mean-revert over time.
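As a rough illustration of the scoring idea, here is a Go sketch in which each peer's score decays exponentially towards a neutral value of zero. The half-life, the `Scorer` type, and the use of plain seconds for time are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

type entry struct {
	value   float64
	updated int64 // seconds, time of the last adjustment
}

// Scorer keeps a per-peer score that decays towards zero (neutral) with
// the given half-life. All names are illustrative.
type Scorer struct {
	halfLife float64 // seconds
	scores   map[string]entry
}

func NewScorer(halfLifeSeconds float64) *Scorer {
	return &Scorer{halfLife: halfLifeSeconds, scores: map[string]entry{}}
}

// Get returns the decayed score of peer at time now; unknown peers are neutral.
func (s *Scorer) Get(peer string, now int64) float64 {
	e, ok := s.scores[peer]
	if !ok {
		return 0
	}
	elapsed := float64(now - e.updated)
	return e.value * math.Pow(0.5, elapsed/s.halfLife)
}

// Adjust applies a reward (positive) or penalty (negative) at time now.
func (s *Scorer) Adjust(peer string, delta float64, now int64) {
	s.scores[peer] = entry{value: s.Get(peer, now) + delta, updated: now}
}

// Best returns the highest-scoring candidate, or "" if there are none.
func (s *Scorer) Best(candidates []string, now int64) string {
	best, bestScore := "", math.Inf(-1)
	for _, p := range candidates {
		if sc := s.Get(p, now); sc > bestScore {
			best, bestScore = p, sc
		}
	}
	return best
}

func main() {
	s := NewScorer(10)
	s.Adjust("fast-peer", 8, 0)
	s.Adjust("slow-peer", -8, 0)
	fmt.Println(s.Get("fast-peer", 10), s.Get("slow-peer", 10))
	fmt.Println(s.Best([]string{"fast-peer", "slow-peer", "new-peer"}, 10))
}
```

With decay, an old penalty fades on its own, so a peer that was slow once is not excluded forever, while outright bans can remain a separate mechanism.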

We may want to expand the scope of this SMIP to talk about peer selection and management more generally: e.g., how often should the peerset be refreshed? What logic is used to do this?