mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.04k stars 990 forks source link

Peer connection drop #2726

Open hashmap opened 5 years ago

hashmap commented 5 years ago

It's not an issue per se, it was a design decision, however it may be worth to discuss one more time.

Grin node periodically kills the excess of connections https://github.com/mimblewimble/grin/blob/master/p2p/src/peers.rs#L463 which is understandable but annoying for counterparts. From my experience a node connects and almost immediately get dropped. There are couple ideas how to improve it:

Do we have some other ideas?

JeremyRubin commented 5 years ago

@ethanheilman is probably a good person to look at this.

DavidBurkett commented 5 years ago

Yes, it'd be great to get an expert opinion on this. My thoughts are:

I put some thought into it, and the best approach I can come up with to solve our current peer health problems, without sacrificing network security, would be to: 1) Have separate incoming and outgoing connection quotas. 2) If a node has filled its quota of incoming connections, any new connection requests should be responded to with a successful handshake (optional), a message with a list of peers (crucial on seed nodes for new peers who are trying to bootstrap), and then immediately dropped. The only exception would be if the peer has a higher chain height/difficulty, we would want to allow them to connect to make sure we're not being eclipse and/or sybil attacked. 3) To combat the remote possibility of you and your peers becoming isolated from the main network, occasionally drop a random incoming connection or two when your incoming quota is full. Do the same with your outgoing connection pool, and immediately try to connect to another peer (preferably from a different subnet and/or geographical region).

Obviously, this doesn't have to all be implemented at once, but I'd prefer to work toward a system similar to this. I'd love to hear everyone's thoughts on this.

garyyu commented 5 years ago

@hashmap

From my experience a node connects and almost immediately get dropped.

When dropping a connection because of excessing, does/could the remote peer try connecting again and again? I suppose yes. Perhaps we should duplicate the excess logic on the connection request part, instead of current accept / drop / accept / drop / ...

Send list of peers before dropping a peer #2725 (however some peers don't need it, is it ok to spam?)

IIRC, Peer request (send_peer_request(p2p::Capabilities::PEER_LIST)) should only happen when it's needed.

How about enhance the "Ping / Pong" ( or "Hand / Shake" ? I don't remember well) message? and clearly present the "Peer Request" on the "Hand" (for example) message if needed, means if the remote peer is full of connections, it should at least send back a peer list before refusing/dropping the connection request.

FIFO dropping - #2724 to support it (could be gameable)

Does that mean the new connections/nodes will be more unstable? ( new connection has bigger chance to be dropped)

hashmap commented 5 years ago

@garyyu

does/could the remote peer try connecting again and again?

Only if it doesn't know any other peers, we have a queue of peers to connect, so we proceed with the next in this queue. So I don't think we should address it connection request code.

IIRC, Peer request (send_peer_request(p2p::Capabilities::PEER_LIST)) should only happen when it's needed.

Yeah, here is a problem that we don't know anything about this peer and this peer may not have time to request a peer list

How about enhance the "Ping / Pong" ( or "Hand / Shake" ? I don't remember well) message?

I'm not sure, not a big fan of a such design

if the remote peer is full of connections, it should at least send back a peer list before refusing/dropping the connection request.

that's what #2725 introduces

Does that mean the new connections/nodes will be more unstable? ( new connection has bigger chance to be dropped)

Vice versa, older connections would be dropped first (first in - first out)

DavidBurkett commented 5 years ago

Vice versa, older connections would be dropped first (first in - first out)

Just want to reiterate that FIFO doesn't solve any of the issues that we're having right now.

antiochp commented 5 years ago

FIFO would work against any kind of network stability right? We'd drop older long-lived connection over newer connections.

antiochp commented 5 years ago

Ideally we'd want (I think) a mix of old and new connections. And seeds that "just seeded".

But we went with the simplest approach - all nodes are equal, all connections are equal.

Ideally we'd keep this as simple as possible (and not too simple) without introducing too much differentiation/precedence/complexity, while at the same time maintaining a healthy network.

DavidBurkett commented 5 years ago

@antiochp Yes, FIFO would make the least stable network. I think my proposal[1] handles most of those concerns, while providing a very stable network (minimal peer dropping). Do you disagree?

[1] https://github.com/mimblewimble/grin/issues/2726#issuecomment-479568264

garyyu commented 5 years ago

Vice versa, older connections would be dropped first (first in - first out)

Long term connection peers (normally that means more stable peer) will be dropped first? always favor new connections? I think current random dropping is much more fair :-) and simple.

0xmichalis commented 5 years ago

My node log is full of the following errors and I wanted to confirm it's related to what is being discussed in this issue. If not, I can open a separate one.

https://pastebin.com/31q5426E

DavidBurkett commented 5 years ago

@kargakis Yes, that's the same issue.

mcdallas commented 5 years ago

Do we have some other ideas?

In Bitcoin you keep a banscore for each of your peers which you increase if they do stuff like:

etc. If the peer reaches a threshold you ban him. When you need to evict a node (i.e when you are over capacity and receive a connection) you evict the peer with the highest banscore first

DavidBurkett commented 5 years ago

@mcdallas I like that idea for long-term, but something like that would require a lot of work and even more testing. If done incorrectly, we could end up kicking slow peers off the network, or peers with older versions, etc.

JeremyRubin commented 5 years ago

The banscore stuff in Bitcoin sucks and isn't actually really used -- if Bitcoin Core could get rid of it easily, we would. In general it's a binary thing at this point, modulo one or two use cases. Cases where banscore was incremented are being removed (e.g., no banning for any PoW valid block served). I don't think we want to emulate that here.

What is good (besides confusing terminology) is the notion of outbound vs inbound peers. I think thinking about this in terms of stable marriage terminology helps more: there are nodes that are suitors and there are nodes that you would like to be a suitor too (the node being suited can be called the 'wooed').

You place much lower value on your suitors than on the nodes that you want to woo. You should also be suiting those nodes, irrespective of the number of suitors you have. Otherwise it's easy to sybil your node.

The nodes that your suiting should either be ones that you configure, ones you've been connected to for a long time without fault, or that have provided you with high value information frequently (e.g., they propagate blocks to you you hadn't seen). Thus you can upgrade a suitor to a 'match' and add them to your woo-list if you're getting high quality information from them.

From your woo-list, you should select a fixed number (maybe let's say 8) to peer with. You can rotate these around. If a node on your woo-list connects to you as a suitor, you will let them stay on.

From your suitor list, you can treat that as a FIFO or whatever. You can also limit the bandwidth from those nodes more heavily as they are less important to you. Perhaps put heavier delays on propagating them blocks or transactions, or don't consider them for dandelion relay at all (probably a good idea!).

Treating all nodes equally sounds nice -- but in practice it makes the network easy to attack. Known good peers are very valuable! Something like the above preserves that structure.

I also don't think we should expose the peer list necessarily. Maybe expose your suitors but not your woo-list. It seems a bit sensitive given that this helps someone map the network.

@EthanHeilman should eventually chime in here (I'll ping him) about addressing Eclipse attacks under this -- pinging random IP addresses helps iirc.

0xmichalis commented 5 years ago
  • Have separate incoming and outgoing connection quotas.

  • If a node has filled its quota of incoming connections, any new connection requests should be responded to with a successful handshake (optional), a message with a list of peers (crucial on seed nodes for new peers who are trying to bootstrap), and then immediately dropped. The only exception would be if the peer has a higher chain height/difficulty, we would want to allow them to connect to make sure we're not being eclipse and/or sybil attacked.

  • To combat the remote possibility of you and your peers becoming isolated from the main network, occasionally drop a random incoming connection or two when your incoming quota is full. Do the same with your outgoing connection pool, and immediately try to connect to another peer (preferably from a different subnet and/or geographical region).

@DavidBurkett do you think it's necessary for non-seeder nodes to share their peer list? Maybe share a part of it? Or ask an existing peer's list and send that back to the original request instead? Overall, I find your proposal reasonable. :+1:

lehnberg commented 5 years ago

Perhaps put heavier delays on propagating them blocks or transactions, or don't consider them for dandelion relay at all (probably a good idea!).

Yes, this is in line with the recommendations of the Dandelion++ paper. Section 4.3 explicitly touches on this topic and outlines an attack where an adversary is lying about how many outbound edges they create and could even connect to all honest nodes on the network this way. They mitigate against this by forcing a node Alice to only forward transactions via its outbound edges, ie to nodes Alice has chosen, rather than to nodes that have chosen Alice:

Creating many edges. Dandelion is naturally robust to nodes that create a disproportionate number of edges, because spies can only create outbound edges to honest nodes. This matters because in the stem phase, honest nodes only forward messages on outbound edges.

Inbound edges, i.e. from suitors Outbound edges, i.e. to the woed

ignopeverell commented 5 years ago

We already keep track of inbound and outbound peers and have a fixed preferred quota of outbounds. This seems like a good thing. Beyond this, I think we're going to have a hard time finding a solution that's not gameable and helps significantly.

Question is: are we trying to bend over backward to compensate for a lack of open ports on the network?

hashmap commented 5 years ago

@ignopeverell lack of open slot could explain that but 2 facts are worth further check. I rarely see peers connected for more than 30 secs. There is a trend here (peeks are restarts)

https://grin.report/d/jeTROiqmz/peers?orgId=1&fullscreen&panelId=2&from=now-30d&to=now&refresh=1m

EthanHeilman commented 5 years ago

Thanks @JeremyRubin for letting me know about this thread.

I am ignorant of how Grin currently manages network connections, but I can provide a high level explanation of how Bitcoin does it which may be helpful. In Bitcoin connections are divided into two groups: incoming connections and outgoing connections.

Currently the default configuration for a Bitcoin node is:

  1. 125 connections total i.e. incoming connections + outgoing connections <=125
  2. At most 9 outgoing connections, however one of these connections is reserved for special tasks so for all intents and purposes 8 outgoing connections
  3. At most 116 incoming connections

The reason that there are so many more slots reserved for incoming connections is that not every client can accept incoming connections (clients behind a NAT, firewall etc).

We wrote a paper examining Bitcoin's P2P network and how gamable it was in 2015-2016. This work includes a range of countermeasures to make cryptocurrency P2P networks more robust. I have a project page with a link to paper and the PR's we merged into Bitcoin to strengthen BItcoin against these attacks.

I'm happy to discuss this further if anyone is interested but it would be very helpful to have a short summary of how Grin currently manages its P2P network.

ignopeverell commented 5 years ago

Thank you very much for passing by @EthanHeilman. Currently Grin does the following:

  1. 25 connections max total. This appears to be much too low (would you concur?) and currently can make connectivity difficult at times, for the reasons you outlined around availability of incoming connections.
  2. At least 8 peers total...
  3. ...except if we have less than 4 outgoing connections, in which case we try to establish more to get to 4.

Ii looks like our main gap when compared to bitcoin is the low number of connections we allow overall, and the absence of restrictions on outgoing connections.

EthanHeilman commented 5 years ago

Thanks for providing some insight into how Grin works.

25 connections max total. This appears to be much too low (would you concur?)

Personally I think 25 connections max is probably not enough, although I believe ETH does get by with 25 connections. Increasing this number will probably get you some breathing room with regard to incoming connection slot exhaustion.

4 outgoing connections, in which case we try to establish more to get to 4.

Bitcoin always tries to use all outgoing connection slots so BTC will always attempt to make an outgoing connection until they have 8 outgoing connections. I'm not sure what the magic number here is although 8 seems to work.

Increasing outgoing connections Pros:

  1. The more outgoing connections the more connected the network, the lower the probability the network partitions itself by accident, this also leads to faster propagation because fewer hops between nodes.
  2. The more outgoing connections the harder an eclipse attack becomes because an eclipse attacker needs to monopolize all the outgoing connections.

Increasing outgoing connections Cons:

  1. Uses more resources, takes up more incoming connection slots.

I'm not sure it is still the case but blockchain surveillance/analysis companies used to attempt to connect to every node which allowed incoming connection slots in Bitcoin for the purposes of learning when a transaction was broadcasted and who broadcasted it. This doesn't work so well as a surveillance method in modern Bitcoin, so I'm not sure if they still do it. This behavior takes up 1 incoming connection slot on each node which if the number of incoming connection slots were low would be very problematic.

@ignopeverell How does Grin do peer seeding? Are you using DNS seeders?

lehnberg commented 5 years ago

See the Erlay paper for additional discussion on how Bitcoin network connectivity works today: https://arxiv.org/abs/1905.10518