Update on libp2p work - Githubissues

whyrusleeping commented 7 years ago

The ipfs network is growing pretty quickly, and we need to move fast to avoid issues caused by ipfs not actually closing any connection. I did some thinking on this this weekend and came up with this issue.

Connection Closing

Currently, ipfs does not close any of the connections it makes, as the network grows larger and larger this is becoming a very big problem. We need to implement a way for ipfs nodes (and more generally, any application using libp2p) to manage the number of connections they maintain. To make a decision on whether or not to close a given connection, we need some information about how valuable it is. For example, peers that are in the DHTs routing table, or ones that are very frequent bitswap partners should be prioritized over connections made just for an infrequent wantlist update or dht query. However, we don't want to go around closing all but a very few 'valuable' connections, It is preferable to keep as many connections open as resource constraints allow. The cost of initiating a new connection is rather expensive latency-wise. It is also useful to maintain a larger number of connections to keep the effectiveness of bitswap wantlist broadcasts high.

To achieve this, We need a system to keep track of connections, sort out which ones should stay open and which ones should be closed. This system should accept hints from upper level callers about the connections. The DHT code should be able to signal that a given connection is in its routing table, bitswap should be able to mark favored partners, pubsub should be able to hold connections to the peers in its swarms and so on.

My proposal is to add this functionality to the 'host' abstraction. A method TagConn(conn, tag, val) should be added, that accepts a connection, a tag to add to the connection, and an importance value for that tag. Also needed is an UntagConn(conn, tag) method. A worker routine would periodically scan through the connections, check if we have more than our limit, and close off connections that have the least assigned value.

Lite Connections

Since it is advantageous to hold open more connections (even if they are infrequently used), I propose we add the concept of a 'lite connection' that doesnt count against the connection limit in the same way as normal connections. These would be low cost connections such as a relayed connection through another peer, a standard connection to a peer in the local area network, or even a BLE connection to some other nearby node. The key here is that these connections are cheaper to maintain. These connections should not be used for any 'high bandwidth' applications, if high bandwidth is needed, a new 'heavy' connection should be created (so as not to abuse relays). The ideal usage of these could be bitswap announcements, small amounts of dht queries, pubsub or pings.

Usage of lite connections

The aim is to maintain fewer 'heavy' connections, and to close out connections periodically as they become less useful to us. For this, nodes should maintain a set of connections to peers that are willing to relay connections for them, and relay multiple connections for other peers. In this way you can be 'connected' to many unique peers per physical 'heavy' connection.

Things to be done

Relay needs to be finished up and integrated https://github.com/libp2p/go-libp2p-circuit
ipfs swarm connect needs a more detailed 'verbose' mode to aid in connection debugging https://github.com/ipfs/go-ipfs/issues/3746
ipfs dht print-table (name TBD) command to print dht routing tables
ipfs swarm gc to manually trigger a cleanup of connections
ipfs swarm limits to view and change swarm limits, perhaps select preset 'modes'
ipfs swarm peers -v should show which peer opened each stream (remote vs local)
connection tagging functionality needs to be added to 'host'
peers listing should show connection tagging information
'host' needs configurable routine to close out connections periodically
NewStream should add hints as context values to select things like 'Dont Dial' or 'Prefer Lite Stream'
Bitswap needs to periodically remove peers from its 'partners' list after inactivity
Think about stream management for pubsub

whyrusleeping commented 7 years ago

A short update, we've been making some really good progress on libp2p lately and I want to recognize some of the hard work thats been done.

@vyzo pushed and got the initial circuit relay code merged into master. You can now relay ipfs libp2p connections through other peers! This allows easier (though still manual) NAT traversal, as well as interesting future work on using fewer connections. This gets us a bit closer to the 'lite connections' ideas above.

@Stebalien and I debugged and fixed a particularly nasty stream multiplexing issue that was being triggered by bitswap. Given some combination of disconnects and reconnects and timeouts, bitswap would write endlessly to a stream that the other side had neglected to close. This has mostly be resolved, but some deeper work is being done to eliminate this class of errors (hopefully) entirely. ref https://github.com/ipfs/go-ipfs/issues/3651

Finally, @magik6k has fixed a bug in the dialing limiter that significantly reduces the number of open file descriptors used when dialing out to peers. With this patch applied, users should notice an improvement in system load and a drop in 'too many open files' errors. This helps mitigate some of the urgency around connection closing, but doesnt address that problem.

whyrusleeping commented 6 years ago

Another exciting update:

We have implemented and shipped connection closing! This new feature will ship in the upcoming 0.4.12 release, but you can try it out now in the 0.4.12 release candidate. The connection manager will be enabled by default, even for nodes with config files from older repos. Users should notice a reduction in memory, cpu and bandwidth usage, as well as a significant reduction in the occurance of 'too many open files' errors.

So, what's next?

Now that we have the tools to defeat NAT, it's time to optimize how we create and accept connections. In order to do this, I think its useful to categorize the different situations that a node on the network might be in.

Nodes with public IP addresses

Nodes who are easily dialable have no need for any sort of NAT traversal, and should have very little if any ambiguity on which addresses might work for connecting to them. For these nodes, we should do the following:

Disable NAT traversal utilities
Announce only 'known' ip addresses
Set higher connection limits
Set up netscan prevention 'local dial' blocks
Disable TCP reuseport
Advertise relay addresses (for javascript nodes to connect)

Nodes behind a traversable NAT

These nodes are generally on home internet connections, they are generally slower connections, and many consumer routers do not enjoy higher numbers of connections (and neither do the roommates of ipfs users). For these nodes, we should:

Advertise external NAT mapping
Advertise relay addresses (to minimize number of connections)
Set lower connection limits

Undialable nodes

These nodes, due to a restrictive NAT or firewall, cannot be connected to externally. For these, we should make sure that we don't encourage other nodes to waste resources trying to connect to them.

Only advertise relay addresses
Run the dht in 'client' mode
Disable upnp (it crashes some routers :/ )
Disable TCP reuseport

Stebalien commented 6 years ago

Disable TCP reuseport

Why?

Nodes behind a traversable NAT

Eventually, we may want to make these nodes DHT clients as well. DHTs with many ephemeral nodes but not a lot of data are really inefficient.

whyrusleeping commented 6 years ago

@Stebalien disable TCP reuseport in some situations simply because it tends to cause issues unexpectedly, and if we don't need it then its probably best not to have it.

Eventually, we may want to make these nodes DHT clients as well. DHTs with many ephemeral nodes but not a lot of data are really inefficient.

Yeah, i would be open to making that the default for them too.

ipfs / kubo

Update on libp2p work #4029

Connection Closing

Lite Connections

Usage of lite connections

Things to be done

Another exciting update:

So, what's next?

Nodes with public IP addresses

Nodes behind a traversable NAT

Undialable nodes