holepunch: low IPv6 success rate

dennis-tra commented 1 year ago

From the data of our hole punching measurement campaign we can deduce some information about which IP version + Transport is more successful than others. The following graph shows the data:

Each network in the above graphs contributed more than 1k hole punches, so the sample should be large enough to be meaningful (I haven't applied a formal test of statistical significance, it just feels like enough). The average success rate of TCP and QUIC is actually the same in both cases (IPv4 and IPv6), which is the first unexpected thing. However, in the case of IPv4, QUIC has less variance.

The more interesting/concerning part is the IPv6 success rate. There could be something wrong with the measurement, or something that makes hole punching over IPv6 inherently challenging, or indeed an issue somewhere in the libp2p stack.

A note on the data in the graph above: in the case of IPv4/TCP, for example, both peers used only each other's IPv4/TCP addresses, even if either client also had, say, an IPv4/QUIC address. The latter multiaddress was not used in this case.

cc @mxinden

vyzo commented 1 year ago

Possibly not having observed public IPv6 addrs to exchange, which breaks everything -- ref also the other discussion we are having in go-libp2p.

mxinden commented 1 year ago

Possibly not having observed public IPv6 addrs to exchange, which breaks everything

Extending this: nodes need to discover their public IP addresses via identify and validate them via AutoNAT. Those addresses are then sent to the hole punching remote via DCUtR.

@dennis-tra is there any way you can validate that the Go punchr clients connect to a bootstrap node over IPv6 + TCP & QUIC and thus discover their public address?

dennis-tra commented 1 year ago

Possibly not having observed public IPv6 addrs to exchange

In the above graphs the clients reported to listen on an IPv6 address. If there was no address to exchange I would have tracked that.

@dennis-tra is there any way you can validate that the Go punchr clients connect to a bootstrap node over IPv6 + TCP & QUIC and thus discover their public address?

I probably can't do that. But if a client reports listening on an IPv6 address I would have assumed that they support IPv6 as it has received ~3 confirmations via identify for that address.

sukunrt commented 1 year ago

@dennis-tra I'm trying to understand the exact measurement that's being made here.

Correct me wherever my understanding is wrong.

1. The honeypot DHT crawler crawls the DHT, and whenever it connects to a peer that reports having relay addresses (based on its identify response) and supports DCUtR, the peer is added to a DB.
2. The Go punchr client asks a server for peers to hole punch. The server looks at this DB and responds with an appropriate peer and a protocol filter to apply, say TCP+IPv6.
3. The Go punchr client opens a relay stream to this peer, in response to which the peer initiates a hole punch.

In this setup, how do you know that the peer obtained by the crawler (advertising relay addresses, presumably behind a NAT) is listening on TCP+IPv6?

@mxinden how does the rust client handle protocol filtering?

sukunrt commented 1 year ago

My theory is as follows. Say we are measuring the TCP+IPv6 (tcp6) success rate.

1. The peer that was found by the honeypot had a relay address. So its reachability was private and it'd not be possible to determine whether this peer had any tcp6 listen addresses.
2. When a punchr client queries the server for a peer to hole punch, it receives this peer's info with a protocol filter of tcp6.
3. The Go punchr client opens a relay connection to this peer. The peer then opens a DCUtR stream over the relay connection.
4. Suppose this peer doesn't support tcp6, only tcp4. It sends its tcp4 address in the CONNECT message.
5. The Go punchr client now applies the protocol filter in two steps.
   - First, it applies the filter to its own observed addresses. This list contains tcp6 addresses, so its filtered address list is tcp6.
   - Then it applies the filter to the remote peer's addresses provided in the CONNECT message. That list doesn't contain any address matching tcp6, so no filtering is done and the peer's addresses are kept as they are (tcp4).
     - The filtering logic is here; it is only applied if there are some matching addresses.
     - It is applied separately to the client's observed addresses here and to the remote addresses provided in the CONNECT message here.
     - For correct behaviour we should apply the filter to both address lists or to neither.

Here we have a mismatch. The punchr client will provide a tcp6 address to the peer and will dial the peer's tcp4 address.
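To illustrate the mismatch, here is a minimal, self-contained Go sketch. It is not the punchr code: it works on plain multiaddress strings, and the names `matches`, `strictFilter`, `lenientFilter` and `consistentFilter` are hypothetical. `lenientFilter` mimics the "only filter if something matches" behaviour described above, and `consistentFilter` is one possible way to apply the filter to both lists or to neither.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether a multiaddress string starts with the given IP
// protocol (e.g. "ip6") and contains the given transport (e.g. "tcp").
func matches(addr, ipProto, transport string) bool {
	parts := strings.Split(strings.TrimPrefix(addr, "/"), "/")
	if len(parts) < 4 || parts[0] != ipProto {
		return false
	}
	for _, p := range parts[2:] {
		if p == transport {
			return true
		}
	}
	return false
}

// strictFilter returns only the matching addresses (possibly none).
func strictFilter(addrs []string, ipProto, transport string) []string {
	var out []string
	for _, a := range addrs {
		if matches(a, ipProto, transport) {
			out = append(out, a)
		}
	}
	return out
}

// lenientFilter returns the input unchanged when nothing matches.
func lenientFilter(addrs []string, ipProto, transport string) []string {
	if out := strictFilter(addrs, ipProto, transport); len(out) > 0 {
		return out
	}
	return addrs
}

// consistentFilter filters both lists, or neither if either would be empty.
func consistentFilter(local, remote []string, ipProto, transport string) ([]string, []string, bool) {
	fl := strictFilter(local, ipProto, transport)
	fr := strictFilter(remote, ipProto, transport)
	if len(fl) == 0 || len(fr) == 0 {
		return local, remote, false
	}
	return fl, fr, true
}

func main() {
	// Example addresses from documentation ranges; purely illustrative.
	local := []string{"/ip4/203.0.113.7/tcp/4001", "/ip6/2001:db8::1/tcp/4001"}
	remote := []string{"/ip4/198.51.100.9/tcp/4001"}

	// Filtering each list independently yields tcp6 locally but tcp4 remotely.
	fmt.Println("sent to peer:", lenientFilter(local, "ip6", "tcp"))
	fmt.Println("dialed:      ", lenientFilter(remote, "ip6", "tcp"))

	l, r, applied := consistentFilter(local, remote, "ip6", "tcp")
	fmt.Println("consistent:  ", l, r, "filter applied:", applied)
}
```

With the lenient variant, the client advertises only its tcp6 address while dialing the remote's tcp4 address, which is the mismatch described above.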

dennis-tra commented 1 year ago

The honeypot DHT crawler crawls the DHT, and whenever it connects to a peer that reports having relay addresses (based on its identify response) and supports DCUtR, the peer is added to a DB.

That's correct 👍 you have probably found this already but just for completeness: the relevant logic is here

The Go punchr client asks a server for peers to hole punch. The server looks at this DB and responds with an appropriate peer and a protocol filter to apply, say TCP+IPv6.

Also correct 👍

The Go punchr client opens a relay stream to this peer, in response to which the peer initiates a hole punch.

Also correct 👍

In this setup, how do you know that the peer obtained by the crawler (advertising relay addresses, presumably behind a NAT) is listening on TCP+IPv6?

The CONNECT message we receive from the remote peer contains all of its observed non-relay Multiaddresses.

We track these addresses here. It's correct that the server cannot know, at the time of serving this peer to the punchr client, whether it's really listening on, e.g., TCP+IPv6. That's why we don't apply the protocol filter if no address would be left. In the analysis (and in the graph), I looked at the Multiaddresses reported by the remote peer (from its CONNECT message) together with the given protocol filter, and only considered those data points (hole punch results) where at least one Multiaddress from the remote peer would be left.

Regarding your theory:

The peer that was found by honeypot had a relay address. So its reachability was private and it'd not be possible to determine whether this peer had any tcp6 listen addresses.

With "it'd not be possible to determine whether this peer had any tcp6 listen addresses" I assume you mean from the punchr server's perspective? Or do you mean that it's also not possible for the peer itself to determine whether it had any TCP/IPv6 listen addresses? Regarding the latter, I'd say this isn't correct.

Here we have a mismatch. The punchr client will provide a tcp6 address to the peer and will dial the peer's tcp4 address.

Your remarks are correct. However, I account for this in the data analysis step. As mentioned above, I only consider those data points where the protocol filter applied to the remote peer's set of Multiaddresses would yield at least one Multiaddress. If the protocol filter had filtered out all addresses, I would not have considered this hole punch in this particular analysis.
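As a rough illustration of that inclusion rule, here is a minimal Go sketch. It is not the actual analysis code: the `punchResult` type, the `isTCP6` helper and `tcp6SuccessRate` are hypothetical names, and addresses are treated as plain multiaddress strings.

```go
package main

import (
	"fmt"
	"strings"
)

// punchResult is a hypothetical record of one hole punch attempt.
type punchResult struct {
	RemoteAddrs []string // addresses from the remote peer's CONNECT message
	Success     bool
}

// isTCP6 is a simplistic string check for an IPv6 + TCP multiaddress.
func isTCP6(addr string) bool {
	return strings.HasPrefix(addr, "/ip6/") && strings.Contains(addr, "/tcp/")
}

// tcp6SuccessRate only considers results where the tcp6 filter would leave
// at least one remote address; all other results are skipped.
func tcp6SuccessRate(results []punchResult) (rate float64, considered int) {
	successes := 0
	for _, r := range results {
		hasMatch := false
		for _, a := range r.RemoteAddrs {
			if isTCP6(a) {
				hasMatch = true
				break
			}
		}
		if !hasMatch {
			continue // filter would have removed every address: excluded
		}
		considered++
		if r.Success {
			successes++
		}
	}
	if considered == 0 {
		return 0, 0
	}
	return float64(successes) / float64(considered), considered
}

func main() {
	// Made-up example records, only to exercise the function.
	results := []punchResult{
		{RemoteAddrs: []string{"/ip6/2001:db8::1/tcp/4001"}, Success: true},
		{RemoteAddrs: []string{"/ip6/2001:db8::2/tcp/4001", "/ip4/198.51.100.9/tcp/4001"}, Success: false},
		{RemoteAddrs: []string{"/ip4/198.51.100.9/tcp/4001"}, Success: true}, // excluded
	}
	rate, n := tcp6SuccessRate(results)
	fmt.Printf("tcp6 success rate: %.2f over %d hole punches\n", rate, n)
}
```

The third record has no tcp6 address, so it does not count towards the tcp6 success rate even though that hole punch succeeded.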

bt90 commented 1 year ago

The CONNECT message we receive from the remote peer contains all of its observed non-relay Multiaddresses.

Is it guaranteed that these are publicly dialable, e.g., excluding link-local (fe80::/10) and ULA (fc00::/7) addresses?
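For reference, this is the kind of check I have in mind, as a minimal sketch using only Go's standard `net/netip` package. The function name `isPubliclyRoutableIPv6` is hypothetical, and this is not a claim about what go-libp2p or punchr currently do.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isPubliclyRoutableIPv6 reports whether s parses as an IPv6 address that is
// global unicast and not in the unique local (fc00::/7) range.
func isPubliclyRoutableIPv6(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	if !addr.Is6() || addr.Is4In6() {
		return false
	}
	// IsGlobalUnicast excludes link-local, loopback, multicast and the
	// unspecified address, but still accepts ULAs, so IsPrivate is checked too.
	return addr.IsGlobalUnicast() && !addr.IsPrivate()
}

func main() {
	for _, s := range []string{"fe80::1", "fc00::1", "fd12:3456::1", "::1", "2606:4700:4700::1111"} {
		fmt.Println(s, isPubliclyRoutableIPv6(s))
	}
}
```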

libp2p / go-libp2p

holepunch: low IPv6 success rate #2068