Closed marten-seemann closed 1 year ago
For the record, the third option ("DHT-based metrics collection") is being developed at https://github.com/dennis-tra/punchr.
Closing this issue since @dennis-tra did amazing work on measuring hole punching success rates. This probably gives us enough data to be confident in the impact of our hole punching work.
I’ve opened #504 to track the proposal to introduce a general metrics collection system with desirable privacy properties.
We'd like to have metrics on how well hole punching works in practice.
Possible Solutions
Centralized Collection
We could set up a central (HTTP?) server and have libp2p endpoints send hole punching status reports to that server. Reports would include the timestamp, a cryptographic hash of the peer IDs (so we can deduplicate), and the transport used (IPv4 / IPv6, TCP / QUIC). For privacy reasons, we would not transmit IP addresses. Also for privacy reasons, this reporting would probably have to be opt-in.
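To make the report contents concrete, here is a minimal Go sketch of what such a report could look like. Everything in it is an assumption for illustration: the `HolePunchReport` type, its JSON field names, the `Success` field, and the choice to hash the two peer IDs together in sorted order (so both sides of a hole punch produce the same value for deduplication). None of this is an existing libp2p API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// HolePunchReport is a hypothetical report format. It deliberately
// carries no IP addresses.
type HolePunchReport struct {
	Timestamp    time.Time `json:"timestamp"`
	PeerPairHash string    `json:"peer_pair_hash"`
	IPVersion    string    `json:"ip_version"` // "ipv4" or "ipv6"
	Transport    string    `json:"transport"`  // "tcp" or "quic"
	Success      bool      `json:"success"`
}

// PeerPairHash hashes both peer IDs in sorted order, so the two
// endpoints of one hole punch produce the same value (assumption:
// deduplication is done per peer pair).
func PeerPairHash(a, b string) string {
	ids := []string{a, b}
	sort.Strings(ids)
	h := sha256.New()
	for _, id := range ids {
		// Length-prefix each ID so concatenation is unambiguous.
		fmt.Fprintf(h, "%d:%s", len(id), id)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	report := HolePunchReport{
		Timestamp:    time.Now().UTC(),
		PeerPairHash: PeerPairHash("peerA", "peerB"), // placeholder IDs
		IPVersion:    "ipv4",
		Transport:    "quic",
		Success:      true,
	}
	body, err := json.Marshal(report)
	if err != nil {
		panic(err)
	}
	// In a real client this body would be sent to the collection
	// server, and only if the user opted in.
	fmt.Println(string(body))
}
```

Sorting before hashing is one possible design choice; hashing each peer ID separately would also allow deduplication, at the cost of letting the server correlate reports per peer.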
Decentralized Collection
To mitigate privacy concerns, endpoints could report the hole punching status to the relay that coordinated the hole punch. As the relay knows both IP addresses and peer IDs (both peers had direct connections with this relay, after all), the privacy impact would be smaller, as the only additional bit of information that the relay learns is if the hole punch was successful (and which transport was used). Therefore, this feature could probably be opt-out. PL (and anyone else interested in hole punching metrics) could then run a relay node and collect these metrics.
The downside is that we'd have to define a new protocol, /libp2p/holepunch-status, and the NATed peers would have to open a stream using that protocol after the hole punch occurred.
DHT-based metrics collection
We could write a scraper that searches the DHT for newly published provider records. It would then try to initiate a hole-punched connection to nodes that advertised a relay address. We can only use relatively fresh provider records; otherwise we'd mistake nodes that were turned off for nodes where the hole punching failed. The advantage of this approach is that it is completely transparent, and doesn't need any upfront implementation work. The downside is that we control the NAT that the scraper node is located behind, and therefore might miss problems that occur when hole punching is initiated from nodes located behind other types of NATs.