libp2p / go-libp2p

libp2p implementation in Go
MIT License

Failure to meet activation threshold for observed address without DHT client #2941

Open 2color opened 2 weeks ago

2color commented 2 weeks ago

Preface

This is not a bug per se, but an issue that arises in specific circumstances.

TL;DR:

Background

ipfs-check is a retrievability diagnostics tool. You provide it with a CID and a multiaddr, and it checks, amongst other things, whether the peer is dialable and whether it has the block for the CID. This is especially useful for determining whether a peer behind NAT is actually "hole-punchable", which you can only find out by trying to connect to it.

The tool creates a short-lived libp2p test host for each check, which connects to one or two peers (two when the other peer is behind NAT and a circuit relay reservation is needed). However, hole punching never kicks off, and the host hangs on the following log line:

2024-08-29T13:40:04.468+0200    DEBUG   p2p-holepunch   holepunch/svc.go:98 waiting until we have at least one public address peer 12D....

I was able to work around this by adding a DHT client to the test host, but it's not clear whether this is ideal. In some situations (such as resource-constrained environments) you don't want the overhead of the additional connections and bandwidth associated with a DHT client.

Ideas

MarcoPolo commented 2 weeks ago

By corollary, as the number of transports and IPs increases, more connections are necessary to activate a given addr.

To be clear, "transports" in this case refers only to TCP vs. UDP. We only compare the "thin waist" part of the address, which is ipv{4,6} + port + {TCP,UDP}.
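As a rough illustration of the "thin waist" comparison described above, the sketch below reduces a multiaddr string to its IP + transport + port prefix using only the standard library. The helper name `thinWaist` and the example addresses are hypothetical; this is not how go-libp2p implements it, only a minimal sketch assuming addresses are given in their usual string form.

```go
package main

import (
	"fmt"
	"strings"
)

// thinWaist returns the "/<ip4|ip6>/<addr>/<tcp|udp>/<port>" prefix of a
// multiaddr string. ok is false if the address does not start with that shape.
// Hypothetical helper for illustration only; not go-libp2p's implementation.
func thinWaist(addr string) (prefix string, ok bool) {
	parts := strings.Split(strings.TrimPrefix(addr, "/"), "/")
	if len(parts) < 4 {
		return "", false
	}
	if parts[0] != "ip4" && parts[0] != "ip6" {
		return "", false
	}
	if parts[2] != "tcp" && parts[2] != "udp" {
		return "", false
	}
	// Everything after the port (e.g. quic-v1, webtransport) is ignored.
	return "/" + strings.Join(parts[:4], "/"), true
}

func main() {
	addrs := []string{
		"/ip4/203.0.113.7/udp/4001/quic-v1",
		"/ip4/203.0.113.7/udp/4001/quic-v1/webtransport",
		"/ip4/203.0.113.7/tcp/4001",
	}
	for _, a := range addrs {
		tw, ok := thinWaist(a)
		fmt.Println(a, "->", tw, ok)
	}
}
```

Under this sketch the two UDP-based addresses collapse to the same thin waist, while the TCP address is treated as a separate one.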

We "activate" an observed address once 4 peers tell us they've observed it. This reduces the trust assumptions on any particular peer and provides some level of security.

I think that was the original intent, but I don't think this actually gives us that much security. It's pretty easy to get 4 different IP addresses. It does probably cut down on the noise we would see from our observed addresses by filtering down to those that have gotten at least 4 observations.
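To make the threshold behaviour discussed in this thread concrete, here is a minimal sketch of counting distinct observers per thin-waist address and activating an address once 4 of them agree. The type `observations`, its methods, and the observer labels are hypothetical, and how observers are told apart (here an opaque string) is an assumption; this is not go-libp2p's actual observed-address code.

```go
package main

import "fmt"

// activationThresh mirrors the "4 peers" figure mentioned in the thread.
const activationThresh = 4

// observations maps a thin-waist address to the set of distinct observers
// that reported it. Hypothetical type for illustration only.
type observations map[string]map[string]struct{}

// record notes that observer reported seeing us at thinWaist.
// Repeated reports from the same observer count once.
func (o observations) record(thinWaist, observer string) {
	if o[thinWaist] == nil {
		o[thinWaist] = make(map[string]struct{})
	}
	o[thinWaist][observer] = struct{}{}
}

// activated reports whether enough distinct observers agree on thinWaist.
func (o observations) activated(thinWaist string) bool {
	return len(o[thinWaist]) >= activationThresh
}

func main() {
	udp := "/ip4/203.0.113.7/udp/4001"
	tcp := "/ip4/203.0.113.7/tcp/4001"

	// A short-lived host with only two connections (target peer and relay).
	short := observations{}
	short.record(udp, "target")
	short.record(udp, "relay")
	fmt.Println("two connections, activated:", short.activated(udp))

	// Six connections split evenly across TCP and UDP.
	split := observations{}
	for _, p := range []string{"p1", "p2", "p3"} {
		split.record(udp, p)
	}
	for _, p := range []string{"p4", "p5", "p6"} {
		split.record(tcp, p)
	}
	fmt.Println("split across transports, activated:",
		split.activated(udp) || split.activated(tcp))
}
```

In this sketch a host with only one or two connections can never reach the threshold, and splitting observations across TCP and UDP raises the number of connections needed, which matches the situation described in the issue.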