status-im / nimbus-eth1

Nimbus: an Ethereum Execution Client for Resource-Restricted Devices
https://status-im.github.io/nimbus-eth1/
Apache License 2.0

Getting "external IP" once then using it is unsuitable for long-running p2p #712

Open jlokier opened 3 years ago

jlokier commented 3 years ago

Devices on home networks do not generally have a fixed "external IP" that lasts for weeks. Only a minority of home networks have a static external IP. These days most home networks have a dynamic external IP (stable for anywhere from hours to months), and in some places there is increasingly a CGNAT layer as well, so the external IP is shared among multiple households.

(Larger workplace external IPs tend to be more stable, but they have different issues. They are more likely to be a pool of external IPs rather than a single one, so the concept of a single external IP is flawed, and the protocols for port mapping that work on some home routers are less likely to be permitted in a workplace, so NAT hole punching techniques have to be used instead. But "generalised p2p" to all peers across multiple NAT and firewall boundaries is a harder problem, and potentially beyond the capability of Ethereum Devp2p. This issue focuses only on the problem where "external IP" works but it changes over time.)

The logic at https://github.com/status-im/nimbus-eth1/blob/6d4205b0b07b35ab9c76ab2d14733c67127aa042/nimbus/nimbus.nim#L99, which calls getExternalIP once and then tries to open a port redirection (with UPnP or NAT-PMP), is flawed on such networks for long-running operation.

Nimbus will be able to sync as a peer for a while, but once the external IP changes, incoming connections will stop working. Another problem with this approach arises when a home router is restarted (some seem to get stuck and need this occasionally) or restarts itself nightly: the UPnP/NAT-PMP port redirection is typically lost and must be recreated.

It is not the end of the world if we cannot receive incoming connections. On many networks we cannot anyway, and we rely on outgoing connections only for the p2p overlay. But if we can receive them we should continue to do so when Nimbus Eth1 is kept running for a long time.

Solution: Like any long-running p2p application, Nimbus should probe the network from time to time to check if port mappings and advertised IPs need to be updated. Alternatively it could restart itself every so often, and this may be the pragmatic solution.
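To make the probing idea concrete, here is a minimal sketch of one probe step, written in Python for brevity rather than Nim. The callables `get_external_ip`, `map_port`, and `advertise` are hypothetical stand-ins for whatever the NAT and discovery layers provide; none of these names come from nim-eth or nimbus-eth1.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NatState:
    external_ip: Optional[str] = None  # last IP we advertised to peers
    mapping_ok: bool = False           # whether the last mapping attempt succeeded


def refresh_once(
    state: NatState,
    get_external_ip: Callable[[], Optional[str]],  # hypothetical: returns None on failure
    map_port: Callable[[], bool],                  # hypothetical: (re)creates the port mapping
    advertise: Callable[[str], None],              # hypothetical: updates the advertised address
) -> bool:
    """Run one probe; return True if the advertised IP changed."""
    # Re-request the mapping on every probe: a restarted router forgets it.
    # Assumption: re-adding an existing mapping is harmless.
    state.mapping_ok = map_port()

    ip = get_external_ip()
    if ip is None:
        # Probe failed; keep the previous IP rather than advertising nothing.
        return False
    if ip != state.external_ip:
        state.external_ip = ip
        advertise(ip)
        return True
    return False
```

Calling `refresh_once` from a timer every few minutes would cover both failure modes described above: a changed external IP and a port mapping lost to a router restart. The right interval is a tuning question this sketch does not answer.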

At the moment this issue doesn't matter because Nimbus Eth1 is not ready for long-term use. And when we use Nimbus Eth1 on servers for testing and integration with Eth2 it won't matter either. But for long-running usage by end-users, this issue should be addressed.

stefantalpalaru commented 3 years ago

> Nimbus should probe the network from time to time to check if port mappings and advertised IPs need to be updated.

We're already doing this for port mapping: https://github.com/status-im/nim-eth/blob/601fa7ff667431b05d18579af0e43bf4d8dafa61/eth/net/nat.nim#L182

But there was no way to update the external IP in libp2p, back then.

kdeme commented 3 years ago

Discv5 has the option to get the new IP:port combination from information provided by its peers. Doesn't help for missing port mapping of course.

For setups where UPnP/NAT-PMP is enabled and supported, it would help to also refresh the IP through UPnP/NAT-PMP (mostly useful for discv4 in this case, I'd say).

jlokier commented 3 years ago

Once you're getting IP:port from peers, you sometimes hit the scenario where different peers report different IPs: sometimes because some of the peers are inside the same NAT zone (e.g. inside the same ISP), sometimes because the NAT is using an external IP pool rather than a single IP.
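One way to cope with conflicting reports is to require a clear majority before trusting any of them. The sketch below is illustrative only, written in Python rather than Nim. It is not taken from discv5 or nim-eth, and the function name and the `min_votes` threshold are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port) as reported by a peer


def pick_external_endpoint(
    reports: Iterable[Endpoint],
    min_votes: int = 3,  # assumed threshold, not a protocol constant
) -> Optional[Endpoint]:
    """Return the most-reported endpoint, or None if there is no clear winner."""
    counts = Counter(reports)
    if not counts:
        return None
    ranked = counts.most_common(2)
    best, best_votes = ranked[0]
    if best_votes < min_votes:
        return None
    # A tie for first place means peers disagree (e.g. a NAT IP pool);
    # refuse to pick rather than flap between addresses.
    if len(ranked) > 1 and ranked[1][1] == best_votes:
        return None
    return best
```

Peers inside the same NAT zone would typically be outvoted by peers outside it. With an external IP pool the votes may never converge, in which case returning `None` and advertising nothing new seems the safer choice.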

> Doesn't help for missing port mapping of course.

Sometimes it would help, because that's one of the approaches to NAT hole punching, depending on the flavour of NAT. With some flavours, if a peer reports a particular IP:port, other peers will be able to connect to that same IP:port.

I'm not sure whether we care about NAT hole punching techniques, or whether the idea is to leave that sort of thing entirely to libp2p to solve?