libp2p / go-libp2p

libp2p implementation in Go
MIT License
5.88k stars 1.04k forks source link

proper support for IPv6 link-local addresses #2659

Open jclab-joseph opened 7 months ago

jclab-joseph commented 7 months ago

Make available p2p in before ip configure.

The problems I found were:

  1. IPv6 advertising incorrectly /ip6zone/br0/ip6/fe80::7369:44d4:4573:9c7f/tcp/12345 When listen to an IP like this, ip6zone is included when advertising the address. This is just for own interface, so don't need to advertise it.

the code below roughly works.

        func(cfg *libp2p.Config) error {
            cfg.AddrsFactory = func(addrs []ma.Multiaddr) []ma.Multiaddr {
                var filtered []ma.Multiaddr
                for _, addr := range addrs {
                    _, err := addr.ValueForProtocol(ma.P_IP6ZONE)
                    if err == nil {
                        tokens := multiaddr.Split(addr)
                        addr = multiaddr.Join(tokens[1:]...)
                    }
                    filtered = append(filtered, addr)
                }

                return filtered
            }
            return nil
        },
  1. Unable to connect via ip6zone.
var dialMatcher = mafmt.And(mafmt.IP, mafmt.Base(ma.P_TCP))

// CanDial returns true if this transport believes it can dial the given
//multiaddr.
func (t *TcpTransport) CanDial(addr ma.Multiaddr) bool {
return dialMatcher.Matches(addr)
}

link-local address cannot start with /ip6/ because an interface must be bound for communication. But it fails with CanDial.

  1. Note It may be a good idea to listen on link-local addresses but not advertise them. You can discover and connect with mdns.
marten-seemann commented 7 months ago

Moving the discussion that's now spread across multiple PRs here.

My current take on this issue (I might be missing something though) is that a node should never advertise /ip6zone addresses in the first place. The /ip6zone component should be removed from the multiaddress before advertising it. This is what my PR #2662 does. The reason for that is that the interface is a local information that's of no use to the peer. All that a peer needs to dial me is the IP address.

@jclab-joseph To make progress here, we'd need a detailed description of 1. how your setup looks like, 2. how you configure your libp2p nodes, 3. what currently happens and 4. what you'd expect to happen instead.

jclab-joseph commented 7 months ago

demo application: https://github.com/jclab-joseph/libp2p-link-local-demo

original: image

PR #2662 applied: image

No this is the exact opposite of my intention. My PR(https://github.com/libp2p/go-libp2p/pull/2660) was to enable Listen for ip6zone correctly to enable p2p communication with link-local address. However, this PR excludes ip6zone.

Sorry, I've seen the code properly now, and I see what you meant. I understand that your intention is to exclude /ip6zone/IFNAME from announce, not to prevent ip6zone from Listen .

https://github.com/libp2p/go-libp2p/pull/2660#issuecomment-1842550883 But, As I said here, link-local address does not need to be announced.

Because on the other peer that received the announcement, Unable to connect to address /ip6/fe80::7369:44d4:4573:9c7f/tcp/33921. Can only connect when know which own-interface connected to the remote PC, such as /ip6zone/eth100/ip6/fe80::7369:44d4:4573:9c7f/tcp/33921. (eth100 is local interface, not remote)


PR #2660 applied:

announce test:) image

connect test: image

My PR results are as above.

It can listen and connect to the link-local address, but it does not announce.

marten-seemann commented 7 months ago

But, As I said here, link-local address does not need to be announced.

I think this is an orthogonal issue. We could maybe optimize things and not advertise the address, but there's very little savings in doing so. We have some logic to filter addresses on the receiver side, depending on where we received them from, and we have some smart dialing logic (i.e. logic to prioritize which addresses we dial first), so the cost for that are minimal as well.

Sorry, I've seen the code properly now, and I see what you meant. I understand that your intention is to exclude /ip6zone/IFNAME from announce, not to prevent ip6zone from Listen .

Yes, exactly. Thanks for checking!

Does that mean that #2662 resolves your problem?

jclab-joseph commented 7 months ago

So, #2662 solves the announce issue.

Note that, If #2662 is applied, this part should be fixed in #2660. This is what I will do. https://github.com/libp2p/go-libp2p/blob/master/p2p/net/swarm/swarm_dial.go#L492-L493 When connecting, link-local without ip6zone should be filtered out.

marten-seemann commented 7 months ago

Can you explain? If we’re filtering out the addresses with an ip6zone, then the peer won’t have that address, right?

Can you also share your setup? It looks like you’re using virtual network namespaces. What command do you use to initialize / configure them? I’d like to reproduce this on my machine.

jclab-joseph commented 7 months ago

Fixed #2660. Rolled back basic_host.go modification, fixed filtering in swarm_dial.

Here's how to create veth:

# ip link add veth0 type veth peer name veth1
# ip link set up dev veth0
# ip link set up dev veth1

# ifconfig veth0
veth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::44ae:32ff:fe46:898f  prefixlen 64  scopeid 0x20<link>
        ether 46:ae:32:46:89:8f  txqueuelen 1000  (Ethernet)

# ifconfig veth1
veth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::73:a5ff:fee3:75b9  prefixlen 64  scopeid 0x20<link>
        ether 02:73:a5:e3:75:b9  txqueuelen 1000  (Ethernet)

To create more than three nodes you need bridge:

# ip link add veth0 type veth peer name veth0p
# ip link add veth1 type veth peer name veth1p
# ip link add veth2 type veth peer name veth2p
# ip link set up dev veth0
# ip link set up dev veth1
# ip link set up dev veth2
# brctl addbr br0
# brctl addif br0 veth0p
# brctl addif br0 veth1p
# brctl addif br0 veth2p
marten-seemann commented 7 months ago

Ok, so things seem to be more complicated. If I understand correctly, in order to dial a link-local address, you MUST know the interface. That means it's not possible to dial /ip6/fe80::xxx/tcp/1234, it has to be /ip6zone/eth0/ip6/fe80::xxx/tcp/1234.

The interface (here eth0) is the local interface. It doesn't make any sense to advertise ip6zone addresses. The receiver has to construct this address when it discovers the address. Since we're dealing with link-local addresses, the most useful way to discover link-local addresses is mDNS, obviously. However, we also learn addresses from other protocols (e.g. Kademlia), but there's quite a lot of complexity necessary to add the interface prefix before addresses we receive there.

Therefore, #2662 is part of the fix here, namely it makes sure that we never advertise the zone. We should merge that PR, independent of the outcome of the discussion here.

The zeroconf PR is https://github.com/libp2p/zeroconf/pull/35, and it doesn't look wrong to me, but as libp2p maintainers, we don't see take responsibility for maintaining the zeroconf library, even if changes look reasonable on first sight. New features will only be merged once they're upstreamed to grandcat/zeroconf.

While I agree that in principle libp2p as a p2p networking library should correctly handle IPv6 link-local addresses, I'm not sure if this should be high on our roadmap given the benefit-complexity tradeoff, to be honest. This also applies to the review of PRs that add this feature. As far as I can tell, link-local addresses seem to be more of a fringe use case, as evident by this coming up only now. For people reading this in the future, please comment here if you think this assessment is incorrect.

The complexity lies in:

  1. Every transport needing to support dialing link-local addresses (at the very least, we'd have to update the dial matchers).
  2. Every protocol that feeds in new addresses (e.g. Kademlia) would need to be updated, such that it prepends the ip6zone prefix in front of link-local addresses.

Supporting this is pretty subtle, and I don't know how we'd test it end-to-end. Without end-to-end tests, it's very likely that this will either never properly work, or we'll unknowingly break support for it at some point in the future.