monogon-dev / monogon

The Monogon Monorepo. May contain traces of peanuts and a ✨pure Go Linux userland✨. Work in progress!
https://monogon.tech
Apache License 2.0

dhcp4c: support Classless Static Routes #89

Closed leoluk closed 2 years ago

leoluk commented 2 years ago

GCP's default L3 network setup assigns a /32 subnet and uses a static route for the gateway:

2021-12-04 02:42:36.903426 I | got new lease: &{10.156.0.18 2021-12-05 02:42:36.902724599 +0000 UTC m=+86411.030064302     Subnet Mask: ffffffff
    Router: 10.156.0.1
    Domain Name Server: 169.254.169.254
    Host Name: metropolis-1638585607.c.monogon-dev.internal
    Domain Name: c.monogon-dev.internal
    Interface MTU: [5 180]
    NTP Servers: 169.254.169.254
    IP Addresses Lease Time: 24h0m0s
    DHCP Message Type: ACK
    Server Identifier: 169.254.169.254
    DNS Domain Search List: [c.monogon-dev.internal google.internal]
    Classless Static Route: route to 10.156.0.1/32 via 0.0.0.0; route to 0.0.0.0/0 via 10.156.0.1
    TFTP Server Address: [10 156 0 1]
} (options:     Subnet Mask: ffffffff
    Router: 10.156.0.1
    Domain Name Server: 169.254.169.254
    Host Name: metropolis-1638585607.c.monogon-dev.internal
    Domain Name: c.monogon-dev.internal
    Interface MTU: [5 180]
    NTP Servers: 169.254.169.254
    IP Addresses Lease Time: 24h0m0s
    DHCP Message Type: ACK
    Server Identifier: 169.254.169.254
    DNS Domain Search List: [c.monogon-dev.internal google.internal]
    Classless Static Route: route to 10.156.0.1/32 via 0.0.0.0; route to 0.0.0.0/0 via 10.156.0.1
    TFTP Server Address: [10 156 0 1]
)
supervisor                       E1204 02:42:36.903905 supervisor_processor.go:225] Runnable root.network.interfaces.dhcp died: returned error when NODE_STATE_NEW: lease callback failed: failed to add default route via 10.156.0.1: network is unreachable
supervisor                       I1204 02:42:36.904079 supervisor_processor.go:431] rescheduling supervised node root.network.interfaces.dhcp with backoff 5.886853573s
supervisor                       E1204 02:42:37.107049 supervisor_processor.go:225] Runnable root.enrolment died: returned error when NODE_STATE_NEW: cannot restart cluster manager
supervisor                       I1204 02:42:37.108390 supervisor_processor.go:431] rescheduling supervised node root.enrolment with backoff 6.465232928s
(diff for logging)

```diff
diff --git a/metropolis/node/core/network/dhcp4c/callback/callback.go b/metropolis/node/core/network/dhcp4c/callback/callback.go
index 52276ab..a0cebd8 100644
--- a/metropolis/node/core/network/dhcp4c/callback/callback.go
+++ b/metropolis/node/core/network/dhcp4c/callback/callback.go
@@ -29,6 +29,7 @@ package callback
 import (
 	"fmt"
+	"log"
 	"math"
 	"net"
 	"os"
@@ -71,6 +72,8 @@ func isIPNetEqual(a, b *net.IPNet) bool {
 // clients on a single interface.
 func ManageIP(iface netlink.Link) dhcp4c.LeaseCallback {
 	return func(old, new *dhcp4c.Lease) error {
+		log.Printf("got new lease: %v (options: %v)", new, new.Options.Summary(nil))
+
 		newNet := new.IPNet()
 		addrs, err := netlink.AddrList(iface, netlink.FAMILY_V4)
```
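For context, the Classless Static Route entries in the lease above come from DHCP option 121 (RFC 3442). Each route is encoded as one byte of prefix length, then only the significant octets of the destination, then four bytes of router address. RFC 3442 also says that a client receiving this option must ignore the Router option, which is why the on-link route to 10.156.0.1/32 matters here. The following is a minimal stdlib-only sketch of decoding that option. The `Route` type and `ParseClasslessStaticRoutes` function are hypothetical names for illustration, not the dhcp4c API.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Route is a hypothetical type for this sketch; it is not the type used by dhcp4c.
type Route struct {
	Dest   *net.IPNet // destination prefix
	Router net.IP     // next hop; 0.0.0.0 means the destination is on-link
}

// ParseClasslessStaticRoutes decodes the payload of DHCP option 121 (RFC 3442).
// Each entry is: prefix length (1 byte), ceil(prefix/8) destination octets,
// router address (4 bytes).
func ParseClasslessStaticRoutes(b []byte) ([]Route, error) {
	var routes []Route
	for len(b) > 0 {
		width := int(b[0])
		if width > 32 {
			return nil, fmt.Errorf("invalid prefix length %d", width)
		}
		n := (width + 7) / 8 // number of significant destination octets
		if len(b) < 1+n+4 {
			return nil, errors.New("truncated classless static route option")
		}
		dst := make(net.IP, net.IPv4len)
		copy(dst, b[1:1+n])
		mask := net.CIDRMask(width, 32)
		router := make(net.IP, net.IPv4len)
		copy(router, b[1+n:1+n+4])
		routes = append(routes, Route{
			Dest:   &net.IPNet{IP: dst.Mask(mask), Mask: mask},
			Router: router,
		})
		b = b[1+n+4:]
	}
	return routes, nil
}

func main() {
	// Hand-encoded form of the two routes shown in the lease above.
	opt := []byte{
		32, 10, 156, 0, 1, 0, 0, 0, 0, // 10.156.0.1/32 via 0.0.0.0
		0, 10, 156, 0, 1, // 0.0.0.0/0 via 10.156.0.1
	}
	routes, err := ParseClasslessStaticRoutes(opt)
	if err != nil {
		panic(err)
	}
	for _, r := range routes {
		fmt.Printf("route to %v via %v\n", r.Dest, r.Router)
	}
}
```

Installing the routes in the order received would add the on-link /32 route to the gateway before the default route, which should avoid the "network is unreachable" error seen above.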

Hetzner and other cloud providers are known for similar shenanigans. On GCP, setting the barely documented --guest-os-features=MULTI_IP_SUBNET flag on the image seems to solve it, since the lease then carries a real subnet mask (fffff000) instead of a /32:

2021-12-04 02:47:16.113171 I | got new lease: &{10.156.0.19 2021-12-05 02:47:16.112203034 +0000 UTC m=+86400.225447665     Subnet Mask: fffff000
    Router: 10.156.0.1
    Domain Name Server: 169.254.169.254
    Host Name: metropolis-1638585907.c.monogon-dev.internal
    Domain Name: c.monogon-dev.internal
    Interface MTU: [5 180]
    NTP Servers: 169.254.169.254
    IP Addresses Lease Time: 24h0m0s
    DHCP Message Type: ACK
    Server Identifier: 169.254.169.254
    DNS Domain Search List: [c.monogon-dev.internal google.internal]
    Classless Static Route: route to 10.156.0.1/32 via 0.0.0.0; route to 0.0.0.0/0 via 10.156.0.1
    TFTP Server Address: [10 156 0 1]
} (options:     Subnet Mask: fffff000
    Router: 10.156.0.1
    Domain Name Server: 169.254.169.254
    Host Name: metropolis-1638585907.c.monogon-dev.internal
    Domain Name: c.monogon-dev.internal
    Interface MTU: [5 180]
    NTP Servers: 169.254.169.254
    IP Addresses Lease Time: 24h0m0s
    DHCP Message Type: ACK
    Server Identifier: 169.254.169.254
    DNS Domain Search List: [c.monogon-dev.internal google.internal]
    Classless Static Route: route to 10.156.0.1/32 via 0.0.0.0; route to 0.0.0.0/0 via 10.156.0.1
    TFTP Server Address: [10 156 0 1]
)

However, it's unclear whether ignoring the static route is always safe, even with MULTI_IP_SUBNET (is there any guarantee that the gateway is in the same subnet?), or whether the flag has any unexpected side effects.

q3k commented 2 years ago

In progress: https://review.monogon.dev/c/monogon/+/482

lorenz commented 2 years ago

Submitted