hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

client `network_interface` fingerprinting incorrect/unintended IP address in `unique.network.ip-address` #20562

Open Gurpartap opened 5 months ago

Gurpartap commented 5 months ago

Nomad version

Nomad v1.7.6
BuildDate 2024-03-12T07:27:36Z
Revision 594fedbfbc4f0e532b65e8a69b28ff9403eb822e

Operating system and Environment details

Debian 11 on Linode (with Linode's Network Helper enabled)

Issue

TL;DR: the Nomad client's network_interface setting should allow selecting a different address within the same interface.

Related code:

resp.AddAttribute("unique.network.ip-address", nwResources[0].IP)

The behaviour is apparently already flagged in the code as "Deprecated, setting the first IP as unique IP for the node", but it has not been addressed yet:

https://github.com/hashicorp/nomad/blob/83720740f5a7f4053af2ba45dc687964de2a93cb/client/fingerprint/network.go#L111-L120

Reproduction steps

Linode's automatic Network Helper tool sets up something like this:

$ cat /etc/network/interfaces
# Generated by Linode Network Helper
# Sun May 12 14:16:06 2024 UTC
#
# This file is automatically generated on each boot with your Linode's
# current network configuration. If you need to modify this file, please
# first disable the 'Auto-configure networking' setting within your Linode's
# configuration profile:
#  - https://cloud.linode.com/linodes/35915162/configurations
#
# For more information on Network Helper:
#  - https://www.linode.com/docs/guides/network-helper/
#
# A backup of the previous config is at /etc/network/.interfaces.linode-last
# A backup of the original config is at /etc/network/.interfaces.linode-orig
#
# /etc/network/interfaces

auto lo
iface lo inet loopback

source /etc/network/interfaces.d/*

auto eth0

allow-hotplug eth0

iface eth0 inet6 auto
iface eth0 inet static
    address 50.60.70.101/24
    gateway 50.60.70.1
    up   ip addr add 192.168.120.225/17 dev eth0 label eth0:1
    down ip addr del 192.168.120.225/17 dev eth0 label eth0:1
$ ip a
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether f2:3c:93:a1:9d:dd brd ff:ff:ff:ff:ff:ff
    inet 50.60.70.101/24 brd 50.60.70.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.120.225/17 scope global eth0:1
       valid_lft forever preferred_lft forever
    inet6 2600:abcd:wxyz/64 scope global dynamic mngtmpaddr
       valid_lft 5144sec preferred_lft 1544sec
    inet6 fe80::abcd:wxyz/64 scope link
       valid_lft forever preferred_lft forever
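
To make the ordering problem concrete, here is a minimal sketch using only Go's standard net package (not Nomad's fingerprinting code) that lists the IPv4 addresses on one interface in the order the operating system reports them. The helper name listIPv4 is hypothetical. As far as I understand, a label such as eth0:1 is an attribute of an address on eth0 rather than a separate link, so looking up "eth0:1" by name would be expected to fail on Linux.

```go
package main

import (
	"fmt"
	"net"
)

// listIPv4 (hypothetical helper) returns the IPv4 addresses, in CIDR
// notation, assigned to the named interface, in the order the operating
// system reports them.
func listIPv4(name string) ([]string, error) {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		return nil, fmt.Errorf("looking up interface %q: %w", name, err)
	}
	addrs, err := iface.Addrs()
	if err != nil {
		return nil, fmt.Errorf("listing addresses on %q: %w", name, err)
	}
	var out []string
	for _, a := range addrs {
		ipNet, ok := a.(*net.IPNet)
		if !ok || ipNet.IP.To4() == nil {
			continue // skip non-IP entries and IPv6 addresses
		}
		out = append(out, ipNet.String())
	}
	return out, nil
}

func main() {
	// "eth0" matches the reproduction above; change it for your host.
	addrs, err := listIPv4("eth0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, a := range addrs {
		fmt.Printf("%d: %s\n", i, a)
	}
}
```

On the host described above this would be expected to list both 50.60.70.101/24 and 192.168.120.225/17 under eth0, which is why picking index 0 is sensitive to address order.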

Current behaviour

client {
  enabled = true
  network_interface = "eth0"
}
unique.network.ip-address: 50.60.70.101 # $my_public_ip, first from eth0. not useful

Proposed behaviour

client {
  enabled = true

  # allow label based interface addr resource selection
  network_interface = "eth0:1"

  # or, make it select the interface resource which matches CIDR
  network_cidr = "192.168.120.225/17"

  # with templating support
  network_cidr = "{{ GetPrivateInterfaces | … | limit 1 | attr \"address\" }}"
}
unique.network.ip-address: 192.168.120.225 # $my_private_ip from eth0:1

Workaround

The only workaround I've been able to come up with is to set up a dummy interface on the system that carries the private address, and then set:

client {
  enabled = true
  network_interface = "dummy10" # selects $my_private_ip
}
14: dummy10: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5f:1d:12:6d:a7:16 brd ff:ff:ff:ff:ff:ff
    inet 192.168.120.225/17 brd 192.168.255.255 scope global dummy10
       valid_lft forever preferred_lft forever
unique.network.ip-address: 192.168.120.225 # $my_private_ip from dummy10

It works but comes with its own oddities. See https://github.com/hashicorp/nomad/issues/3675#issuecomment-504660287.

Other considerations

Gurpartap commented 5 months ago

https://github.com/hashicorp/nomad/blob/83720740f5a7f4053af2ba45dc687964de2a93cb/client/fingerprint/network.go#L111-L120

Similar to host_network { cidr = "…", interface = "…" }, replacing the above code with something like the following could work.

client {
  enabled = true
  network_interface = "eth0"
  network_cidr = "192.168.120.225/17"
}
func (f *NetworkFingerprint) Fingerprint(req *FingerprintRequest, resp *FingerprintResponse) error {
    //…

    uniqueIP := ""

    for _, nwResource := range nwResources {
        logger.Debug("detected interface IP", "IP", nwResource.IP)

        if uniqueIP == "" && nwResource.CIDR == cfg.NetworkCIDR {
            uniqueIP = nwResource.IP
        }
    }

    if uniqueIP == "" && len(nwResources) > 0 { 
        // setting the first IP as unique IP for the node
        uniqueIP = nwResources[0].IP
    } 

    resp.AddAttribute("unique.network.ip-address", uniqueIP)

    //…
}

This allows unique.network.ip-address to be set explicitly, rather than risking unexpected behaviour across a restart if the system has shuffled the order of addresses within an interface. Related to: https://github.com/hashicorp/nomad/issues/10179

tgross commented 4 months ago

Hi @Gurpartap! Thanks for the thorough digging into this. I think you're on the right track here in terms of the fix too, but it seems likely that we'll also want to allow explicit IP address setting (with a go-sockaddr template). I'll mark this for roadmapping, but if you're interested in opening a PR we'd be happy to review it as well!