hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

CNI-provided address is bound/advertised correctly, wrong in Nomad API and env vars #12256

Open nakermann1973 opened 2 years ago

nakermann1973 commented 2 years ago

Nomad version

Nomad v1.2.6 (95514d569610f15ce49b4a7a1a6bfd3e7b3e7b4f)

Operating system and Environment details

Gentoo server, running a test cluster in Docker containers (Consul and the Nomad server are in Docker; the Nomad client is running on the host).

Issue

I am experiencing a very similar issue to #11216. I have a CNI macvlan network defined, and a Nomad job using this network.

The IP-related environment variables in the container get set to the host address (10.17.17.1), not the container address (10.17.17.X):

  "NOMAD_ADDR_inbound":"10.17.17.1:22638"
  "NOMAD_HOST_ADDR_inbound":"10.17.17.1:22638"
  "NOMAD_HOST_IP_inbound":"10.17.17.1"
  "NOMAD_IP_inbound":"10.17.17.1"

The alloc's Allocation Address is also incorrect:

$ nomad alloc status 30e443d0
...
Allocation Addresses (mode = "cni/ingress")
Label     Dynamic  Address
*inbound  yes      10.17.17.1:22638
...

The ServiceAddress values are populated correctly in Consul, though:

$ curl -s 10.0.150.200:8500/v1/catalog/service/demo-cni | jq   | grep ServiceAddress
    "ServiceAddress": "10.17.17.36",
    "ServiceAddress": "10.17.17.35",
    "ServiceAddress": "10.17.17.34",

Reproduction steps

Create a CNI network as follows:

$ cat /opt/cni/config/ingress.conflist
{
  "cniVersion": "0.4.0",
  "name": "ingress",
  "plugins": [
    {
      "type": "macvlan",
      "master": "bond0",
      "ipam": {
        "type": "host-local",
        "ranges": [
          [
            {
              "subnet": "10.17.17.0/24",
              "rangeStart": "10.17.17.32",
              "rangeEnd": "10.17.17.40",
              "gateway": "10.17.17.254"
            }
          ]
        ]
      }
    }
  ]
}
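For reference, the Nomad client loads conflists from the directory named by `cni_config_dir`. The paths below are Nomad's documented defaults, which happen to match the setup above; adjust if yours differ:

```hcl
# Nomad client configuration fragment (defaults shown; assumption:
# CNI plugin binaries live in /opt/cni/bin and conflists in /opt/cni/config).
client {
  cni_path       = "/opt/cni/bin"
  cni_config_dir = "/opt/cni/config"
}
```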

Start a job as follows:

$ cat demo-cni.hcl
job "demo-cni" {
  datacenters = ["dc1"]

  group "demo_cni" {
    count = 3

    network {
      mode = "cni/ingress"
      port "inbound" { }
    }

    service {
      name = "demo-cni"
      port = "inbound"
      address_mode = "alloc"

      tags = [
        "traefik.enable=true",
        "traefik.http.routers.test_cni.rule=Path(`/testcni`)",
      ]

    }

    task "server" {
      env {
        PORT    = "${NOMAD_PORT_inbound}"
      }

      driver = "docker"

      config {
        image = "ealen/echo-server"
        ports = ["inbound"]
      }
    }
  }
}

Expected Result

Environment variables and Allocation Addresses are set to the container address

Actual Result

Environment variables and Allocation Addresses are set to the host address

Job file (if appropriate)

See above

tgross commented 2 years ago

Hi @nakermann1973!

The service.address_mode you've set only impacts which address is being advertised to Consul during service registration. As you've noted, that's being set correctly in Consul, but not in Nomad for some reason.

Unfortunately I can't reproduce what you're seeing without the plugin you're using. I don't see ingress on the list in https://www.cni.dev/plugins/current/, so this isn't one of the standard example plugins, right? Can you provide a link so that I can try to reproduce your setup?

nakermann1973 commented 2 years ago

The plugin I am using is macvlan, and the config above defines the ingress network. Config is pasted above in /opt/cni/config/ingress.conflist

tgross commented 2 years ago

🤦 D'oh, right. Ok, let me see if I can reproduce this and figure out what's happening there.

tgross commented 2 years ago

Ok, I've been able to reproduce in a Vagrant environment. I get a slightly different host IP, but I think that may be because I'm binding the client to 0.0.0.0 here. Either way, it's not the addresses we'd expect to see.

Here's my CNI configuration, with plugins[0].master set to the device with an IP address 10.0.2.15/24. (Note that if you deploy onto multiple clients you need to have non-overlapping ranges or you can have IP address collisions. But for purposes of this repro, we'll have one client.)

CNI config:

```json
{
  "cniVersion": "0.4.0",
  "name": "ingress",
  "plugins": [
    {
      "type": "macvlan",
      "master": "enp0s3",
      "ipam": {
        "type": "host-local",
        "ranges": [
          [
            {
              "subnet": "10.0.2.0/24",
              "rangeStart": "10.0.2.32",
              "rangeEnd": "10.0.2.40",
              "gateway": "10.0.2.254"
            }
          ]
        ]
      }
    }
  ]
}
```

Running the exact same jobspec you provided above, I see the addresses registered in Consul:

$ curl -s localhost:8500/v1/catalog/service/demo-cni | jq '.[].ServiceAddress'
"10.0.2.35"
"10.0.2.37"
"10.0.2.36"

But if I query Nomad, it's got the host addresses and not the advertised address:

$ nomad alloc status 7d4
...
Allocation Addresses (mode = "cni/ingress")
Label     Dynamic  Address
*inbound  yes      10.0.2.15:27134

$ nomad alloc exec 7d4 env | grep inbound
NOMAD_ADDR_inbound=10.0.2.15:27134
NOMAD_ALLOC_PORT_inbound=27134
NOMAD_HOST_ADDR_inbound=10.0.2.15:27134
NOMAD_HOST_IP_inbound=10.0.2.15
NOMAD_HOST_PORT_inbound=27134
NOMAD_IP_inbound=10.0.2.15
NOMAD_PORT_inbound=27134

$ nomad alloc exec 7d4 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 82:eb:f7:84:6f:eb brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.37/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever

But I don't think the problem is just a matter of advertising vs environment variables. As far as I can tell, I can't actually make requests to one of these endpoints. It seems like my route is correctly configured? Is there a CNI configuration step I'm missing here?

$ curl -v 10.0.2.37:27134
*   Trying 10.0.2.37:27134...
* TCP_NODELAY set
* connect to 10.0.2.37 port 27134 failed: No route to host
* Failed to connect to 10.0.2.37 port 27134: No route to host
* Closing connection 0
curl: (7) Failed to connect to 10.0.2.37 port 27134: No route to host

$ ip route
default via 10.0.2.2 dev enp0s3 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15
10.0.2.2 dev enp0s3 proto dhcp scope link src 10.0.2.15 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.56.0/24 dev enp0s10 proto kernel scope link src 192.168.56.2
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.2
192.168.56.0/24 dev enp0s9 proto kernel scope link src 192.168.56.2
nakermann1973 commented 2 years ago

To your last point ("I can't actually make requests to one of these endpoints"): this is probably due to the way macvlan interacts with the host interface.

A macvlan interface created on top of a host interface is not visible to the host. Packets are routed directly out to the external network. In order for a host to connect to the macvlan network in a container, the host also requires a macvlan interface on top of the physical host interface.

In my case, "bond0" is the host interface, on which macvlan interfaces are created by the CNI plugin. On the host, bond0 has no IP address. Rather, I have a macvlan interface defined on the host which holds the host's primary IP address. Packets from the host to the containers are switched via my upstream switch.

This is described in this blog post (https://kcore.org/2020/08/18/macvlan-host-access/) and this Docker forums post (https://forums.docker.com/t/macvlan-network-and-host-to-container-connectity/42950/4).
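The host-side workaround those links describe can be sketched as follows. This is a hedged sketch, not the thread's verified fix: the interface name `macvlan-shim`, the shim address 10.17.17.250, and the route are illustrative values based on the bond0 / 10.17.17.0/24 setup above, and the commands need root on the host.

```shell
# Sketch: give the host its own macvlan interface on top of bond0 so
# host-to-container traffic to the CNI-assigned macvlan addresses works.
# Guarded so it is a no-op without root or without a bond0 interface.
if [ "$(id -u)" -eq 0 ] && ip link show bond0 >/dev/null 2>&1; then
  # create a bridge-mode macvlan shim on the same parent as the CNI plugin
  ip link add macvlan-shim link bond0 type macvlan mode bridge
  # give the shim an address outside the CNI host-local range
  ip addr add 10.17.17.250/32 dev macvlan-shim
  ip link set macvlan-shim up
  # route the CNI range (10.17.17.32-10.17.17.40 here) via the shim
  ip route add 10.17.17.32/27 dev macvlan-shim
fi
msg="macvlan host-access sketch done"
echo "$msg"
```

Without a shim like this, packets from the host to a container's macvlan address are dropped by design, which matches the "No route to host" errors seen above.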

tgross commented 2 years ago

Thanks @nakermann1973, it'd been a hot second since I've had to play around with macvlan. That totally makes sense.

Ok, so that means we have a reproduction here. The tl;dr is that with CNI addresses we:

I'm going to retitle this issue for clarity and mark it for roadmapping.

suikast42 commented 1 year ago

I ran into this confusion today.

My worker host IPs are 10.21.21.42-44.

dig +short whoami.service.consul
172.26.64.127
172.26.64.114
172.26.64.222

Calling the whoami service through the Traefik ingress:

Hostname: 0d422660cab6
IP: 127.0.0.1
IP: 172.26.64.222
RemoteAddr: 127.0.0.1:44226
GET / HTTP/1.1
Host: whoami.cloud.private
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br
Accept-Language: de-DE,de;q=0.9,tr-TR;q=0.8,tr;q=0.7,en-US;q=0.6,en;q=0.5
Cookie: _ga=GA1.2.426410110.1676546934
Sec-Ch-Ua: "Chromium";v="112", "Google Chrome";v="112", "Not:A-Brand";v="99"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
X-B3-Parentspanid: 326ce80092a58513
X-B3-Sampled: 1
X-B3-Spanid: 10a2558c61fbc334
X-B3-Traceid: 2c29da7e9bb45377326ce80092a58513
X-Forwarded-For: 10.21.0.1
X-Forwarded-Host: whoami.cloud.private
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Forwarded-Server: worker-01
X-Real-Ip: 10.21.0.1

But Nomad exposes the host IP and port, 10.21.21.42:31364. Calling http://10.21.21.42:31364/ shows:

Hostname: 0d422660cab6
IP: 127.0.0.1
IP: 172.26.64.222
RemoteAddr: 10.21.0.1:64948
GET / HTTP/1.1
Host: 10.21.21.42:31364
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate
Accept-Language: de-DE,de;q=0.9,tr-TR;q=0.8,tr;q=0.7,en-US;q=0.6,en;q=0.5
Connection: keep-alive
Cookie: redirect_to=%2Ffavicon.ico
Upgrade-Insecure-Requests: 1

If I set host_network to "local", for example (pointing at lo), the Nomad API exposes the localhost address.

The Nomad job file:

job "whoami" {
  datacenters = ["nomadder1"]

  group "whoami" {
    count = 3
    #    constraint {
    #      attribute    = "${attr.unique.hostname}"
    #      set_contains = "worker-03"
    #    }
    network {
      mode = "bridge"
      port "web" {
        to = 8080
#        host_network = "local"
      }
      port "health" {
        to = -1
      }
    }

    service {
      name = "whoami"
      port = "8080"
      address_mode = "alloc"
      connect {
        sidecar_service {
          proxy {
            expose {
              path {
                path            = "/health"
                protocol        = "http"
                local_path_port = 8080
                listener_port   = "health"
              }
            }
          }
        }
      }
      tags = [
        "traefik.enable=true",
        "traefik.consulcatalog.connect=true",
        "traefik.http.routers.whoami.tls=true",
        "traefik.http.routers.whoami.rule=Host(`whoami.cloud.private`)",
      ]

      check {
        name     = "whoami_health"
        type     = "http"
        path     = "/health"
        port     = "web"
        interval = "10s"
        timeout  = "2s"
        address_mode = "alloc"
      }
    }

    task "whoami" {
      driver = "docker"
      config {
        image = "traefik/whoami"
        ports = ["web"]
        args  = ["--port", "${NOMAD_PORT_web}"]
      }

      resources {
        cpu    = 100
        memory = 128
      }

    }
  }
}

My environment

Nomad v1.5.5. CNI plugins installed per the standard installation, but at v1.2.0.

sinisterstumble commented 1 year ago

@tgross I see a similar issue but with an IPv6 address.

{
  "cniVersion": "0.4.0",
  "name": "vpc",
  "plugins": [
    {
      "type": "ptp-eth1",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "172.26.48.0/20",
        "dataDir": "/var/run/cni/vpc-ptp",
        "routes": [
          {
            "dst": "0.0.0.0/0"
          }
        ]
      },
      "dns": {
        "nameservers": [
          "10.249.47.19",
          "2600:1c14:ca:5410:1bd:ff0e:bda2:50a7"
        ]
      }
    },
    {
      "type": "ipvlan",
      "master": "eth1",
      "mode": "l3s",
      "ipam": {
        "type": "host-local",
        "resolvConf": "/opt/cni/run/vpc-resolv.conf",
        "dataDir": "/var/run/cni/vpc-ipvlan",
        "ranges": [
          [
            {
              "subnet": "2600:1c14:ca:5415:88e3:0:0:0/80"
            }
          ]
        ],
        "routes": [
          {
            "dst": "::/0"
          }
        ]
      }
    }
  ]
}
sinisterstumble commented 10 months ago

This is blocking CNI deployments due to variable interpolation limitations. Is it possible to at least add NOMAD_ALLOC_IP_, NOMAD_ALLOC_PORT_, and NOMAD_ALLOC_ADDR_ variables as a workaround?

@tgross
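Until such variables exist, one place the CNI-assigned address does show up is the allocation API's NetworkStatus field (e.g. `nomad alloc status -json <alloc-id>`). Below is a minimal sketch of extracting it from a canned response; the field shape is an assumption based on observed Nomad output, and sed is used instead of jq to keep the sketch dependency-free.

```shell
# Sketch: pull the CNI-assigned address out of an allocation API response.
# Real usage would pipe `nomad alloc status -json <alloc-id>` instead of
# this canned sample (shape assumed from observed Nomad output).
alloc_json='{"NetworkStatus":{"InterfaceName":"eth0","Address":"10.0.2.37"}}'

# extract the value of the "Address" key
address=$(printf '%s\n' "$alloc_json" | sed -n 's/.*"Address":"\([^"]*\)".*/\1/p')
echo "CNI address: ${address}"
```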