nakermann1973 opened this issue 2 years ago
Hi @nakermann1973!
The service.address_mode you've set only impacts which address is advertised to Consul during service registration. As you've noted, that's being set correctly in Consul, but not in Nomad for some reason.
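For reference, that's this part of the service stanza (a sketch only; I'm assuming the service name and port label from the repro below):

service {
  name = "demo-cni"
  port = "inbound"
  # address_mode controls only the address advertised to Consul
  # during registration, not the alloc addresses or env vars.
  address_mode = "alloc"
}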
Unfortunately I can't reproduce what you're seeing without the plugin you're using. I don't see ingress on the list at https://www.cni.dev/plugins/current/, so this isn't one of the standard example plugins, right? Can you provide a link so that I can try to reproduce your setup?
The plugin I am using is macvlan, and the config above defines the ingress network. The config is pasted above in /opt/cni/config/ingress.conflist.
🤦 D'oh, right. Ok, let me see if I can reproduce this and figure out what's happening there.
Ok, I've been able to reproduce this in a Vagrant environment. I get a slightly different host IP, but I think that may be because I'm binding the client to 0.0.0.0 here. Either way, these aren't the addresses we'd expect to see.
Here's my CNI configuration, with plugins[0].master set to the device with the IP address 10.0.2.15/24. (Note that if you deploy onto multiple clients you need to have non-overlapping ranges or you can get IP address collisions. But for the purposes of this repro, we'll have one client.)
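It's a minimal macvlan conflist along these lines (a sketch: the IPAM range shown is illustrative; the master is the enp0s3 device carrying 10.0.2.15/24, per the route table further down):

{
  "cniVersion": "0.4.0",
  "name": "ingress",
  "plugins": [
    {
      "type": "macvlan",
      "master": "enp0s3",
      "ipam": {
        "type": "host-local",
        "subnet": "10.0.2.0/24",
        "rangeStart": "10.0.2.35",
        "rangeEnd": "10.0.2.45",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ]
      }
    }
  ]
}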
Running the exact same jobspec you provided above, I see the addresses registered in Consul:
$ curl -s localhost:8500/v1/catalog/service/demo-cni | jq '.[].ServiceAddress'
"10.0.2.35"
"10.0.2.37"
"10.0.2.36"
But if I query Nomad, it has the host address and not the advertised addresses:
$ nomad alloc status 7d4
...
Allocation Addresses (mode = "cni/ingress")
Label Dynamic Address
*inbound yes 10.0.2.15:27134
$ nomad alloc exec 7d4 env | grep inbound
NOMAD_ADDR_inbound=10.0.2.15:27134
NOMAD_ALLOC_PORT_inbound=27134
NOMAD_HOST_ADDR_inbound=10.0.2.15:27134
NOMAD_HOST_IP_inbound=10.0.2.15
NOMAD_HOST_PORT_inbound=27134
NOMAD_IP_inbound=10.0.2.15
NOMAD_PORT_inbound=27134
$ nomad alloc exec 7d4 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 82:eb:f7:84:6f:eb brd ff:ff:ff:ff:ff:ff
inet 10.0.2.37/24 brd 10.0.2.255 scope global eth0
valid_lft forever preferred_lft forever
But I don't think the problem is just a matter of advertising vs. environment variables. As far as I can tell, I can't actually make requests to any of these endpoints, even though my routes look correctly configured. Is there a CNI configuration step I'm missing here?
$ curl -v 10.0.2.37:27134
* Trying 10.0.2.37:27134...
* TCP_NODELAY set
* connect to 10.0.2.37 port 27134 failed: No route to host
* Failed to connect to 10.0.2.37 port 27134: No route to host
* Closing connection 0
curl: (7) Failed to connect to 10.0.2.37 port 27134: No route to host
$ ip route
default via 10.0.2.2 dev enp0s3 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15
10.0.2.2 dev enp0s3 proto dhcp scope link src 10.0.2.15 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.56.0/24 dev enp0s10 proto kernel scope link src 192.168.56.2
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.2
192.168.56.0/24 dev enp0s9 proto kernel scope link src 192.168.56.2
To your last point ("I can't actually make requests to any of these endpoints"): this is probably due to the way macvlan interacts with the host interface.
A macvlan interface created on top of a host interface is not visible to the host. Packets are routed directly out to the external network. In order for a host to connect to the macvlan network in a container, the host also requires a macvlan interface on top of the physical host interface.
In my case, bond0 is the host interface on which the CNI plugin creates the macvlan interfaces. On the host, bond0 has no IP address; instead, I have a macvlan interface defined on the host which carries the host's primary IP address. Packets from the host to the containers are switched via my upstream switch.
This is described in this blog post (https://kcore.org/2020/08/18/macvlan-host-access/) and this Docker forums post (https://forums.docker.com/t/macvlan-network-and-host-to-container-connectity/42950/4). A minimal host-side setup is sketched below.
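Roughly, the host-side shim looks like this (a sketch with my names: bond0 is the parent interface and 10.17.17.1/24 is the host address; both would differ on other setups, and mode bridge is just the common choice):

# Create a macvlan "shim" on top of the physical parent interface.
# Macvlan children can't talk to their parent interface directly, so
# host<->container traffic has to go through a macvlan sibling instead.
ip link add macvlan0 link bond0 type macvlan mode bridge

# Put the host's primary address on the shim rather than on bond0 itself.
ip addr add 10.17.17.1/24 dev macvlan0
ip link set macvlan0 up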
Thanks @nakermann1973, it's been a hot second since I've had to play around with macvlan. That totally makes sense.
Ok, so that means we have a reproduction here. The tl;dr is that with CNI addresses, we advertise the CNI-assigned address to Consul correctly, but the allocation addresses shown by Nomad and the NOMAD_* environment variables still use the host address.
I'm going to retitle this issue for clarity and mark it for roadmapping.
I ran into this confusion today.
My worker host IPs are 10.21.21.42-44:
$ dig +short whoami.service.consul
172.26.64.127
172.26.64.114
172.26.64.222
Calling the whoami service via the Traefik ingress:
Hostname: 0d422660cab6
IP: 127.0.0.1
IP: 172.26.64.222
RemoteAddr: 127.0.0.1:44226
GET / HTTP/1.1
Host: whoami.cloud.private
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br
Accept-Language: de-DE,de;q=0.9,tr-TR;q=0.8,tr;q=0.7,en-US;q=0.6,en;q=0.5
Cookie: _ga=GA1.2.426410110.1676546934
Sec-Ch-Ua: "Chromium";v="112", "Google Chrome";v="112", "Not:A-Brand";v="99"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
X-B3-Parentspanid: 326ce80092a58513
X-B3-Sampled: 1
X-B3-Spanid: 10a2558c61fbc334
X-B3-Traceid: 2c29da7e9bb45377326ce80092a58513
X-Forwarded-For: 10.21.0.1
X-Forwarded-Host: whoami.cloud.private
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Forwarded-Server: worker-01
X-Real-Ip: 10.21.0.1
But Nomad exposes the IP and port 10.21.21.42:31364. Calling http://10.21.21.42:31364/ shows:
Hostname: 0d422660cab6
IP: 127.0.0.1
IP: 172.26.64.222
RemoteAddr: 10.21.0.1:64948
GET / HTTP/1.1
Host: 10.21.21.42:31364
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate
Accept-Language: de-DE,de;q=0.9,tr-TR;q=0.8,tr;q=0.7,en-US;q=0.6,en;q=0.5
Connection: keep-alive
Cookie: redirect_to=%2Ffavicon.ico
Upgrade-Insecure-Requests: 1
If I change the host_network to local, for example (pointing to lo), the Nomad API exposes localhost.
The Nomad job file:
job "whoami" {
datacenters = ["nomadder1"]
group "whoami" {
count = 3
# constraint {
# attribute = "${attr.unique.hostname}"
# set_contains = "worker-03"
# }
network {
mode = "bridge"
port "web" {
to = 8080
# host_network = "local"
}
port "health" {
to = -1
}
}
service {
name = "whoami"
port = "8080"
address_mode = "alloc"
connect {
sidecar_service {
proxy {
expose {
path {
path = "/health"
protocol = "http"
local_path_port = 8080
listener_port = "health"
}
}
}
}
}
tags = [
"traefik.enable=true",
"traefik.consulcatalog.connect=true",
"traefik.http.routers.whoami.tls=true",
"traefik.http.routers.whoami.rule=Host(`whoami.cloud.private`)",
]
check {
name = "whoami_health"
type = "http"
path = "/health"
port = "web"
interval = "10s"
timeout = "2s"
address_mode = "alloc"
}
}
task "whoami" {
driver = "docker"
config {
image = "traefik/whoami"
ports = ["web"]
args = ["--port", "${NOMAD_PORT_web}"]
}
resources {
cpu = 100
memory = 128
}
}
}
}
My env:
Nomad v1.5.5. CNI plugins installed as in the standard installation, but at v1.2.0.
@tgross I see a similar issue but with an IPv6 address.
{
  "cniVersion": "0.4.0",
  "name": "vpc",
  "plugins": [
    {
      "type": "ptp-eth1",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "172.26.48.0/20",
        "dataDir": "/var/run/cni/vpc-ptp",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ]
      },
      "dns": {
        "nameservers": [
          "10.249.47.19",
          "2600:1c14:ca:5410:1bd:ff0e:bda2:50a7"
        ]
      }
    },
    {
      "type": "ipvlan",
      "master": "eth1",
      "mode": "l3s",
      "ipam": {
        "type": "host-local",
        "resolvConf": "/opt/cni/run/vpc-resolv.conf",
        "dataDir": "/var/run/cni/vpc-ipvlan",
        "ranges": [
          [
            { "subnet": "2600:1c14:ca:5415:88e3:0:0:0/80" }
          ]
        ],
        "routes": [
          { "dst": "::/0" }
        ]
      }
    }
  ]
}
This is blocking CNI deployments due to variable interpolation limitations (there is no variable that carries the CNI-assigned address). Is it possible to at least add NOMAD_ALLOC_IP_, NOMAD_ALLOC_PORT_, and NOMAD_ALLOC_ADDR_ variables (per port label) as a workaround? Something like the sketch below.
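For illustration, a hypothetical usage (the variable name here does not come from Nomad; it mirrors the existing NOMAD_HOST_ADDR_<label> pattern, with the "inbound" label from the repro above):

task "app" {
  driver = "docker"

  config {
    image = "traefik/whoami"
  }

  env {
    # Hypothetical: the requested NOMAD_ALLOC_ADDR_<label> variable
    # would carry the CNI-assigned allocation address for the
    # "inbound" port rather than the host address.
    ADVERTISE_ADDR = "${NOMAD_ALLOC_ADDR_inbound}"
  }
}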
@tgross
Nomad version
Nomad v1.2.6 (95514d569610f15ce49b4a7a1a6bfd3e7b3e7b4f)
Operating system and Environment details
Gentoo server, running a test cluster in Docker containers (Consul and the Nomad server run in Docker; the Nomad client runs on the host).
Issue
I am experiencing a very similar issue to #11216. I have a CNI macvlan network defined, and a Nomad job using this network.
The IP-related environment variables in the container get set to the host address (10.17.17.1), not the container address (10.17.17.X).
The alloc's Allocation Addresses output is also incorrect. The ServiceAddress entries are populated correctly in Consul, though.
Reproduction steps
Create a CNI network as follows:
Start a job as follows:
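A minimal sketch of such a job (the datacenter, image, and port label here are illustrative; the network mode matches the "ingress" name in /opt/cni/config/ingress.conflist):

job "demo-cni" {
  datacenters = ["dc1"]

  group "demo" {
    network {
      # "cni/<name>" selects the CNI network by the "name" field
      # in the conflist, here "ingress".
      mode = "cni/ingress"
      port "inbound" {}
    }

    task "demo" {
      driver = "docker"

      config {
        image = "traefik/whoami"
        args  = ["--port", "${NOMAD_PORT_inbound}"]
      }
    }
  }
}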
Expected Result
Environment variables and Allocation Addresses are set to the container address
Actual Result
Environment variables and Allocation Addresses are set to the host address
Job file (if appropriate)
See above