Jess3Jane opened 9 months ago
Ah, I failed to mention that the reproduction was done with the default configuration that ships with Nomad, so I don't think it's something weird in there breaking things.
I have this issue. It seems to be caused by the Docker/Nomad service being offline for less than heartbeat_grace, so Nomad doesn't consider the allocations lost and resumes them; but because Docker was offline, the network namespaces are gone.
I worked around it by adding a sleep to the Nomad service file that is longer than heartbeat_grace, so allocations are always considered lost and Nomad recreates them, including their network namespaces.
The Nomad cluster I use runs on fast-booting lightweight VMs (they boot in under 10s), so it nearly always hits this.
...
[Service]
EnvironmentFile=-/etc/nomad.d/nomad.env
ExecStartPre=/bin/sleep 90
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d
...
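The reasoning behind the sleep can be sketched as a toy model. It assumes, as described above, that allocations are resumed when the client comes back within heartbeat_grace and treated as lost (and recreated) otherwise; real Nomad timing also involves the heartbeat TTL, so this is an illustration of the workaround, not Nomad's actual logic. The function names and the 8s boot / 10s grace figures are hypothetical.

```python
"""Toy model of the ExecStartPre sleep workaround (hypothetical, simplified)."""


def client_downtime(boot_seconds: float, start_delay_seconds: float) -> float:
    # Time the Nomad client is unreachable: VM/daemon restart plus any
    # ExecStartPre delay (e.g. /bin/sleep 90) before `nomad agent` starts.
    return boot_seconds + start_delay_seconds


def allocations_recreated(downtime_seconds: float, heartbeat_grace_seconds: float) -> bool:
    # Simplifying assumption taken from the thread: back within
    # heartbeat_grace -> allocations are resumed (possibly without their
    # network namespaces); gone longer -> marked lost and recreated.
    return downtime_seconds > heartbeat_grace_seconds


if __name__ == "__main__":
    grace = 10.0  # hypothetical value; use your cluster's heartbeat_grace
    for delay in (0.0, 90.0):
        downtime = client_downtime(boot_seconds=8.0, start_delay_seconds=delay)
        print(f"sleep {delay:.0f}s -> downtime {downtime:.0f}s, "
              f"recreated: {allocations_recreated(downtime, grace)}")
```

The point of the model is only the inequality: the sleep has to push total downtime past heartbeat_grace, which is why a fixed delay well above the grace period (90s here) makes the outcome predictable on fast-booting VMs.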
Maybe https://github.com/hashicorp/nomad/pull/19886 would help when merged.
Crosslinking #15086 for visibility.
Hi @Jess3Jane and thanks for raising this issue with a great reproduction. I was able to reproduce this locally and have included details below for future readers. I'll add this to our backlog.
Host networking, Docker processes, and health check endpoint after initial start.
root@uk1-c1:/home/jrasell# ip addr show veth541d761a
17: veth541d761a@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default
link/ether ea:07:d7:03:b6:b2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::e807:d7ff:fe03:b6b2/64 scope link
valid_lft forever preferred_lft forever
root@uk1-c1:/home/jrasell# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f8edd356ec13 redis:7 "docker-entrypoint.s…" 4 minutes ago Up 4 minutes redis-1f994fe3-06b6-dbc9-2897-72b429a61820
32d148ee127a gcr.io/google_containers/pause-arm64:3.1 "/pause" 4 minutes ago Up 4 minutes nomad_init_1f994fe3-06b6-dbc9-2897-72b429a61820
root@uk1-c1:/home/jrasell# (printf "PING\r\n";) | nc 192.168.1.121 27080
+PONG
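The nc probe above can be turned into a repeatable check. A minimal sketch using Python's standard socket module; `redis_ping` is a hypothetical helper name, and it assumes the job is a Redis task that answers an inline PING with +PONG, as in this reproduction.

```python
import socket


def redis_ping(host: str, port: int, timeout: float = 2.0) -> bool:
    """Send an inline Redis PING and report whether +PONG came back."""
    # Equivalent of: (printf "PING\r\n";) | nc HOST PORT
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Covers refused connections, "no route to host" and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
    # An empty reply (peer closed without answering) also counts as failure.
    return reply.startswith(b"+PONG")
```

For this reproduction you would call `redis_ping("192.168.1.121", 27080)`; after the Docker restart it would be expected to return False, matching the empty nc output further down.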
Task events show the Docker processes restarting:
Recent Events:
Time Type Description
2024-02-20T08:36:22Z Started Task started by client
2024-02-20T08:36:04Z Restarting Task restarting in 17.156781522s
2024-02-20T08:36:04Z Terminated Exit Code: 0
2024-02-20T08:31:14Z Started Task started by client
2024-02-20T08:31:14Z Task Setup Building Task Directory
2024-02-20T08:31:14Z Received Task received by client
The health check no longer responds.
root@uk1-c1:/home/jrasell# (printf "PING\r\n";) | nc 192.168.1.121 27080
root@uk1-c1:/home/jrasell#
The Nomad client host machine (I only had this test job running on my cluster) no longer has a virtual interface configured:
ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:5b:f4:27 brd ff:ff:ff:ff:ff:ff
inet 192.168.121.22/24 metric 100 brd 192.168.121.255 scope global dynamic enp0s1
valid_lft 55052sec preferred_lft 55052sec
inet6 fd6b:32d9:3793:3897:5054:ff:fe5b:f427/64 scope global dynamic mngtmpaddr noprefixroute
valid_lft 2591912sec preferred_lft 604712sec
inet6 fe80::5054:ff:fe5b:f427/64 scope link
valid_lft forever preferred_lft forever
3: enp0s2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:1f:6b:0c brd ff:ff:ff:ff:ff:ff
inet 192.168.1.121/24 brd 192.168.1.255 scope global enp0s2
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe1f:6b0c/64 scope link
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:6c:60:7c:18 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:6cff:fe60:7c18/64 scope link
valid_lft forever preferred_lft forever
11: nomad: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether e2:48:45:4d:96:6e brd ff:ff:ff:ff:ff:ff
inet 172.26.64.1/20 brd 172.26.79.255 scope global nomad
valid_lft forever preferred_lft forever
inet6 fe80::e048:45ff:fe4d:966e/64 scope link
valid_lft forever preferred_lft forever
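When checking many clients, eyeballing this output gets tedious. A minimal heuristic sketch in Python that parses `ip addr show` text and flags the two symptoms seen here: bridges such as nomad in NO-CARRIER state, and no veth interfaces left on the host. The function names and the "veth" prefix check are assumptions for illustration; interface naming may differ between CNI setups.

```python
import re

# Matches interface header lines such as
#   "17: veth541d761a@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ..."
#   "11: nomad: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ..."
# The optional "@peer" suffix (used for veth pairs) is dropped from the name.
_HEADER = re.compile(
    r"^[ \t]*\d+:[ \t]+([^:@\s]+)(?:@[^:\s]+)?:[ \t]+<([^>]*)>", re.MULTILINE
)


def parse_interfaces(ip_addr_output: str) -> dict[str, set[str]]:
    """Map each interface name in `ip addr show` output to its flag set."""
    return {
        name: set(flags.split(",")) if flags else set()
        for name, flags in _HEADER.findall(ip_addr_output)
    }


def diagnose(ip_addr_output: str) -> dict[str, list[str]]:
    """Summarise the symptoms discussed in this issue (heuristic only)."""
    ifaces = parse_interfaces(ip_addr_output)
    return {
        "no_carrier": sorted(n for n, f in ifaces.items() if "NO-CARRIER" in f),
        "veths": sorted(n for n in ifaces if n.startswith("veth")),
    }
```

Fed the host output above (for example via `subprocess.run(["ip", "addr", "show"], capture_output=True, text=True).stdout`), it would report docker0 and nomad as NO-CARRIER and an empty veth list.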
Not sure whether this is really related, but I have a similar issue with CNI, where port forwarding didn't work after all services were restarted (note: I masked the first two octets of the destination IP address):
| plugin type="portmap" failed (add): unable to setup DNAT: running [/sbin/iptables -t nat -A CNI-DN-231ebe256ae7b6bd9006d -p tcp --dport 8084 -d 127.0.0.1 -j DNAT --to-destination x.y.70.228:8080 --wait]: exit status 4: iptables: Resource temporarily unavailable.
| pre-run hook "network" failed: failed to configure networking for alloc: failed to configure network: plugin type="portmap" failed (add): unable to setup DNAT: running [/sbin/iptables -t nat -A CNI-DN-231ebe256ae7b6bd9006d -p tcp --dport 8084 -d 127.0.0.1 -j DNAT --to-destination x.y.70.228:8080 --wait]: exit status 4: iptables: Resource temporarily unavailable.
| failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: failed to configure network: plugin type="portmap" failed (add): unable to setup DNAT: running [/sbin/iptables -t nat -A CNI-DN-231ebe256ae7b6bd9006d -p tcp --dport 8084 -d 127.0.0.1 -j DNAT --to-destination x.y.70.228:8080 --wait]: exit status 4: iptables: Resource temporarily unavailable.
Seems like a race condition to me. In this case I would expect the job to fail and perhaps be retried later.
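The expectation here, that a transient iptables failure should fail and be retried rather than leave the allocation half-configured, can be illustrated with a generic retry-with-exponential-backoff helper. This is a sketch of the general pattern only, not how Nomad or the CNI portmap plugin actually handle errors; `retry_with_backoff` and `iptables_busy` are hypothetical names, and treating exit status 4 as retryable is an assumption based on the log above.

```python
import subprocess
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    *,
    attempts: int = 5,
    base_delay: float = 0.5,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:
            # Give up on the last attempt or on errors that aren't transient.
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be >= 1")


def iptables_busy(exc: Exception) -> bool:
    # Hypothetical classifier: treat iptables exit status 4 (logged above as
    # "Resource temporarily unavailable") as transient; anything else is fatal.
    return isinstance(exc, subprocess.CalledProcessError) and exc.returncode == 4
```

As a usage example, `retry_with_backoff(lambda: subprocess.run(["/sbin/iptables", "-t", "nat", "-L", "-n"], check=True), is_retryable=iptables_busy)` would retry only the "resource unavailable" case and surface every other failure immediately. Injecting `sleep` keeps the backoff testable without real delays.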
Apologies for closing this; I think GitHub did something silly with automation.
I don't need to restart Docker for this to occur. I'm not sure WHAT is triggering the change, but under bridge networking my allocations are started with just a loopback interface.
Nomad version
We are hitting it in v1.7.2 as well.
Operating system and Environment details
We have hit this on multiple machines with slightly different versions, though all are Ubuntu 22.04. These are the details of a completely fresh DigitalOcean instance I used to reproduce the bug.
Issue
We have noticed that when we restart the Docker daemon on our machines, every Nomad job on the client is brought back up with a busted network. To be more specific, it is brought back up with no network. For example, before restarting Docker, my test container has the following networks:
and after restarting the daemon, is brought back up with just loopback:
This happens with every container, including the Nomad init container. Docker restarts the containers (as expected), the veths get recreated (as expected), but the containers now lack any interfaces other than loopback (unexpected).
Things that might be notable: the nomad network interface changes from <BROADCAST,MULTICAST,UP,LOWER_UP> to <NO-CARRIER,BROADCAST,MULTICAST,UP>, and on machines with systemd-networkd, its logs complain about the veths losing carrier.
Reproduction steps
Install docker-ce as per their docs (I used Docker's apt registry to install it).
Extract https://github.com/containernetworking/plugins/releases/download/v1.0.0/cni-plugins-linux-amd64-v1.0.0.tgz into /opt/cni/bin.
systemctl start docker
systemctl start nomad
systemctl restart docker
Expected Result
The IP/port combination that the job binds should be reachable with curl. It is before Docker is restarted.
Actual Result
If you curl the IP/port combination, it complains that there is no route to host.
This makes sense, as running ip addr from within the container now reveals that the container has lost its bridge network veth.
Job file (if appropriate)
We've noticed this happen with every job, but the job file I used for the reproduction is:
The toy instance I used for reproduction has a broken journal, so sadly I have no logs from it to provide. If reproduction turns out to be an issue, I'd be happy to send over some logs from one of our actual failing instances, but I have a hunch this won't be that hard to reproduce.