mr-karan opened this issue 2 years ago
Hi @mr-karan! The bridge networking setup on the Nomad client appends to iptables (ref: networking_bridge_linux.go#L108-L115). This is deliberate and was introduced in da27dafdf0f9dce668a03f28987c5806ffb9eda4 so that cluster administrators can add their own rules to the chain. But if you install Docker after you've run a Nomad client that needs the bridge, the order of those rules is going to be unexpected.
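If you want to see what order things ended up in on an affected host, something like this shows it (a sketch; exact chain names vary with the CNI config and Docker version, but you'd expect Docker's DOCKER-USER/DOCKER jumps next to the Nomad admin chain in FORWARD):
# Show the FORWARD chain with rule positions, to see whether Docker's jumps
# landed ahead of the CNI/Nomad rules.
$ sudo iptables -L FORWARD --line-numbers -n
# Full dump for a before/after-Docker comparison (file name is just an example).
$ sudo iptables-save > iptables-after-docker.txt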
That being said, I tried to reproduce this and weirdly I can't even hit localhost inside the application's network namespace! The allocation is reachable from outside the namespace just fine, just not inside.
Run the job and check the allocation address:
$ nomad alloc status 9cfbb1a8
...
Allocation Addresses (mode = "bridge")
Label Dynamic Address
*www yes 127.0.0.1:27053 -> 8001
...
$ curl 127.0.0.1:27053
hello from 127.0.0.1:27053
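(As an aside: the 27053 -> 8001 mapping is implemented by the CNI portmap plugin as a DNAT rule, so if you want to see where it lives you can grep the NAT table; a rough sketch, and the chain names depend on the CNI plugin version, typically CNI-HOSTPORT-DNAT plus a per-allocation CNI-DN-* chain:)
# Find the DNAT rule the portmap plugin installed for the dynamic port.
$ sudo iptables -t nat -S | grep 27053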
The CNI logs on the client look as expected:
2022-06-08T18:22:04.376Z [DEBUG] client.alloc_runner.runner_hook: received result from CNI: alloc_id=9cfbb1a8-bd4b-761b-e43e-41ff5ad0e48a result="{\"Interfaces\":{\"eth0\":{\"IPConfigs\":[{\"IP\":\"172.26.64.7\",\"Gateway\":\"172.26.64.1\"}],\"Mac\":\"5e:cf:1d:d1:0b:47\",\"Sandbox\":\"/var/run/netns/9cfbb1a8-bd4b-761b-e43e-41ff5ad0e48a\"},\"nomad\":{\"IPConfigs\":null,\"Mac\":\"12:d8:21:27:7d:af\",\"Sandbox\":\"\"},\"veth8a0eab86\":{\"IPConfigs\":null,\"Mac\":\"b6:37:96:04:32:a6\",\"Sandbox\":\"\"}},\"DNS\":[{}],\"Routes\":[{\"dst\":\"0.0.0.0/0\"}]}"
We can compare to a docker task:
2022-06-08T18:27:46.790Z [DEBUG] client.alloc_runner.runner_hook: received result from CNI: alloc_id=82b89d12-7ecd-2c42-d2a0-31bdff8c46ea result="{\"Interfaces\":{\"eth0\":{\"IPConfigs\":[{\"IP\":\"172.26.64.9\",\"Gateway\":\"172.26.64.1\"}],\"Mac\":\"fa:86:74:7a:7b:64\",\"Sandbox\":\"/var/run/docker/netns/8ce2419da8fa\"},\"nomad\":{\"IPConfigs\":null,\"Mac\":\"12:d8:21:27:7d:af\",\"Sandbox\":\"\"},\"vethe2a5e1b4\":{\"IPConfigs\":null,\"Mac\":\"da:88:15:d9:ac:7b\",\"Sandbox\":\"\"}},\"DNS\":[{}],\"Routes\":[{\"dst\":\"0.0.0.0/0\"}]}"
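(Side note: the Sandbox path in the CNI result is also a convenient way to get into the namespace without hunting for the task PID; roughly:)
# Enter the exec allocation's network namespace via the CNI sandbox path above.
$ sudo nsenter --net=/var/run/netns/9cfbb1a8-bd4b-761b-e43e-41ff5ad0e48a ip addr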
Now let's look inside the network namespace of this task:
$ sudo nsenter -t $(pgrep busybox) --net ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 5e:cf:1d:d1:0b:47 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.26.64.7/20 brd 172.26.79.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5ccf:1dff:fed1:b47/64 scope link
valid_lft forever preferred_lft forever
$ sudo nsenter -t $(pgrep busybox) --net netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:8001 0.0.0.0:* LISTEN 8910/busybox
OK, that all looks good. Let's make sure the host-side peer of that eth0@if13 veth interface exists on the host:
$ ip addr
...
13: veth72eb50f3@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default
link/ether b6:37:96:04:32:a6 brd ff:ff:ff:ff:ff:ff link-netns 912f1503-3819-2412-fc6b-8abc42faca79
inet6 fe80::b437:96ff:fe04:32a6/64 scope link
valid_lft forever preferred_lft forever
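We can also double-check that this veth is enslaved to the nomad bridge (a quick sanity check):
# List interfaces whose master is the nomad bridge; the veth above should appear.
$ ip link show master nomad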
So far so good, let's curl various address/port combinations from inside the allocation's network namespace:
# container IP
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.26.64.7:27053
curl: (28) Connection timed out after 1002 milliseconds
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.26.64.7:8001
curl: (28) Connection timed out after 1000 milliseconds
# nomad bridge IP
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.26.64.1:8001
curl: (7) Failed to connect to 172.26.64.1 port 8001: Connection refused
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.26.64.1:27053
curl: (28) Connection timed out after 1001 milliseconds
# localhost
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 127.0.0.1:27053
curl: (28) Connection timed out after 1001 milliseconds
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 127.0.0.1:8001
curl: (28) Connection timed out after 1001 milliseconds
# what about the host IP?
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 192.168.56.69:27053
curl: (28) Connection timed out after 1001 milliseconds
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 ^C
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 192.168.56.69:8001
curl: (7) Failed to connect to 192.168.56.69 port 8001: Connection refused
# docker0 bridge IP (just in case)
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.17.0.1:27053
curl: (28) Connection timed out after 1001 milliseconds
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.17.0.1:8001
curl: (7) Failed to connect to 172.17.0.1 port 8001: Connection refused
None of these work!
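One more thing worth double-checking (not claiming it's the cause, just that it stands out in the ip addr output above): lo inside the namespace is reported as state DOWN, so a reasonable debugging experiment is to bring it up and retry:
# lo shows "state DOWN" inside the namespace; as an experiment only, bring it up
# and retry the loopback curl.
$ sudo nsenter -t $(pgrep busybox) --net ip link set lo up
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 127.0.0.1:8001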
I'm going to mark this as a bug for further investigation. In the meantime, if you want to test that the application is reachable, you should probably be using the host address from outside the application container anyway, as it'll give you a more accurate picture of how networking is set up.
As a stopgap solution this is okay. Although I'd like to point out that it's not really practical if the node has multiple namespaces managed by different people: giving everyone SSH access to the underlying nodes (so they can log in and run these curl commands) isn't feasible.
Yeah if you don't intend for the application to be visible outside the host that's definitely a constraint.
Nomad version
Output from nomad version
Operating system and Environment details
Issue
1) On a fresh Nomad client VM, I deploy an exec job which is similar to:
2) The job gets deployed and I can see the Host Address inside the Allocation:
3) I exec inside the alloc, and try to reach this address (192.168.29.76:31958):
4) I install docker on this host. Now, since docker mangles iptables on the host, here's a snapshot of all the rules existing on this host:
After I install docker, the above curl command stops working:
The iptables rules list after docker is installed:
Reproduction steps
Detailed steps are above already.
Here's a TL;DR:
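For anyone reproducing this, a minimal job along the lines of the steps above might look roughly like this (a hypothetical sketch, not my actual jobspec; the file name, job layout, and the busybox httpd command are illustrative):
$ cat > repro.nomad.hcl <<'EOF'
# Hypothetical minimal bridge-mode exec job; names and ports are illustrative.
job "www" {
  datacenters = ["dc1"]

  group "www" {
    network {
      mode = "bridge"
      port "www" {
        to = 8001   # dynamic host port mapped to 8001 inside the namespace
      }
    }

    task "www" {
      driver = "exec"

      config {
        command = "/bin/busybox"
        args    = ["httpd", "-f", "-p", "8001"]
      }
    }
  }
}
EOF
$ nomad job run repro.nomad.hcl
# Curl the advertised host address from outside the namespace, then repeat the
# same curl from inside the alloc, and install Docker in between the two runs.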
More context
I wonder if docker is putting some kind of iptables rule on the host network interface which makes it unreachable from the nomad network interface? That would explain why, as soon as docker is installed on the host, the address becomes unreachable.
IP routes on the host:
IP routes on the alloc:
(This, I believe, is the default subnet that nomad uses.)
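(To confirm the subnet on the host side, the bridge shows up as the nomad interface; roughly:)
# The client's default bridge is named "nomad"; its address shows the subnet in
# use (172.26.64.0/20 is the default unless bridge_network_subnet is overridden).
$ ip addr show dev nomad
$ ip route show dev nomad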
Question:
What I want to achieve is to be able to reach the application from inside alloc exec for quick debugging/tests. What is the best way to achieve that, and which address/interface should I be using in that case? I've tried lo/0.0.0.0/nomad but none seem to work. This is unlike the docker driver, where the application binds to 127.0.0.1 in the container itself so it's reachable. So how exactly would this work with exec?
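To make the ask concrete, the kind of thing I'd like to be able to run from inside the task is roughly this (a sketch; the port label www and the alloc-id placeholder are illustrative, and it assumes the NOMAD_* runtime variables are visible to alloc exec):
# NOMAD_PORT_<label> is the port the task binds to inside the namespace (8001 here);
# NOMAD_HOST_PORT_<label> is the dynamically assigned host port.
$ nomad alloc exec <alloc-id> /bin/sh -c 'curl -m1 "http://127.0.0.1:${NOMAD_PORT_www}"'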
Thanks!