hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Host address unreachable from exec driver if docker is present on host #13296

Open mr-karan opened 2 years ago

mr-karan commented 2 years ago

Nomad version

Output from nomad version

Nomad v1.3.1 (2b054e38e91af964d1235faa98c286ca3f527e56)

Operating system and Environment details

No LSB modules are available.
Distributor ID: Pop
Description:    Pop!_OS 22.04 LTS
Release:    22.04
Codename:   jammy

Issue

1) On a fresh Nomad client VM, I deploy an exec job which is similar to:

job "http" {
  datacenters = ["dc1"]
  type        = "service"

  group "app" {
    count = 1
    network {
      mode = "bridge"
      port "python-http" {
        to = "8888"
      }
    }

    task "server" {
      driver = "exec"

      config {
        command = "/usr/bin/python3"
        args    = ["-m", "http.server", "8888"]
      }
    }
  }
}
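
(For reference, the job was registered with the standard CLI; the filename below is illustrative.)

$ nomad job run http.nomad.hcl
$ nomad job status http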

2) The job gets deployed and I can see the Host Address inside Allocation:

(screenshot: the allocation's Host Address, 192.168.29.76:31958, shown in the Nomad UI)

3) I exec inside the alloc, and try to reach this address (192.168.29.76:31958):

$ nomad alloc exec -i -t -task server ff724b46 /bin/bash
nobody@pop-os:/$ curl  192.168.29.76:31958 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
<li><a href="alloc/">alloc/</a></li>
<li><a href="secrets/">secrets/</a></li>
</ul>
<hr>
</body>
</html>
nobody@pop-os:/$ 

4) I install docker on this host.

Since docker mangles iptables on the host, here's a snapshot, for comparison, of all the rules that existed on the host before docker was installed:

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain NOMAD-ADMIN (0 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      nomad   0.0.0.0/0            172.26.64.0/20      
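
(The listing above is the filter table. As a sketch, both tables can be captured like this; the nat table is where the CNI port-mapping DNAT rules that forward the host port to the allocation typically live:)

$ sudo iptables -L -n -v          # filter table, shown above
$ sudo iptables -t nat -L -n -v   # nat table, with the CNI portmap/DNAT rules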

After I install docker, the above curl command stops working:

nobody@pop-os:/$ curl -m 5 192.168.29.76:24858 
curl: (28) Connection timed out after 5004 milliseconds
nobody@pop-os:/$ 

The iptables rules list after docker is installed:

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  *      br-886858651ca3  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 DOCKER     all  --  *      br-886858651ca3  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  br-886858651ca3 !br-886858651ca3  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  br-886858651ca3 br-886858651ca3  0.0.0.0/0            0.0.0.0/0           
   12  1747 CNI-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* CNI firewall plugin rules */

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain CNI-FORWARD (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   12  1747 NOMAD-ADMIN  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* CNI firewall plugin admin overrides */
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            172.26.64.87         ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  *      *       172.26.64.87         0.0.0.0/0           

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DOCKER-ISOLATION-STAGE-2  all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER-ISOLATION-STAGE-2  all  --  br-886858651ca3 !br-886858651ca3  0.0.0.0/0            0.0.0.0/0           
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 DROP       all  --  *      br-886858651ca3  0.0.0.0/0            0.0.0.0/0           
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain NOMAD-ADMIN (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   12  1747 ACCEPT     all  --  *      nomad   0.0.0.0/0            172.26.64.0/20   

Reproduction steps

Detailed steps are above.

More context

I wonder if docker is adding some iptables rule that makes the host network interface unreachable from the nomad bridge interface, which would explain why the address becomes unreachable as soon as docker is installed on the host.
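
One generic way to check that (a sketch, not something I have run here): list the FORWARD chain with packet counters, reproduce the timed-out curl from inside the alloc, then list again and compare the counters to see which rule, if any, the traffic is hitting.

$ sudo iptables -L FORWARD -n -v --line-numbers   # note the pkts/bytes counters
# (reproduce the timed-out curl from inside the alloc here)
$ sudo iptables -L FORWARD -n -v --line-numbers   # compare counters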

IP routes on the host:

default via 192.168.29.1 dev wlp0s20f3 proto dhcp metric 600 
169.254.0.0/16 dev wlp0s20f3 scope link metric 1000 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.18.0.0/16 dev br-886858651ca3 proto kernel scope link src 172.18.0.1 linkdown 
172.26.64.0/20 dev nomad proto kernel scope link src 172.26.64.1 
192.168.29.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.29.76 metric 600 

IP routes on the alloc:

nobody@pop-os:/$ ip route
default via 172.26.64.1 dev eth0 
172.26.64.0/20 dev eth0 proto kernel scope link src 172.26.64.87 
nobody@pop-os:/$ 

(I believe this is the default subnet that Nomad uses.)
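
A quick way to confirm from the host (the subnet should also be configurable in the client config, via bridge_network_subnet if I recall the option name correctly):

$ ip addr show dev nomad    # the bridge Nomad creates for bridge-mode networking
$ ip route show dev nomad   # should show the 172.26.64.0/20 subnet seen above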

Question:

What I want to achieve is to reach the application from inside the alloc exec session for quick debugging/tests. What is the best way to do that, and which address/interface should I be using in that case? I've tried lo, 0.0.0.0, and the nomad bridge address, but none of them work. This is unlike the docker driver, where the application binds to 127.0.0.1 inside the container itself and is therefore reachable; how exactly would this work with exec?
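
(For reference, one way to see which addresses and ports Nomad itself advertises to the task is to dump the injected environment from inside the exec session; the exact variable names depend on the port label, so the grep below is only a sketch.)

$ nomad alloc exec -i -t -task server ff724b46 /bin/bash
nobody@pop-os:/$ env | grep -E 'NOMAD_(ADDR|IP|PORT|HOST_PORT)'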

Thanks!

tgross commented 2 years ago

Hi @mr-karan! The bridge networking setup on the Nomad client appends to iptables (ref networking_bridge_linux.go#L108-L115). This is deliberate and introduced in da27dafdf0f9dce668a03f28987c5806ffb9eda4 so that cluster administrators can add their own rules to the chain. But if you install Docker after you've run a Nomad client that needs the bridge, then the order of those rules is going to be unexpected.
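
If the rule ordering does turn out to matter on a given host, an administrator can inspect and reorder the jump by hand. This is only a sketch (the rule position <n> is a placeholder), not an official recommendation:

$ sudo iptables -L FORWARD -n --line-numbers
# if the CNI-FORWARD jump landed after the DOCKER-* jumps, delete it at its
# current position <n> and re-insert it at the top of the chain:
$ sudo iptables -D FORWARD <n>
$ sudo iptables -I FORWARD 1 -m comment --comment "CNI firewall plugin rules" -j CNI-FORWARD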

That being said, I tried to reproduce this and weirdly I can't even hit localhost inside the application's network namespace! The allocation is reachable from outside the namespace just fine, just not inside.

exec jobspec:

job "exec" {
  datacenters = ["dc1"]

  group "web" {
    network {
      mode = "bridge"
      port "www" {
        to = 8001
      }
    }

    service {
      name     = "www"
      port     = "www"
      provider = "nomad"
    }

    task "httpd" {
      driver = "exec"

      config {
        command = "busybox"
        args    = ["httpd", "-v", "-f", "-p", "0.0.0.0:8001", "-h", "/local"]
      }

      template {
        data = <
host routes and iptables:

$ ip route
default via 10.0.2.2 dev enp0s3 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15
10.0.2.2 dev enp0s3 proto dhcp scope link src 10.0.2.15 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.26.64.0/20 dev nomad proto kernel scope link src 172.26.64.1
192.168.56.0/24 dev enp0s10 proto kernel scope link src 192.168.56.69
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.69
192.168.56.0/24 dev enp0s9 proto kernel scope link src 192.168.56.69

$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy DROP)
target     prot opt source               destination
CNI-FORWARD  all  --  anywhere             anywhere             /* CNI firewall plugin rules */
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain CNI-FORWARD (1 references)
target     prot opt source               destination
NOMAD-ADMIN  all  --  anywhere             anywhere             /* CNI firewall plugin admin overrides */
ACCEPT     all  --  anywhere             172.26.64.7          ctstate RELATED,ESTABLISHED
ACCEPT     all  --  172.26.64.7          anywhere

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

Chain NOMAD-ADMIN (1 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             172.26.64.0/20

Run the job and check the allocation address:

$ nomad alloc status 9cfbb1a8
...
Allocation Addresses (mode = "bridge")
Label  Dynamic  Address
*www   yes      127.0.0.1:27053 -> 8001
...

$ curl 127.0.0.1:27053

hello from 127.0.0.1:27053

The CNI logs on the client look as expected:

2022-06-08T18:22:04.376Z [DEBUG] client.alloc_runner.runner_hook: received result from CNI: alloc_id=9cfbb1a8-bd4b-761b-e43e-41ff5ad0e48a result="{\"Interfaces\":{\"eth0\":{\"IPConfigs\":[{\"IP\":\"172.26.64.7\",\"Gateway\":\"172.26.64.1\"}],\"Mac\":\"5e:cf:1d:d1:0b:47\",\"Sandbox\":\"/var/run/netns/9cfbb1a8-bd4b-761b-e43e-41ff5ad0e48a\"},\"nomad\":{\"IPConfigs\":null,\"Mac\":\"12:d8:21:27:7d:af\",\"Sandbox\":\"\"},\"veth8a0eab86\":{\"IPConfigs\":null,\"Mac\":\"b6:37:96:04:32:a6\",\"Sandbox\":\"\"}},\"DNS\":[{}],\"Routes\":[{\"dst\":\"0.0.0.0/0\"}]}"

We can compare to a docker task; the result has the same structure, with the netns sandbox path being the notable difference:

2022-06-08T18:27:46.790Z [DEBUG] client.alloc_runner.runner_hook: received result from CNI: alloc_id=82b89d12-7ecd-2c42-d2a0-31bdff8c46ea result="{\"Interfaces\":{\"eth0\":{\"IPConfigs\":[{\"IP\":\"172.26.64.9\",\"Gateway\":\"172.26.64.1\"}],\"Mac\":\"fa:86:74:7a:7b:64\",\"Sandbox\":\"/var/run/docker/netns/8ce2419da8fa\"},\"nomad\":{\"IPConfigs\":null,\"Mac\":\"12:d8:21:27:7d:af\",\"Sandbox\":\"\"},\"vethe2a5e1b4\":{\"IPConfigs\":null,\"Mac\":\"da:88:15:d9:ac:7b\",\"Sandbox\":\"\"}},\"DNS\":[{}],\"Routes\":[{\"dst\":\"0.0.0.0/0\"}]}"

Now let's look inside the network namespace of this task:

$ sudo nsenter -t $(pgrep busybox) --net ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 5e:cf:1d:d1:0b:47 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.7/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5ccf:1dff:fed1:b47/64 scope link
       valid_lft forever preferred_lft forever

$ sudo nsenter -t $(pgrep busybox) --net netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:8001            0.0.0.0:*               LISTEN      8910/busybox

Ok that all looks good. Let's make sure we have that eth0@if13 veth interface on the host:

$ ip addr
...
13: veth72eb50f3@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default
    link/ether b6:37:96:04:32:a6 brd ff:ff:ff:ff:ff:ff link-netns 912f1503-3819-2412-fc6b-8abc42faca79
    inet6 fe80::b437:96ff:fe04:32a6/64 scope link
       valid_lft forever preferred_lft forever

So far so good, let's curl various address/port combinations from inside the allocation's network namespace:

# container IP
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.26.64.7:27053
curl: (28) Connection timed out after 1002 milliseconds
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.26.64.7:8001
curl: (28) Connection timed out after 1000 milliseconds

# nomad bridge IP
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.26.64.1:8001
curl: (7) Failed to connect to 172.26.64.1 port 8001: Connection refused
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.26.64.1:27053
curl: (28) Connection timed out after 1001 milliseconds

# localhost
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 127.0.0.1:27053
curl: (28) Connection timed out after 1001 milliseconds
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 127.0.0.1:8001
curl: (28) Connection timed out after 1001 milliseconds

# what about the host IP?
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 192.168.56.69:27053
curl: (28) Connection timed out after 1001 milliseconds
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 ^C
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 192.168.56.69:8001
curl: (7) Failed to connect to 192.168.56.69 port 8001: Connection refused

# docker0 bridge IP (just in case)
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.17.0.1:27053
curl: (28) Connection timed out after 1001 milliseconds
$ sudo nsenter -t $(pgrep busybox) --net curl -m1 172.17.0.1:8001
curl: (7) Failed to connect to 172.17.0.1 port 8001: Connection refused

None of these work!
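
(If anyone wants to dig further, two generic next steps would be to watch the nomad bridge with tcpdump while repeating one of the curls, and to check whether conntrack ever records the flow. A sketch, using the ports from above; the conntrack tool may need to be installed:)

$ sudo tcpdump -ni nomad 'tcp port 27053 or tcp port 8001'
$ sudo conntrack -L | grep 27053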

I'm going to mark this as a bug for further investigation. In the meantime, if you want to test that the application is reachable, you should probably be using the host address from outside the application container anyway, as it'll give you a more accurate picture of how networking is set up.
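
(Using the values from earlier in this thread, that looks like the following, run from the host or any machine that can reach it:)

$ nomad alloc status ff724b46    # shows the advertised host address for the allocation
$ curl 192.168.29.76:31958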

mr-karan commented 2 years ago

In the meantime, if you want to test that the application is reachable, you should probably be using the host address from outside the application container anyway, as it'll give you a more accurate picture of how networking is set up.

As a stopgap solution this is okay. I'd like to point out, though, that it's not really practical if the node hosts multiple namespaces managed by different people; giving everyone SSH access to the underlying nodes (so they can log in and run these curl commands) isn't feasible.

tgross commented 2 years ago

I'd like to point out, though, that it's not really practical if the node hosts multiple namespaces managed by different people; giving everyone SSH access to the underlying nodes (so they can log in and run these curl commands) isn't feasible.

Yeah, if you don't intend for the application to be visible outside the host, that's definitely a constraint.