Multi-interface networking doesn't work when using network_mode in a Docker-driven task

Davasny commented 3 years ago


Nomad version

Nomad v1.0.3 (08741d9f2003ec26e44c72a2c0e27cdf0eadb6ee)

Operating system and Environment details

# uname -a
Linux centos8.localdomain 4.18.0-193.19.1.el8_2.x86_64 #1 SMP Mon Sep 14 14:37:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# docker -v
Docker version 19.03.13, build 4484c46d9d

# cat /etc/centos-release
CentOS Linux release 8.2.2004 (Core)

Issue

When using the multi-interface feature, the container's port is reachable via the public IP. After enabling network_mode in the task config, however, the container becomes inaccessible.

Reproduction steps

1. Start the Nomad client and server with the multi-interface config.
2. Run the Nomad job with network_mode = "test" commented out.
3. Try to access the container, in my case with curl:

# curl -I 192.168.88.248:8001
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.8.7
Date: Wed, 10 Feb 2021 09:56:37 GMT
Content-type: text/html; charset=utf-8
Content-Length: 1030

4. Uncomment network_mode = "test" and rerun the job.
5. Try to access the container again:

# curl 192.168.88.248:8001
curl: (7) Failed to connect to 192.168.88.248 port 8001: Connection refused

Network config

# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             192.168.88.250/24 fe80::215:5dff:fe0b:ac0f/64
eth1             UP             192.168.88.248/24 fe80::cf0d:e1a4:47ca:2036/64
docker0          DOWN           172.17.0.1/16 fe80::42:93ff:fea0:c016/64
nomad            UP             172.26.64.1/20 fe80::5042:c1ff:fe01:8283/64
br-517e1d2d528e  UP             172.19.0.1/16 fe80::42:e8ff:fe51:3ae4/64

Job file

job "test" {
  datacenters = ["dc1"]
  type        = "service"

  group "test" {
    count = 1

    network {
      mode = "bridge"

      port "http" {
        host_network = "public"
        static = 8001
        to = 8000
      }
    }

    service {
      address_mode = "host"
      name         = "test"
      port         = "http"
    }

    task "test" {
      driver = "docker"

      config {
        image = "python:3.8-alpine"
        # network_mode = "test"
        args  = ["python", "-m", "http.server"]
      }
    }
  }
}

Nomad config

data_dir = "/var/lib/nomad"

log_level = "DEBUG"

client {
  enabled = 1
  servers = ["127.0.0.1"]

  network_interface = "eth0"

  host_network "public" {
    cidr = "192.168.88.0/24"
    interface = "eth1"
  }
}

plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}

server {
  bootstrap_expect = 1
  enabled = 1
  server_join {
    retry_join = ["127.0.0.1"]
  }
}
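Note that both eth0 (192.168.88.250) and eth1 (192.168.88.248) fall inside the "public" cidr of 192.168.88.0/24, so the cidr alone matches two interfaces and it is the interface = "eth1" filter that pins the host network to 192.168.88.248. The Python sketch below is a simplified model of that matching, using the addresses from the ip -br a output above; the helper name host_network_addresses is hypothetical and this is not Nomad's actual fingerprinting code.

```python
import ipaddress

# IPv4 interface addresses copied from the `ip -br a` output above.
INTERFACES = {
    "lo": ["127.0.0.1/8"],
    "eth0": ["192.168.88.250/24"],
    "eth1": ["192.168.88.248/24"],
    "docker0": ["172.17.0.1/16"],
    "nomad": ["172.26.64.1/20"],
    "br-517e1d2d528e": ["172.19.0.1/16"],
}


def host_network_addresses(interfaces, cidr, interface=None):
    """Hypothetical helper: return (interface, address) pairs whose address
    lies inside `cidr`, optionally restricted to one interface name.
    A simplified model of a host_network block, not Nomad's own code."""
    network = ipaddress.ip_network(cidr)
    matches = []
    for name, addrs in interfaces.items():
        if interface is not None and name != interface:
            continue  # interface filter excludes everything else
        for addr in addrs:
            ip = ipaddress.ip_interface(addr).ip
            if ip in network:
                matches.append((name, str(ip)))
    return matches


if __name__ == "__main__":
    # cidr alone matches both eth0 and eth1 ...
    print(host_network_addresses(INTERFACES, "192.168.88.0/24"))
    # ... while adding interface = "eth1" leaves only 192.168.88.248.
    print(host_network_addresses(INTERFACES, "192.168.88.0/24", "eth1"))
```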

Alloc status after step 5 of the reproduction

nomad alloc status 1ab49ded
ID                  = 1ab49ded-e7c2-109a-ddbd-4fa8506fb499
Eval ID             = 3cda7537
Name                = test.test[0]
Node ID             = 351e050e
Node Name           = centos8.localdomain
Job ID              = test
Job Version         = 1
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 13m3s ago
Modified            = 12m46s ago
Deployment ID       = 3294659a
Deployment Health   = healthy

Allocation Addresses (mode = "bridge")
Label  Dynamic  Address
*http  yes      192.168.88.248:8001 -> 8000

Task "test" is "running"
Task Resources
CPU        Memory           Disk     Addresses
0/100 MHz  9.6 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2021-02-10T09:58:55Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-02-10T10:58:55+01:00  Started     Task started by client
2021-02-10T10:58:54+01:00  Task Setup  Building Task Directory
2021-02-10T10:58:48+01:00  Received    Task received by client

Clone of: https://discuss.hashicorp.com/t/question-how-to-run-task-in-multi-interface-configuration-with-access-to-docker-network/20768

@urusha had the same issue in #8432.

urusha commented 3 years ago

Confirming. Specifying network_mode in the job config makes Docker attach the container to the specified bridge, while the network stanza configures iptables rules as if the container were on the nomad bridge, so port mapping doesn't work. We'd like to be able to specify both the bridge (network_mode) and the interface used for port mapping (host_network). I believe the CNI bridge plugin can use multiple bridges, and it may even be possible to use native Docker bridges for this purpose. An example configuration, with the bridge created by the command:

docker network create --subnet 172.19.0.0/16 --gateway=172.19.0.1 --ip-range 172.19.0.0/24 --driver=bridge -ocom.docker.network.bridge.name=br-test test

Might look like this:

# job group config
    network {
      mode = "bridge"
      bridge_name = "br-test"
      port ....
    }

# nomad client config
  cni_bridge "br-test" {
      device = "br-test"
      subnet = "172.19.0.1/16"
      range = "172.19.1.0/24"
  }

Such a configuration would allow using the CNI bridge with Docker's bridge, since the address ranges allocated by Docker and by Nomad (CNI) don't overlap (172.19.0.0/24 vs. 172.19.1.0/24). This would let containers with and without port mapping share the same subnet (network_mode). On the other hand, if we could specify multiple CNI bridges, we would not need native Docker bridges at all.
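The non-overlap argument can be checked mechanically. The Python sketch below uses the values from the proposal (the cni_bridge block itself is proposed syntax, not an existing Nomad option); the helper name ranges_compatible is hypothetical. It verifies that both allocation ranges sit inside the shared /16 bridge subnet and never hand out the same address.

```python
import ipaddress

# Values from the proposal above. "172.19.0.1/16" is a host address with a
# prefix, so strict=False normalises it to the 172.19.0.0/16 network.
bridge_subnet = ipaddress.ip_network("172.19.0.1/16", strict=False)
docker_range = ipaddress.ip_network("172.19.0.0/24")  # docker --ip-range
cni_range = ipaddress.ip_network("172.19.1.0/24")     # proposed CNI range


def ranges_compatible(subnet, a, b):
    """Hypothetical helper: True if both allocation ranges lie inside the
    shared bridge subnet and do not overlap each other."""
    return a.subnet_of(subnet) and b.subnet_of(subnet) and not a.overlaps(b)


if __name__ == "__main__":
    print(bridge_subnet)                                              # 172.19.0.0/16
    print(ranges_compatible(bridge_subnet, docker_range, cni_range))  # True
```

A range such as 172.19.0.128/25 would fail the check, because it overlaps Docker's 172.19.0.0/24.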

tgross commented 3 years ago

Hi @Davasny and @urusha!

I've verified this as well. On my test machine I've configured eth1 as the public network for Nomad. Then I created a network bridge via docker network create --subnet=192.168.17.0/24 test, which results in the following interface configuration:

$ ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             10.0.2.15/24 fe80::a00:27ff:fec5:bc64/64
eth1             UP             10.199.0.200/24 fe80::a00:27ff:fe56:7ad1/64
docker0          DOWN           172.17.0.1/16
br-d7837cb43544  UP             192.168.17.1/24 fe80::42:53ff:fe79:4d38/64
nomad            UP             172.26.64.1/20 fe80::6cf4:33ff:fe3b:710c/64
veth6e107668@eth1 UP             fe80::b81d:44ff:fece:fee2/64
vethc6e08f0@if12 UP             fe80::5c4f:f5ff:fe37:4ab0/64

I ran @Davasny's job and got the following output from docker inspect :id | jq '.[0].NetworkSettings'. It gives us an address on the test network, as we'd expect, but that's not the network Nomad is configuring.

```json
{
  "Bridge": "",
  "SandboxID": "f437c26be10275780d63a6401abd08acf7201dfba97eebe2f70880806b2fd7a6",
  "HairpinMode": false,
  "LinkLocalIPv6Address": "",
  "LinkLocalIPv6PrefixLen": 0,
  "Ports": {},
  "SandboxKey": "/var/run/docker/netns/f437c26be102",
  "SecondaryIPAddresses": null,
  "SecondaryIPv6Addresses": null,
  "EndpointID": "",
  "Gateway": "",
  "GlobalIPv6Address": "",
  "GlobalIPv6PrefixLen": 0,
  "IPAddress": "",
  "IPPrefixLen": 0,
  "IPv6Gateway": "",
  "MacAddress": "",
  "Networks": {
    "test": {
      "IPAMConfig": null,
      "Links": null,
      "Aliases": [
        "628398ff5f5b"
      ],
      "NetworkID": "d7837cb435447312314c7b7c3b9169a19a687458e6dac5490a77390ef61e94ab",
      "EndpointID": "669a22e9c95aca0b61f2049613619a4606e7f566f3f5fc93eda6027bab375fde",
      "Gateway": "192.168.17.1",
      "IPAddress": "192.168.17.2",
      "IPPrefixLen": 24,
      "IPv6Gateway": "",
      "GlobalIPv6Address": "",
      "GlobalIPv6PrefixLen": 0,
      "MacAddress": "02:42:c0:a8:11:02",
      "DriverOpts": null
    }
  }
}
```
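To make the mismatch concrete, the Python sketch below reads a trimmed copy of the NetworkSettings JSON (only the fields it needs) and lists the container's address on each attached Docker network. The container's only address is 192.168.17.2 on the test bridge. Nothing falls inside the nomad bridge subnet (172.26.64.1/20 in the ip -br a output), where the port mapping rules expect the container to be. The helper names container_addresses and on_nomad_bridge are hypothetical.

```python
import ipaddress
import json

# Trimmed copy of the NetworkSettings JSON above (only the fields used here).
NETWORK_SETTINGS = json.loads("""
{
  "IPAddress": "",
  "Ports": {},
  "Networks": {
    "test": {"Gateway": "192.168.17.1", "IPAddress": "192.168.17.2", "IPPrefixLen": 24}
  }
}
""")

# Nomad's bridge subnet, taken from the `nomad` interface in `ip -br a`.
NOMAD_BRIDGE = ipaddress.ip_network("172.26.64.1/20", strict=False)


def container_addresses(settings):
    """Map each attached Docker network name to the container's CIDR address."""
    return {
        name: f"{net['IPAddress']}/{net['IPPrefixLen']}"
        for name, net in settings.get("Networks", {}).items()
        if net.get("IPAddress")
    }


def on_nomad_bridge(addr):
    """True if a CIDR-style address falls inside Nomad's bridge subnet."""
    return ipaddress.ip_interface(addr).ip in NOMAD_BRIDGE


if __name__ == "__main__":
    addrs = container_addresses(NETWORK_SETTINGS)
    print(addrs)                                        # {'test': '192.168.17.2/24'}
    print({n: on_nomad_bridge(a) for n, a in addrs.items()})  # {'test': False}
```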

However, I'm sorry to say I think you've run into a known limitation. From the network docs on host networks:

Note: host_network does not currently support task-based mapped ports such as the Docker driver's port_map configuration.

The example there could be more comprehensive, but this looks like another case of the same issue. I'd have expected an open issue for it, but apart from related reports like https://github.com/hashicorp/nomad/issues/10001 and https://github.com/hashicorp/nomad/issues/9006, there doesn't seem to be one. I'll check with the original author of this feature to make sure I understand whether the limitation is inherent or just "not done yet". In the meantime, I'll make sure this gets surfaced in our roadmapping.

tgross commented 3 years ago

OK, I had a chat with some folks internally, and this is indeed a known limitation. When you set network_mode in the Docker task configuration, you're asking Docker to define the network namespace, but that's also what the group's network block does. With Docker tasks, Nomad creates a pause container for the namespace configured by the network block and sets the task's network namespace to that of the pause container (e.g. container:abcde1134).
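Since the two settings compete for the same namespace, a job can be checked for the conflict before submission. The Python sketch below is a hypothetical pre-submit lint: the dictionary layout mirrors the HCL job above and is not Nomad's JSON job-spec format, and the helper name conflicting_tasks is made up. To stay close to this issue, it only checks groups whose network block uses mode = "bridge".

```python
def conflicting_tasks(job):
    """Hypothetical lint: return (group, task) pairs where a Docker task sets
    network_mode although its group network block (mode = "bridge") already
    owns the network namespace. Other network modes are deliberately ignored."""
    conflicts = []
    for group in job.get("groups", []):
        if group.get("network", {}).get("mode") != "bridge":
            continue
        for task in group.get("tasks", []):
            config = task.get("config", {})
            if task.get("driver") == "docker" and "network_mode" in config:
                conflicts.append((group["name"], task["name"]))
    return conflicts


# Simplified model of the reproduction job (layout is an assumption).
JOB = {
    "name": "test",
    "groups": [{
        "name": "test",
        "network": {"mode": "bridge"},
        "tasks": [{
            "name": "test",
            "driver": "docker",
            "config": {
                "image": "python:3.8-alpine",
                "network_mode": "test",  # the line uncommented in the reproduction
                "args": ["python", "-m", "http.server"],
            },
        }],
    }],
}

if __name__ == "__main__":
    print(conflicting_tasks(JOB))  # [('test', 'test')]
```

Removing network_mode from the task config (the working variant in the reproduction) leaves nothing to report.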

So there's a documentation bug to fix here. The documentation snippet I quoted above is also wrong about port mapping; that should all work now. I'm going to mark this as a docs bug and will ship a PR to clean it all up.

tgross commented 3 years ago

Documentation fixes are merged and will go out with the next website push.

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
