hashicorp / nomad-driver-exec2

An official Nomad task driver plugin for sandboxing workloads using native Linux kernel features
Mozilla Public License 2.0

Feature: support for DNS resolution when using network mode=bridge #48

Open EtienneBruines opened 3 months ago

EtienneBruines commented 3 months ago

It would be nice if network mode bridge was supported.

The current state: Unable to lookup hostnames via DNS.

An error message from my Go binary that attempted to run:

dial tcp: lookup google.com on 127.0.0.53:53: read udp 127.0.0.1:55584->127.0.0.53:53: read: connection refused

This would make using Consul Connect / sidecar proxy / transparent proxy possible.

shoenig commented 1 month ago

So it turns out this is kind of expected, but obviously not ideal. The same problem exists for each of the exec, exec2, and raw_exec task drivers: none of them are able to do DNS resolution "out of the box" when using bridge networking mode. We have Nomad issues tracking this in https://github.com/hashicorp/nomad/issues/17873 and https://github.com/hashicorp/nomad/issues/11033.

For the original exec driver it is sufficient to set the network.dns.servers value to a public DNS server, like 8.8.8.8 and simply bypass the host DNS infrastructure.
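As a rough sketch of that setting (assuming the group-level network block; 8.8.8.8 is just the example server mentioned above, and any reachable resolver would do), the exec driver case might look like:

```hcl
network {
  mode = "bridge"

  # Bypass the host's stub resolver by pointing the task at a public DNS server.
  dns {
    servers = ["8.8.8.8"]
  }
}
```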

But for exec2 and raw_exec this does not work, because they continue to read the host's original /etc/resolv.conf file, which on most systems today is managed by systemd-resolved and points only to 127.0.0.53. The problem, of course, is that this host loopback address is not reachable from inside the network namespace created for the bridge network mode.

As a workaround, it is possible to simply run a dnsmasq process in the task group making use of bridge mode and configure its upstream server to something public.

config {
  command = "dnsmasq"
  args    = ["--no-daemon", "--user=nobody", "--listen-address", "127.0.0.53", "--no-resolv", "--server", "1.1.1.1"]
  unveil  = ["rx:/usr/sbin/dnsmasq", "r:/etc/passwd"]
}

Below is a full example using dnsmasq as a prestart sidecar task, and then using curl to reach google.com.

job "dns" {
  type = "batch"

  group "group" {
    network {
      mode = "bridge"
    }

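    task "dns" {
      driver = "exec2"
      # dnsmasq starts as root so it can bind port 53, then drops to "nobody" (--user)
      user   = "root"

      config {
        command = "dnsmasq"
        # listen on the address the host's resolv.conf points at and forward to 1.1.1.1
        args    = ["--no-daemon", "--log-debug", "--user=nobody", "--listen-address", "127.0.0.53", "--no-resolv", "--server", "1.1.1.1"]
        # allow executing the dnsmasq binary and reading /etc/passwd
        unveil  = ["rx:/usr/sbin/dnsmasq", "r:/etc/passwd"]
      }

      lifecycle {
        # start before the main task and keep running alongside it
        hook    = "prestart"
        sidecar = true
      }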

      resources {
        cpu    = 300
        memory = 32
      }
    }

    task "curl" {
      driver = "exec2"

      config {
        command = "bash"
        args    = ["-c", "sleep 5 && curl google.com"]
        unveil  = ["rx:/usr/bin"]
      }

      resources {
        cpu    = 300
        memory = 32
      }
    }

    restart {
      attempts = 0
      mode     = "fail"
    }
  }
}

EtienneBruines commented 1 month ago

Interesting! I can see that network access (e.g. ping to 1.1.1.1) does indeed already work. Thank you for the explanation!

That already-implemented support also seems to cover Consul Connect (sidecars) to the same extent. Transparent proxies do not work because of the DNS resolution problem, but manually specifying upstreams does work, since the exec2 task can then simply connect to a loopback port.
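For illustration, a manually specified upstream might look roughly like the sketch below. The service names "frontend" and "backend" and both port numbers are hypothetical placeholders, not taken from a real job; the block would sit inside the bridge-mode group.

```hcl
service {
  name = "frontend" # hypothetical service name
  port = "9001"     # hypothetical port the task listens on

  connect {
    sidecar_service {
      proxy {
        # Expose the hypothetical "backend" service on a local loopback port.
        upstreams {
          destination_name = "backend"
          local_bind_port  = 8080
        }
      }
    }
  }
}
```

With something like this in place, the exec2 task would talk to 127.0.0.1:8080 rather than resolving a hostname, so no DNS lookup is involved.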