hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

CNI: provide `cni.Result` to task driver #16624

Open tgross opened 1 year ago

tgross commented 1 year ago

Task drivers are not currently supplied much information about the network specification produced by CNI plugins. For example, consider the following CNI configuration:

{
  "name": "test",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "bridge",
      "name": "my-bridge",
      "bridge": "virbr0",
      "isDefaultGateway": true,
      "ipMasq": true,
      "hairpinMode": true,
      "ipam": {
        "type": "host-local",
        "resolvConf": "/etc/resolv.conf",
        "dataDir": "/run/networks",
        "subnet": "192.168.40.0/24",
        "rangeStart": "192.168.40.32",
        "gateway": "192.168.40.1"
      }
    }
  ]
}

I've used this with a simple raw_exec task for purposes of illustration:

jobspec

```hcl
job "example" {
  group "web" {
    network {
      mode = "cni/test"
    }

    task "http" {
      driver = "raw_exec"

      config {
        command = "python3"
        args = [
          "-m",
          "http.server",
          "--directory",
          "local",
        ]
      }
    }
  }
}
```

The network manager reads the mode cni/test (ref network_manager_linux.go#L192-L197) and calls newCNINetworkConfigurator in networking_cni.go, which ultimately calls Setup. The debug log line we get here has plenty of detail about the resulting spec:

2023-03-23T14:37:04.978-0400 [DEBUG] client.alloc_runner.runner_hook: received result from CNI: alloc_id=078c18df-01cc-d337-6d82-4b0b5d9984d0 result="{\"Interfaces\":{\"eth0\":{\"IPConfigs\":[{\"IP\":\"192.168.40.36\",\"Gateway\":\"192.168.40.1\"}],\"Mac\":\"da:cc:07:05:a4:9d\",\"Sandbox\":\"/var/run/netns/078c18df-01cc-d337-6d82-4b0b5d9984d0\"},\"vethd60fa44f\":{\"IPConfigs\":null,\"Mac\":\"fe:26:66:49:d8:2d\",\"Sandbox\":\"\"},\"virbr0\":{\"IPConfigs\":null,\"Mac\":\"52:54:00:60:7f:fc\",\"Sandbox\":\"\"}},\"DNS\":[{}],\"Routes\":[{\"dst\":\"0.0.0.0/0\",\"gw\":\"192.168.40.1\"}]}"
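To make it concrete what is available in that result, here is a minimal sketch that decodes the payload from the log line. The struct types and the `parseResult` and `sandboxInterface` helpers are hypothetical stand-ins that mirror only the fields visible in the log; they are not Nomad's or the CNI library's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cniResult mirrors only the fields visible in the debug log line.
// It is a local stand-in for illustration, not a library type.
type cniResult struct {
	Interfaces map[string]cniInterface `json:"Interfaces"`
	Routes     []cniRoute              `json:"Routes"`
}

type cniInterface struct {
	IPConfigs []cniIPConfig `json:"IPConfigs"`
	Mac       string        `json:"Mac"`
	Sandbox   string        `json:"Sandbox"`
}

type cniIPConfig struct {
	IP      string `json:"IP"`
	Gateway string `json:"Gateway"`
}

type cniRoute struct {
	Dst string `json:"dst"`
	GW  string `json:"gw"`
}

// loggedResult is the result payload from the log line, with the
// log's backslash escaping removed.
const loggedResult = `{"Interfaces":{"eth0":{"IPConfigs":[{"IP":"192.168.40.36","Gateway":"192.168.40.1"}],"Mac":"da:cc:07:05:a4:9d","Sandbox":"/var/run/netns/078c18df-01cc-d337-6d82-4b0b5d9984d0"},"vethd60fa44f":{"IPConfigs":null,"Mac":"fe:26:66:49:d8:2d","Sandbox":""},"virbr0":{"IPConfigs":null,"Mac":"52:54:00:60:7f:fc","Sandbox":""}},"DNS":[{}],"Routes":[{"dst":"0.0.0.0/0","gw":"192.168.40.1"}]}`

// parseResult decodes the JSON payload into the stand-in types.
// Fields not declared above (such as DNS) are ignored.
func parseResult(raw string) (cniResult, error) {
	var r cniResult
	err := json.Unmarshal([]byte(raw), &r)
	return r, err
}

// sandboxInterface returns an interface that lives inside a network
// namespace, identified by a non-empty Sandbox path.
func sandboxInterface(r cniResult) (string, cniInterface, bool) {
	for name, iface := range r.Interfaces {
		if iface.Sandbox != "" {
			return name, iface, true
		}
	}
	return "", cniInterface{}, false
}

func main() {
	r, err := parseResult(loggedResult)
	if err != nil {
		panic(err)
	}
	if name, iface, ok := sandboxInterface(r); ok {
		fmt.Println(name, iface.Mac, iface.IPConfigs[0].IP, iface.IPConfigs[0].Gateway)
	}
}
```

The interface with a non-empty Sandbox is the one inside the allocation's network namespace, so its address, gateway, and MAC are the values a driver would most plausibly care about.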

But if we add a spew.Dump to the top of the task driver's StartTask, we get the following NetworkIsolation block:

NetworkIsolation: (*drivers.NetworkIsolationSpec)(0xc001a6cd20)({
  Mode: (drivers.NetIsolationMode) (len=5) "group",
  Path: (string) (len=51) "/var/run/netns/2ce20001-f442-73c0-60ab-2e74367fe444",
  Labels: (map[string]string) {
  },
  HostsConfig: (*drivers.HostsConfig)(<nil>)
 }),

While this works fine for our built-in task drivers, @eveld has reported a need for custom task drivers to know how the network was configured, so that they can set appropriate values in the task (for example, setting the IP in a Firecracker VM's MMDS).

We currently pass some of this information along for bridge networking if the driver has set some Docker-specific tags (ref network_hook.go#L132-L175), but this doesn't work for CNI and in any case shouldn't be Docker-specific.

eveld commented 1 year ago

The firecracker-go-sdk can either set up CNI itself or can use a static interface which allows us to pass on the details of the interfaces that Nomad created.

In my case, I need:

 NetworkInterfaces: []firecracker.NetworkInterface{
     {
         StaticConfiguration: &firecracker.StaticNetworkConfiguration{
             HostDevName: "<host tap device>",
             MacAddress:  "<some mac address>",

             IPConfiguration: &firecracker.IPConfiguration{
                 IfName: "eth0",
                 IPAddr: net.IPNet{
                     IP:   <interface ip>,
                     Mask: <ip mask>,
                 },
                 Gateway: <gateway ip>,
             },
         },
         AllowMMDS: true,
     },
 },

The cni.Result has all the details I would need to configure that. Having that exposed to the driver would solve my dilemma. Perhaps other drivers have a similar need.

spaghettifunk commented 8 months ago

I've been fighting with this for a few days and I'm on the verge of giving up. I believe what @eveld is proposing would be beneficial. Happy to open a PR if this is possible 😄

tgross commented 5 months ago

While working on https://github.com/hashicorp/nomad/issues/10628 I bumped into this in terms of missing DNS configuration, as described in https://github.com/hashicorp/nomad/issues/11102. Threading the cni.Result down into the task driver might be the best solution to this, or we could add the missing fields to the existing fields on drivers.TaskConfig.