hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

0.10.0-beta1: Network Namespace missing "bridge" plugin #6319

Closed djenriquez closed 5 years ago

djenriquez commented 5 years ago

We're trying out the group network namespace in the hope of moving away from network_mode: container, which currently has issues with task dependencies. Sometimes the service comes up before the sidecar, and the task fails because Docker can't attach the container to the network of a container that doesn't exist yet. Our app service needs to reach its sidecar proxy over localhost. Since a network namespace allows traffic over loopback, it sounded like this would be the solution.

It appears our machines are missing a binary of some sort. Should this dependency be mentioned in the changelog? Or am I doing something wrong?

Also, since I'm here: given the problem we're trying to solve above, will the network namespace be the solution we hope it will be?

Nomad version

0.10.0-beta1 (Server + Clients)

Operating system and Environment details

Amazon Linux 2

Issue

Running a job definition with a task group network namespace fails with:

failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: failed to find plugin "bridge" in path [/opt/cni/bin]
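The error says Nomad looked for a CNI plugin executable named bridge in /opt/cni/bin and did not find it. A quick pre-flight check on each client node can catch this before any allocation is placed. The sketch below is hypothetical: the function name is made up, and the default plugin list (bridge, host-local, firewall, portmap, loopback) is an assumption about what Nomad's bridge mode invokes, which may differ between Nomad versions.

```shell
# Hypothetical pre-flight check: print each CNI plugin that is missing
# (or not executable) in the given directory, one name per line.
# Assumption: Nomad's bridge mode needs these plugins; adjust for your version.
missing_cni_plugins() {
  local dir="${1:-/opt/cni/bin}"   # directory from the error message above
  shift || true                    # drop the directory argument if present
  local plugins=("$@")
  if [ "${#plugins[@]}" -eq 0 ]; then
    plugins=(bridge host-local firewall portmap loopback)
  fi
  local p
  for p in "${plugins[@]}"; do
    [ -x "$dir/$p" ] || echo "$p"
  done
}
```

For example, running missing_cni_plugins /opt/cni/bin on an affected node would print bridge among its output, while an empty result means every listed plugin is present and executable.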

Reproduction steps

Run the job below against a Nomad 0.10.0-beta1 cluster running on Amazon Linux 2.

Job file (if appropriate)

{
  "Stop": false,
  "Region": "us-west-2",
  "Namespace": "default",
  "ID": "<REDACTED>-jobmaster",
  "ParentID": "",
  "Name": "<REDACTED>-jobmaster",
  "Type": "service",
  "Priority": 60,
  "AllAtOnce": false,
  "Datacenters": [
    "sandbox"
  ],
  "Constraints": null,
  "Affinities": null,
  "Spreads": null,
  "TaskGroups": [
    {
      "Name": "<REDACTED>-jobmaster-listener",
      "Count": 2,
      "Update": {
        "Stagger": 30000000000,
        "MaxParallel": 1,
        "HealthCheck": "checks",
        "MinHealthyTime": 10000000000,
        "HealthyDeadline": 200000000000,
        "ProgressDeadline": 600000000000,
        "AutoRevert": true,
        "AutoPromote": false,
        "Canary": 0
      },
      "Migrate": {
        "MaxParallel": 1,
        "HealthCheck": "checks",
        "MinHealthyTime": 10000000000,
        "HealthyDeadline": 300000000000
      },
      "Constraints": [
        {
          "LTarget": "${node.class}",
          "RTarget": "app",
          "Operand": "="
        },
        {
          "LTarget": "${attr.vault.version}",
          "RTarget": ">= 0.6.1",
          "Operand": "version"
        }
      ],
      "RestartPolicy": {
        "Attempts": 3,
        "Interval": 300000000000,
        "Delay": 60000000000,
        "Mode": "fail"
      },
      "Tasks": [
        {
          "Name": "<REDACTED>-jobmaster-listener",
          "Driver": "docker",
          "User": "",
          "Config": {
            "force_pull": true,
            "ulimit": [
              {
                "nofile": "100000:100000"
              }
            ],
            "port_map": [
              {
                "jobmaster-listener": 8080
              }
            ],
            "logging": [
              {
                "config": [
                  {
                    "syslog-format": "rfc5424micro",
                    "tag": "<REDACTED>_jobmaster-listener_<REDACTED>/http-ping:latest_{{.ID}}",
                    "syslog-address": "udp://${attr.unique.network.ip-address}:514"
                  }
                ],
                "driver": "syslog"
              }
            ],
            "args": [
              "--text='hello dev world'"
            ],
            "image": "<REDACTED>/http-ping:latest"
          },
          "Env": {
            "TEST_C": "100",
            "TEST_D": "100",
            "ENV": "sandbox",
            "TEST_B": "YES"
          },
          "Services": null,
          "Vault": {
            "Policies": [
              "<REDACTED>-sandbox-jobmaster-listener"
            ],
            "Env": true,
            "ChangeMode": "restart",
            "ChangeSignal": "SIGHUP"
          },
          "Templates": [
            {
              "SourcePath": "",
              "DestPath": "secrets/rendered.env",
              "EmbeddedTmpl": "MYSQL_DB_PW=\"{{with secret \"services/data/jobmaster/test\"}}{{.Data.data.password}}{{end}}\"\nMYSQL_DB_USER=\"{{with secret \"services/data/jobmaster/test\"}}{{.Data.data.username}}{{end}}\"\nMYSQL_DB_HOST=\"{{with secret \"services/data/jobmaster/test\"}}{{.Data.data.host}}{{end}}\"\n",
              "ChangeMode": "restart",
              "ChangeSignal": "",
              "Splay": 5000000000,
              "Perms": "0644",
              "LeftDelim": "{{",
              "RightDelim": "}}",
              "Envvars": true,
              "VaultGrace": 15000000000
            }
          ],
          "Constraints": null,
          "Affinities": null,
          "Resources": {
            "CPU": 100,
            "MemoryMB": 200,
            "DiskMB": 0,
            "IOPS": 0,
            "Networks": [
              {
                "Mode": "bridge",
                "Device": "",
                "CIDR": "",
                "IP": "",
                "MBits": 10,
                "ReservedPorts": null,
                "DynamicPorts": [
                  {
                    "Label": "jobmaster-listener",
                    "Value": 0,
                    "To": 0
                  }
                ]
              }
            ],
            "Devices": null
          },
          "DispatchPayload": null,
          "Meta": null,
          "KillTimeout": 1000000000,
          "LogConfig": {
            "MaxFiles": 10,
            "MaxFileSizeMB": 10
          },
          "Artifacts": null,
          "Leader": false,
          "ShutdownDelay": 10000000000,
          "VolumeMounts": null,
          "KillSignal": "",
          "Kind": ""
        },
        {
          "Name": "<REDACTED>-jobmaster-listener-proxy",
          "Driver": "docker",
          "User": "",
          "Config": {
            "ulimit": [
              {
                "nofile": "100000:100000"
              }
            ],
            "port_map": [
              {
                "jobmaster-listener-proxy": 81
              }
            ],
            "dns_servers": [
              "${attr.unique.network.ip-address}",
              "${meta.ec2_dns}"
            ],
            "extra_hosts": [
              "jobmaster-worker:127.0.0.1"
            ],
            "args": [
              "-l",
              "debug"
            ],
            "image": "<REDACTED>/envoy-sidecar:latest"
          },
          "Env": {
            "NODE_ID": "jobmaster-listener-${NOMAD_ALLOC_ID}-proxy",
            "SERVICE_NAME": "jobmaster-listener",
            "SERVICE_PORT": "8080",
            "LISTENER_PROTOCOL": "http",
            "NAMESPACE": "<REDACTED>",
            "XDS_ADDRESS": "${attr.unique.network.ip-address}"
          },
          "Services": [
            {
              "Name": "jobmaster-listener",
              "PortLabel": "jobmaster-listener-proxy",
              "AddressMode": "auto",
              "Tags": [
              ],
              "CanaryTags": null,
              "Checks": [
                {
                  "Name": "alive",
                  "Type": "tcp",
                  "Command": "",
                  "Args": null,
                  "Path": "",
                  "Protocol": "",
                  "PortLabel": "jobmaster-listener-proxy",
                  "AddressMode": "",
                  "Interval": 20000000000,
                  "Timeout": 3000000000,
                  "InitialStatus": "warning",
                  "TLSSkipVerify": false,
                  "Method": "",
                  "Header": null,
                  "CheckRestart": null,
                  "GRPCService": "",
                  "GRPCUseTLS": false,
                  "TaskName": ""
                },
                {
                  "Name": "available",
                  "Type": "http",
                  "Command": "",
                  "Args": null,
                  "Path": "/",
                  "Protocol": "",
                  "PortLabel": "jobmaster-listener-proxy",
                  "AddressMode": "",
                  "Interval": 30000000000,
                  "Timeout": 20000000000,
                  "InitialStatus": "warning",
                  "TLSSkipVerify": false,
                  "Method": "GET",
                  "Header": {
                    "Host": [
                      "jobmaster-listener"
                    ]
                  },
                  "CheckRestart": null,
                  "GRPCService": "",
                  "GRPCUseTLS": false,
                  "TaskName": ""
                }
              ],
              "Connect": null,
              "Meta": null
            }
          ],
          "Vault": null,
          "Templates": null,
          "Constraints": null,
          "Affinities": null,
          "Resources": {
            "CPU": 256,
            "MemoryMB": 128,
            "DiskMB": 0,
            "IOPS": 0,
            "Networks": [
              {
                "Mode": "bridge",
                "Device": "",
                "CIDR": "",
                "IP": "",
                "MBits": 10,
                "ReservedPorts": null,
                "DynamicPorts": [
                  {
                    "Label": "jobmaster-listener-proxy",
                    "Value": 0,
                    "To": 0
                  }
                ]
              }
            ],
            "Devices": null
          },
          "DispatchPayload": null,
          "Meta": null,
          "KillTimeout": 300000000000,
          "LogConfig": {
            "MaxFiles": 10,
            "MaxFileSizeMB": 10
          },
          "Artifacts": null,
          "Leader": true,
          "ShutdownDelay": 10000000000,
          "VolumeMounts": null,
          "KillSignal": "",
          "Kind": ""
        }
      ],
      "EphemeralDisk": {
        "Sticky": false,
        "SizeMB": 300,
        "Migrate": false
      },
      "Meta": {
      },
      "ReschedulePolicy": {
        "Attempts": 0,
        "Interval": 0,
        "Delay": 15000000000,
        "DelayFunction": "exponential",
        "MaxDelay": 600000000000,
        "Unlimited": true
      },
      "Affinities": null,
      "Spreads": null,
      "Networks": [
        {
          "Mode": "bridge",
          "Device": "",
          "CIDR": "",
          "IP": "",
          "MBits": 10,
          "ReservedPorts": null,
          "DynamicPorts": null
        }
      ],
      "Services": null,
      "Volumes": null
    },
    {
      "Name": "<REDACTED>-jobmaster-worker",
      "Count": 2,
      "Update": {
        "Stagger": 30000000000,
        "MaxParallel": 1,
        "HealthCheck": "checks",
        "MinHealthyTime": 10000000000,
        "HealthyDeadline": 200000000000,
        "ProgressDeadline": 600000000000,
        "AutoRevert": true,
        "AutoPromote": false,
        "Canary": 0
      },
      "Migrate": {
        "MaxParallel": 1,
        "HealthCheck": "checks",
        "MinHealthyTime": 10000000000,
        "HealthyDeadline": 300000000000
      },
      "Constraints": [
        {
          "LTarget": "${node.class}",
          "RTarget": "app",
          "Operand": "="
        }
      ],
      "RestartPolicy": {
        "Attempts": 3,
        "Interval": 300000000000,
        "Delay": 60000000000,
        "Mode": "fail"
      },
      "Tasks": [
        {
          "Name": "<REDACTED>-jobmaster-worker",
          "Driver": "docker",
          "User": "",
          "Config": {
            "force_pull": true,
            "network_mode": "container:<REDACTED>-jobmaster-worker-proxy-${NOMAD_ALLOC_ID}",
            "ulimit": [
              {
                "nofile": "100000:100000"
              }
            ],
            "logging": [
              {
                "config": [
                  {
                    "syslog-address": "udp://${attr.unique.network.ip-address}:514",
                    "syslog-format": "rfc5424micro",
                    "tag": "<REDACTED>_jobmaster-worker_<REDACTED>/http-ping:latest_{{.ID}}"
                  }
                ],
                "driver": "syslog"
              }
            ],
            "args": [
              "-text=hello world"
            ],
            "image": "<REDACTED>/http-ping:latest"
          },
          "Env": {
            "TEST_E": "test",
            "ENV": "sandbox"
          },
          "Services": null,
          "Vault": null,
          "Templates": null,
          "Constraints": null,
          "Affinities": null,
          "Resources": {
            "CPU": 300,
            "MemoryMB": 200,
            "DiskMB": 0,
            "IOPS": 0,
            "Networks": [
              {
                "Mode": "",
                "Device": "",
                "CIDR": "",
                "IP": "",
                "MBits": 10,
                "ReservedPorts": null,
                "DynamicPorts": null
              }
            ],
            "Devices": null
          },
          "DispatchPayload": null,
          "Meta": null,
          "KillTimeout": 1000000000,
          "LogConfig": {
            "MaxFiles": 10,
            "MaxFileSizeMB": 10
          },
          "Artifacts": null,
          "Leader": false,
          "ShutdownDelay": 10000000000,
          "VolumeMounts": null,
          "KillSignal": "",
          "Kind": ""
        },
        {
          "Name": "<REDACTED>-jobmaster-worker-proxy",
          "Driver": "docker",
          "User": "",
          "Config": {
            "image": "<REDACTED>/envoy-sidecar:latest",
            "ulimit": [
              {
                "nofile": "100000:100000"
              }
            ],
            "port_map": [
              {
                "jobmaster-worker-proxy": 81
              }
            ],
            "dns_servers": [
              "${attr.unique.network.ip-address}",
              "${meta.ec2_dns}"
            ],
            "extra_hosts": [],
            "args": [
              "-l",
              "debug"
            ]
          },
          "Env": {
            "NODE_ID": "jobmaster-worker-${NOMAD_ALLOC_ID}-proxy",
            "SERVICE_NAME": "jobmaster-worker",
            "SERVICE_PORT": "8080",
            "LISTENER_PROTOCOL": "http",
            "NAMESPACE": "<REDACTED>",
            "XDS_ADDRESS": "${attr.unique.network.ip-address}"
          },
          "Services": [
            {
              "Name": "jobmaster-worker",
              "PortLabel": "jobmaster-worker-proxy",
              "AddressMode": "auto",
              "Tags": [

              ],
              "CanaryTags": null,
              "Checks": [
                {
                  "Name": "alive",
                  "Type": "tcp",
                  "Command": "",
                  "Args": null,
                  "Path": "",
                  "Protocol": "",
                  "PortLabel": "jobmaster-worker-proxy",
                  "AddressMode": "",
                  "Interval": 20000000000,
                  "Timeout": 3000000000,
                  "InitialStatus": "warning",
                  "TLSSkipVerify": false,
                  "Method": "",
                  "Header": null,
                  "CheckRestart": null,
                  "GRPCService": "",
                  "GRPCUseTLS": false,
                  "TaskName": ""
                }
              ],
              "Connect": null,
              "Meta": null
            }
          ],
          "Vault": null,
          "Templates": null,
          "Constraints": null,
          "Affinities": null,
          "Resources": {
            "CPU": 256,
            "MemoryMB": 128,
            "DiskMB": 0,
            "IOPS": 0,
            "Networks": [
              {
                "Mode": "",
                "Device": "",
                "CIDR": "",
                "IP": "",
                "MBits": 10,
                "ReservedPorts": null,
                "DynamicPorts": [
                  {
                    "Label": "jobmaster-worker-proxy",
                    "Value": 0,
                    "To": 0
                  }
                ]
              }
            ],
            "Devices": null
          },
          "DispatchPayload": null,
          "Meta": null,
          "KillTimeout": 300000000000,
          "LogConfig": {
            "MaxFiles": 10,
            "MaxFileSizeMB": 10
          },
          "Artifacts": null,
          "Leader": true,
          "ShutdownDelay": 10000000000,
          "VolumeMounts": null,
          "KillSignal": "",
          "Kind": ""
        }
      ],
      "EphemeralDisk": {
        "Sticky": false,
        "SizeMB": 300,
        "Migrate": false
      },
      "Meta": {
      },
      "ReschedulePolicy": {
        "Attempts": 0,
        "Interval": 0,
        "Delay": 15000000000,
        "DelayFunction": "exponential",
        "MaxDelay": 600000000000,
        "Unlimited": true
      },
      "Affinities": null,
      "Spreads": null,
      "Networks": null,
      "Services": null,
      "Volumes": null
    }
  ],
  "Update": {
    "Stagger": 30000000000,
    "MaxParallel": 1,
    "HealthCheck": "",
    "MinHealthyTime": 0,
    "HealthyDeadline": 0,
    "ProgressDeadline": 0,
    "AutoRevert": false,
    "AutoPromote": false,
    "Canary": 0
  },
  "Periodic": null,
  "ParameterizedJob": null,
  "Dispatched": false,
  "Payload": null,
  "Meta": null,
  "VaultToken": "",
  "Status": "running",
  "StatusDescription": "",
  "Stable": false,
  "Version": 1,
  "SubmitTime": 1568327261027430000,
  "CreateIndex": 964329,
  "ModifyIndex": 965063,
  "JobModifyIndex": 965063
}
djenriquez commented 5 years ago

Ah, Dani helped me find that the solution is here: https://www.nomadproject.io/guides/integrations/consul-connect/index.html#cni-plugins

It would be great to have this in the network stanza docs as well!

Thanks Dani! (Great talk on host volumes as well 😁)

fragoulis commented 3 years ago

It would be great if there were a better explanation here. This seems to be a common issue with an uncommon answer. Also, the link above has been retired.

I have a simple job -> group -> task using the exec driver, and I'm just trying to see how things work. All I want is network isolation with bridge mode, but the allocation fails with:

failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: failed to configure network: failed to find plugin "bridge" in path [/opt/cni/bin]

But I neither use nor want to use CNI, so what is going on? I have read the docs multiple times to see what I'm missing, and nowhere do they say that CNI plugins are needed to use the native bridge mode.

mikenomitch commented 2 years ago

Please take this with a grain of salt, but I was able to fix this issue by running the following bash on every node running Consul Connect (and adding it to my client setup script):

echo "=== Getting CNI Plugins for Consul Connect ==="

curl -L -o cni-plugins.tgz https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
sudo mkdir -p /opt/cni/bin
sudo tar -C /opt/cni/bin -xzf cni-plugins.tgz

echo "=== Allowing container traffic through the bridge network to be routed via iptables ==="

# The redirect in "echo 1 > file" runs as the calling user, not as root,
# so write the values through "sudo tee" instead.
echo 1 | sudo tee /proc/sys/net/bridge/bridge-nf-call-arptables > /dev/null
echo 1 | sudo tee /proc/sys/net/bridge/bridge-nf-call-ip6tables > /dev/null
echo 1 | sudo tee /proc/sys/net/bridge/bridge-nf-call-iptables > /dev/null

YMMV :)
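The script above hard-codes the amd64 tarball. A minimal sketch for picking the release asset from the machine architecture, assuming the cni-plugins-linux-<arch>-<version>.tgz naming seen in the URL above also holds for other architectures and versions (check the release page before relying on it). The helper name is hypothetical.

```shell
# Hypothetical helper: build the containernetworking/plugins release URL
# for a version tag (e.g. v0.8.6) and a `uname -m` machine name.
# Assumption: release assets are named cni-plugins-linux-<arch>-<version>.tgz.
cni_plugins_url() {
  local version="$1"
  local machine="${2:-$(uname -m)}"
  local arch
  case "$machine" in
    x86_64|amd64) arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported machine: $machine" >&2; return 1 ;;
  esac
  echo "https://github.com/containernetworking/plugins/releases/download/${version}/cni-plugins-linux-${arch}-${version}.tgz"
}
```

In the setup script, the curl line could then read curl -L -o cni-plugins.tgz "$(cni_plugins_url v0.8.6)", failing loudly on an architecture the helper does not know about instead of silently installing amd64 binaries.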

tgross commented 2 years ago

@mikenomitch those sysctl settings are documented at https://www.nomadproject.io/docs/integrations/consul-connect#cni-plugins, but maybe they could be surfaced better?
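Writing to /proc/sys only changes the running kernel, so the values are lost on reboot, and the files under /proc/sys/net/bridge exist only while the br_netfilter kernel module is loaded. A minimal sketch that reports which of the three tunables from the script above are absent or not set to 1; the function name and its base-directory parameter (default /proc/sys/net/bridge) are hypothetical, the latter added so the check can be run against a test directory.

```shell
# Hypothetical check: print each bridge-nf-call tunable that is absent or not 1.
check_bridge_sysctls() {
  local base="${1:-/proc/sys/net/bridge}"
  local name value
  for name in bridge-nf-call-arptables bridge-nf-call-ip6tables bridge-nf-call-iptables; do
    # A missing file (e.g. br_netfilter not loaded) yields an empty value.
    value=$(cat "$base/$name" 2>/dev/null || true)
    [ "$value" = "1" ] || echo "$name"
  done
}
```

To persist the settings, the usual approach is a file under /etc/sysctl.d/ containing net.bridge.bridge-nf-call-arptables = 1 and the matching ip6tables and iptables lines, applied with sudo sysctl --system; an empty result from check_bridge_sysctls after a reboot confirms it took effect.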

github-actions[bot] commented 2 years ago

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.