hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/
Other
14.94k stars 1.96k forks source link

Nomad upgrade documentation could be clearer about service-stanza #9292

Open kaspergrubbe opened 4 years ago

kaspergrubbe commented 4 years ago

Nomad version

0.12.7 / 1.0.0.beta

Reproduction steps

Go to https://www.nomadproject.io/docs/upgrade/upgrade-specific and read about »mbits and Task Network Resource deprecation:

Starting in Nomad 0.12.0 the mbits field of the network resource block has been deprecated. This is in part because we felt that mbits didn't accurately account network bandwidth as a resource.

Additionally the use of the network block inside of a task's resource block is also deprecated. Users are advised to move their network block to the group block. Recent networking features have only been added to group based network configuration. If any usecase or feature which was available with task network resource is not fulfilled with group network configuration, please open an issue detailing the missing capability.

Okay, that means that this old job file:

job "ingress" {
  datacenters = ["eu-test-1"]
  type = "system"

  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 1
  }

  group "loadbalancer" {
    count = 1

    restart {
      interval = "30m"
      delay = "15s"
      mode = "fail"
    }

    ephemeral_disk {
      size = 300
    }

    task "haproxy" {
      driver = "docker"

      kill_timeout = "12s"
      kill_signal = "SIGUSR1" # https://linux.die.net/man/1/haproxy

      template {
        data = <<EOF
global
  maxconn 1024
  daemon

  log 127.0.0.1 local0 debug

resolvers ourdns
  nameserver dns1 8.8.8.8:53
  nameserver dns2 8.8.4.4:53
  nameserver dns3 1.1.1.1:53
  nameserver dns4 1.0.0.1:53

resolvers consuldns
  nameserver consul 127.0.0.1:8600
  accepted_payload_size 8192
  hold valid 5s

listen fe_ingress_stats
  bind *:8887
  option httplog
  mode http
  stats enable
  stats show-legends
  stats show-node
  stats uri /

  acl going_down stopping
  monitor-uri /healthy
  monitor fail if going_down

backend be_ingress_stats
  mode http
  server Local localhost:8887

frontend fe_http
  bind *:8888
  option httplog
  mode http

  use_backend be_ingress_stats
EOF
        destination = "local/haproxy.cfg"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }

      config {
        image = "haproxy:2.3.0"
        network_mode = "host"

        volumes = [
          "local/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg",
        ]
      }

      resources {
        cpu    = 300
        memory = 100

        network {
          port "admin" {
            static = 8887
          }
          port "http" {
            static = 8888
          }
          port "https" {
            static = 8889
          }
        }
      }

      service {
        name = "haproxy-ui"
        tags = []
        port = "admin"

        check {
          name     = "Ingress HTTP healthcheck"
          path     = "/"
          type     = "http"
          protocol = "http"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Needs to be updated like this by moving the network-stanza into the group stanza, easy!

--- ingress-nomad-1.nomad.hcl   2020-11-07 23:27:39.000000000 +0000
+++ ingress-nomad-2.nomad.hcl   2020-11-07 23:45:26.000000000 +0000
@@ -14,6 +14,18 @@
   group "loadbalancer" {
     count = 1

+    network {
+      port "admin" {
+        static = 8887
+      }
+      port "http" {
+        static = 8888
+      }
+      port "https" {
+        static = 8889
+      }
+    }
+
     restart {
       interval = "30m"
       delay = "15s"
@@ -90,18 +102,6 @@
       resources {
         cpu    = 300
         memory = 100
-
-        network {
-          port "admin" {
-            static = 8887
-          }
-          port "http" {
-            static = 8888
-          }
-          port "https" {
-            static = 8889
-          }
-        }
       }

       service {

That blows up like this:

Error submitting job: Unexpected response code: 500 (rpc error: 1 error occurred:
    * Task group loadbalancer validation failed: 1 error occurred:
    * Task haproxy validation failed: 1 error occurred:
    * 1 error occurred:
    * port label "admin" referenced by services haproxy-ui does not exist
)

It took me a while to understand that the service stanza needed to be movws to the group stanza (it was in the task-stanza before):

--- ingress-nomad-2.nomad.hcl   2020-11-07 23:45:26.000000000 +0000
+++ ingress-nomad-3.nomad.hcl   2020-11-07 23:47:01.000000000 +0000
@@ -26,6 +26,21 @@
       }
     }

+    service {
+      name = "haproxy-ui"
+      tags = []
+      port = "admin"
+
+      check {
+        name     = "Ingress HTTP healthcheck"
+        path     = "/"
+        type     = "http"
+        protocol = "http"
+        interval = "10s"
+        timeout  = "2s"
+      }
+    }
+
     restart {
       interval = "30m"
       delay = "15s"
@@ -103,21 +118,6 @@
         cpu    = 300
         memory = 100
       }
-
-      service {
-        name = "haproxy-ui"
-        tags = []
-        port = "admin"
-
-        check {
-          name     = "Ingress HTTP healthcheck"
-          path     = "/"
-          type     = "http"
-          protocol = "http"
-          interval = "10s"
-          timeout  = "2s"
-        }
-      }
     }
   }
 }

Did I miss something obvious or are the upgrade path not very clear? I might not be the only one that stumbled upon this.

kaspergrubbe commented 4 years ago

If the diffs are confusing to read, here's a zip with the three configuration files: ingress-nomad-files.zip

schmichael commented 4 years ago

Worth noting that in addition to the confusing error message and unhelpful docs, we should also fix the http response code to 400 instead of 500. Thanks for the report @kaspergrubbe!

dnrce commented 3 years ago

Big +1 on this, and please note that it is not only the upgrade docs, but https://www.nomadproject.io/docs/job-specification/service as a whole that still promotes putting service inside a task.

Should job -> group -> task -> service be fully deprecated?

kaspergrubbe commented 3 years ago

I thought this deprecation was fixed or decided before the 1.0.0 release, but it seems like it is still here.