hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

reason for filter doesn't appear in UI when CNI plugin is missing #22432

Open suikast42 opened 4 months ago

suikast42 commented 4 months ago

I upgraded from Nomad 1.7.7 to 1.8.0 and wanted to test transparent_proxy with Consul.

The documentation says the Nomad agent needs the consul-cni plugin installed.

I have not installed consul-cni, and I deployed the countdash app with this definition:

job "countdash" {

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      check {
        type     = "http"
        path     = "/health"
        expose   = true
        interval = "3s"
        timeout  = "1s"

        check_restart {
          limit = 0
        }
      }

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}
          }
        }
      }
    }

    task "web" {
      driver = "docker"

      config {
        image          = "hashicorpdev/counter-api:v3"
        auth_soft_fail = true
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      check {
        type     = "http"
        path     = "/health"
        expose   = true
        interval = "3s"
        timeout  = "1s"

        check_restart {
          limit = 0
        }
      }

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        COUNTING_SERVICE_URL = "http://count-api.virtual.consul"
      }

      config {
        image          = "hashicorpdev/counter-dashboard:v3"
        auth_soft_fail = true
      }
    }
  }
}

I expected to see an error saying that the required CNI plugin is not installed, but I only see this failure: image

tgross commented 4 months ago

It looks like this might be a more general issue with how filters are displayed in the UI. If you deploy via the CLI you'll see something like the following:

$ nomad job run ./jobs/tproxy.nomad.hcl
==> 2024-05-31T11:38:55-04:00: Monitoring evaluation "cdcff528"
    2024-05-31T11:38:55-04:00: Evaluation triggered by job "countdash"
    2024-05-31T11:38:56-04:00: Evaluation within deployment: "93a7c9c9"
    2024-05-31T11:38:56-04:00: Evaluation status changed: "pending" -> "complete"
==> 2024-05-31T11:38:56-04:00: Evaluation "cdcff528" finished with status "complete" but failed to place all allocations:
    2024-05-31T11:38:56-04:00: Task Group "api" (failed to place 1 allocation):
      * Class "multipass": 1 nodes excluded by filter
      * Constraint "${attr.plugins.cni.version.consul-cni} semver >= 1.4.2": 1 nodes excluded by filter
    2024-05-31T11:38:56-04:00: Task Group "dashboard" (failed to place 1 allocation):
      * Class "multipass": 1 nodes excluded by filter
      * Constraint "${attr.plugins.cni.version.consul-cni} semver >= 1.4.2": 1 nodes excluded by filter

I'll mark this as a UI issue, but I suspect it may be a duplicate, so I'll search for an existing report.
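For readers unfamiliar with the constraint in the CLI output above, here is a simplified Python sketch of why a node without consul-cni is excluded. It is an illustration only, not Nomad's scheduler code: the helper names `parse_version` and `node_passes`, the attribute dictionaries, and the version values in them are all hypothetical, and it assumes plain MAJOR.MINOR.PATCH versions and that a missing attribute fails the check.

```python
import re


def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' (optional leading 'v') into a tuple of ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a simple semantic version: {text!r}")
    return tuple(int(part) for part in match.groups())


def node_passes(attributes, attribute_name, minimum):
    """Return True only if the node reports the attribute at or above `minimum`.

    Assumption for this sketch: a node that does not report the attribute
    at all is treated as failing the constraint.
    """
    value = attributes.get(attribute_name)
    if value is None:
        return False
    return parse_version(value) >= parse_version(minimum)


# Hypothetical node attributes; the version values are made up for illustration.
node_without_plugin = {"plugins.cni.version.bridge": "1.4.0"}
node_with_plugin = {
    "plugins.cni.version.bridge": "1.4.0",
    "plugins.cni.version.consul-cni": "1.4.2",
}

ATTRIBUTE = "plugins.cni.version.consul-cni"
print(node_passes(node_without_plugin, ATTRIBUTE, "1.4.2"))
print(node_passes(node_with_plugin, ATTRIBUTE, "1.4.2"))
```

In this sketch the first node is filtered because it never reports a consul-cni version, which corresponds to the situation in the original report where the plugin was not installed.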

philrenaud commented 4 months ago

I dug into this a little this morning and tried to reproduce it, but received a slightly different (and more descriptive?) error:

image

This matches up pretty closely with the CLI in terms of error message info (for missing drivers, for unmet constraints).

The UI receives info for these messages from the /evaluations endpoint, which gives us back something that looks like:

...
        "FailedTGAllocs": {
            "test": {
                "AllocationTime": 12958,
                "ClassExhausted": null,
                "ClassFiltered": {
                    "Phil's 2023 MacBook Pro": 1
                },
                "CoalescedFailures": 1,
                "ConstraintFiltered": {
                    "${attr.plugins.cni.version.bridge} semver >= 0.4.0": 1
                },
            }
        },
...

It seems that "computed class ineligible" is, by its nature, a less descriptive placement error state than "constraint filtered", which comes with an explanation. It might be possible for us to decorate the ineligibility message with more information, but I am not certain about that.
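As a rough illustration of how the FailedTGAllocs payload shown earlier maps onto the CLI-style messages, here is a minimal Python sketch. It is not the UI's or the CLI's actual code: the function name `describe_placement_failures` is hypothetical, the message wording is copied from the CLI output quoted in this thread, and only the ClassFiltered and ConstraintFiltered fields from the sample are handled.

```python
def describe_placement_failures(failed_tg_allocs):
    """Turn a FailedTGAllocs mapping into human-readable lines.

    Only the ClassFiltered and ConstraintFiltered fields are handled here;
    either may be null (None) in the payload, so both default to an empty dict.
    """
    lines = []
    for group, metrics in failed_tg_allocs.items():
        lines.append(f'Task Group "{group}":')
        for name, count in (metrics.get("ClassFiltered") or {}).items():
            lines.append(f'  * Class "{name}": {count} nodes excluded by filter')
        for rule, count in (metrics.get("ConstraintFiltered") or {}).items():
            lines.append(f'  * Constraint "{rule}": {count} nodes excluded by filter')
    return lines


# Sample data taken from the payload quoted in this thread.
payload = {
    "test": {
        "AllocationTime": 12958,
        "ClassExhausted": None,
        "ClassFiltered": {"Phil's 2023 MacBook Pro": 1},
        "CoalescedFailures": 1,
        "ConstraintFiltered": {
            "${attr.plugins.cni.version.bridge} semver >= 0.4.0": 1
        },
    }
}

for line in describe_placement_failures(payload):
    print(line)
```

The class line carries only a node class name and a count, while the constraint line includes the full constraint expression, which is why the latter reads as more descriptive.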