hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Client-Server Versions Mismatch, Not Apparent in Client Failure Messaging #2502

Open robottaway opened 7 years ago

robottaway commented 7 years ago

Nomad version

On client:

-- rottaway@stg-ci-agent52:~ $ nomad -version
Nomad v0.5.0

On server:

-- rottaway@stg-clustermgr-master08:~ $ nomad version
Nomad v0.5.5

Operating system and Environment details

Ubuntu 16.04, EC2 client=m3.large, server=m3.xlarge

Issue

The error message is very misleading: the real cause is that the client is on v0.5.0 while the server is on v0.5.5.

Can we get a WARNING message when running the client that points out the version mismatch and the compatibility issues it could cause? At least then the cryptic failure would have some context.

-- rottaway@stg-ci-agent52:~ $ nomad plan chaos_containers.nomad 
Error during plan: Unexpected response code: 500 (rpc error: 2 error(s) occurred:

* Task group membomb validation failed: 1 error(s) occurred:

* 1 error(s) occurred:

* Interval can not be less than 5s (got 0s)
* Task group netbomb validation failed: 1 error(s) occurred:

* 1 error(s) occurred:

* Interval can not be less than 5s (got 0s))
-- rottaway@stg-ci-agent52:~ $ echo $NOMAD_ADDR 
http://http.nomad.service.consul:4646

Reproduction steps

With the client at v0.5.0 and the server at v0.5.5, run:

nomad plan chaos_containers.nomad

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Job file (if appropriate)

chaos_containers.nomad:

job "chaos_containers" {
  datacenters = ["us-west-2", "us-west-1"]
  type = "batch"

  periodic {
    cron             = "*/15 * * * * *"
    prohibit_overlap = true
  }

  group "cpubomb" {
    constraint {
      attribute = "${meta.pd_hostname}"
      value = "stg-clustermgr-client16"
    }

    task "cpubomb" {
      driver = "docker"
      config {
        image = "monitoringartist/docker-killer"
        command = "cpubomb"
      }

      resources {
        cpu = 200
        memory = 128
      }
    }
  }

  group "membomb" {
    count = 1

    constraint {
      attribute = "${meta.pd_hostname}"
      value = "stg-clustermgr-client16"
    }

    task "membomb" {
      driver = "docker"
      config {
        image = "monitoringartist/docker-killer"
        command = "membomb"
        args = ["--oom-kill-disable"]
      }

      resources {
        cpu = 20
        memory = 128
      }
    }

    restart {
      mode = "fail"
    }
  }

  group "netbomb" {
    constraint {
      attribute = "${meta.pd_hostname}"
      set_contains = "stg-clustermgr-client16"
    }

    task "netbomb" {
      driver = "docker"
      config {
        image = "monitoringartist/docker-killer"
        command = "membomb"
      }

      env {
        NETBOMB = "iperf -c 45.33.39.39 -p 5201 -i 1 -u -t 120 -P 4"
      }

      resources {
        cpu = 20
        memory = 128
      }
    }

    restart {
      mode = "fail"
    }
  }
}

dadgar commented 7 years ago

@robottaway Thanks for the issue. Just wanted to suggest not letting your servers and clients get out of sync version-wise.

robottaway commented 7 years ago

@dadgar For sure. Unfortunately, we have a few different places where Nomad ends up being installed, each managing versions separately. We can probably update our configuration management code base to centralize the version across all of them. That said, this was a bit harder to diagnose than I think it should be. File it under user experience, maybe?
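
For anyone hitting the same problem before such a warning exists, the check requested in this issue can be approximated outside Nomad, for example in a wrapper script or a configuration management pre-flight step. The following is a minimal sketch, not Nomad's implementation: the function names `parse_version` and `mismatch_warning` are hypothetical, and it assumes you have already collected the version output from both machines (as in the `Nomad v0.5.0` and `Nomad v0.5.5` lines above) by whatever means you have available.

```python
import re

# Matches the version number in output such as "Nomad v0.5.0".
_VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_version(text):
    """Return (major, minor, patch) parsed from text like 'Nomad v0.5.0'."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def format_version(version):
    """Render a (major, minor, patch) tuple back into 'vX.Y.Z' form."""
    return "v" + ".".join(str(number) for number in version)


def mismatch_warning(client_output, server_output):
    """Return a warning string if the two versions differ, else None.

    Hypothetical helper: both arguments are version strings gathered
    beforehand from the client and server machines.
    """
    client = parse_version(client_output)
    server = parse_version(server_output)
    if client == server:
        return None
    return (
        f"WARNING: client {format_version(client)} does not match "
        f"server {format_version(server)}; errors returned by the server "
        "may be caused by this incompatibility"
    )


if __name__ == "__main__":
    # Versions taken from this issue report.
    print(mismatch_warning("Nomad v0.5.0", "Nomad v0.5.5"))
```

Run with the two versions from this report, it prints a warning naming both v0.5.0 and v0.5.5, which is the context that was missing from the "Interval can not be less than 5s" failure.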