hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Bug: Windows Driver Health always unhealthy with HCL Client Config #8415

Open idrennanvmware opened 4 years ago

idrennanvmware commented 4 years ago

Output from nomad version

Nomad v0.11.3 (8918fc804a0c6758b6e3e9960e4eb2e605e38552)

Operating system and Environment details

Windows Server 2019

Issue

This just caught us in our production environments. We were moving all our configurations over from .json to .hcl.

We discovered that, on Windows ONLY, if the client config is HCL then all drivers report unhealthy (Linux reports as expected).

The Nomad agent itself reports healthy and everything looks good at a glance, but when you look at driver status none are available, so no allocations are ever placed on the Windows node. We realize that, according to the docs here, we are also using a soon-to-be-deprecated approach.

Reproduction steps

Use an HCL client config like the following (BAD RESULT):

client {
    enabled           = true

    # This enables Nomad to run things like bash scripts as a raw exec process
    options {
        driver.raw_exec.enabled = "1"
    }

    # group_names is ansible magic variable that gives all groups current host is part of
    meta {
        consoleonly = "true"
        general_compute_windows = "true"
        windows = "true"
    }
}
[Image: Screen Shot 2020-07-09 at 7 29 25 AM]

Same config as a JSON file (GOOD RESULT):

{
    "client": {
        "enabled": true,
        "options": {
            "driver.raw_exec.enable": "1"
        },
        "meta": [{ "consoleonly": "true", "general_compute_windows": "true", "windows": "true" }]
    }
}

Note: The HCL version tags all the metadata, etc. as expected in Nomad; it's purely the drivers that don't show as healthy.

[Image: Screen Shot 2020-07-09 at 7 28 28 AM]

tgross commented 4 years ago

Hi @idrennanvmware! The configuration value you're using in the HCL here is the one that has been deprecated since 0.9.0. To my knowledge it should still work, but maybe the docs are outdated (ref https://www.nomadproject.io/docs/drivers/raw_exec#client-requirements). Can you try the following instead?

plugin "raw_exec" {
  config {
    enabled = true
  }
}
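
For context, here is a rough sketch of how that plugin block might sit next to the client block from the original report. It assumes both blocks live in the same agent config file and that the meta values are unchanged; it has not been tested on a Windows node, so treat it as a starting point rather than a confirmed fix.

```hcl
# Sketch only: client block from the report, minus the deprecated options block.
client {
  enabled = true

  meta {
    consoleonly             = "true"
    general_compute_windows = "true"
    windows                 = "true"
  }
}

# Driver settings move out of client.options into a top-level plugin block.
plugin "raw_exec" {
  config {
    enabled = true
  }
}
```

Note that the plugin block is a top-level block in the agent config, not nested inside client.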
idrennanvmware commented 4 years ago

Hi @tgross

Yeah, we saw that it was deprecated and soon to be removed, but we haven't had an opportunity to shift the config and test the new way. What is weird is that the Linux flavor works just fine; it's only Windows. We will be trying the new approach, and we have pipeline tests (now lol) to catch this in the future, so it should be quickly visible to us. Will report back when we do that.

Thanks!