hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Node reboot manager #15716

Open radriaanse opened 1 year ago

radriaanse commented 1 year ago

Proposal

Something similar to what's provided in Kubernetes by tools like Kured, locksmith and Airlock

To assist with performing a safe rolling restart/redeploy of servers running in a Nomad cluster (commonly after updating the host OS), Nomad could perform the reboot itself, but only after it has ensured that this will not break quorum (for server nodes), and not before properly draining the node of its tasks.

If this could additionally take Consul and Vault into account, that would be even better and would make for a very nicely integrated solution. I assume a lot of people run Consul/Vault servers on the same nodes that also run Nomad servers; and even where they are properly separated, a Nomad client might still be present as well.

Kured can be triggered either by a command or by watching for the existence of the /var/run/reboot-required file; were Nomad to also check for that file, it would gain out-of-the-box integration with Flatcar (and possibly others that follow suit) for free. Fedora CoreOS uses Zincati instead; its implementation differs in that it speaks the FleetLock protocol over HTTP, but the idea is the same.

Use-cases

Automated node reboots, especially on a container OS like Flatcar that provides automatic updates and a post-update trigger mechanism for the reboot.

Attempted Solutions

Use locksmith or Airlock, which also requires running etcd (which doesn't make a lot of sense when already running Consul), combined with some wrapper tool that uses the Nomad API to drain a node beforehand.

tgross commented 1 year ago

Hi @radriaanse! There are two parts to this, as clients and servers have very different needs. On the server side, we don't have automated reboots, but we have a lot of tooling in the Enterprise-only features of Autopilot that can help out here. If we were to introduce automated reboots, they'd likely want to live in that context.

For clients, all the options for k8s currently rely on a "lock server" (i.e. a lock in etcd) for coordination. We don't have a first-class notion of locks in Nomad yet, although I think most of the primitives are there with the new Variables feature in 1.4.0. That being said, the reason those options all rely on a lock server is likely because they can't be done first-class in k8s itself (which is sort of designed as a pile of controller loops that folks extend with CRDs).
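The primitive a "lock server" provides is check-and-set: a write that only succeeds if the writer saw the latest version. Nomad Variables support a similar check-and-set write, so a reboot lock could plausibly be built on them; this in-memory Go sketch (names are illustrative) just shows the idea:

```go
package main

import (
	"fmt"
	"sync"
)

// casStore models a store with check-and-set semantics, the primitive a
// lock server needs. A version counter guards every write.
type casStore struct {
	mu      sync.Mutex
	val     string
	version uint64
}

// cas writes val only if the caller saw the current version; it returns
// the store's version and whether the write happened.
func (s *casStore) cas(seen uint64, val string) (uint64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if seen != s.version {
		return s.version, false // stale: someone else wrote first
	}
	s.version++
	s.val = val
	return s.version, true
}

func main() {
	store := &casStore{}
	// Two nodes race to take the reboot lock, both starting from version 0.
	_, aOK := store.cas(0, "holder=node-a")
	_, bOK := store.cas(0, "holder=node-b")
	fmt.Println(aOK, bOK) // exactly one of the two writes succeeds
}
```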

radriaanse commented 1 year ago

Hey @tgross, thanks for the quick response! I agree that the differences between clients and servers are quite large, so it makes sense to treat them differently. Looking into the Autopilot docs, the relevant feature seems to be "Upgrade Migrations", which indeed looks like a nice fit for extending with some auto-reboot functionality (we're not on Enterprise, but if it ends up that way, so be it).

Not having to lean on an additional k/v lock makes sense for Nomad since it's more integrated, yeah; it also makes the solution agnostic to clusters running with or without Consul.

For client nodes, do you think that simply starting with support for the file-based watcher use case would be sufficient? It seems the most generic and flexible option; maybe /var/run/reboot-required can be the default but configurable to support other setups. It could look something like:

  1. if client.auto_reboot.enabled is true
  2. regularly check for the existence of client.auto_reboot.file
  3. once the file is present, call node drain and wait for the node status to reflect completion
  4. issue a system reboot
  5. when the node joins the cluster again, toggle its scheduling eligibility back on

Some design questions that come to mind;

tgross commented 1 year ago

For client nodes, do you think that simply starting with support for the file-based watcher use case would be sufficient?

This seems a little similar to something I built out for a previous employer where we were deployed on AWS with spot instances -- we watched for the event that the spot instance was going away and then self-drained. The instance going away was itself controlled by spot prices and autoscaling though. That suggests there's value in separating the mechanism of triggering a self-drain from the policy that decides a given node should be drained. That policy mechanism would cover all the questions about "how many to drain at once?", "what about drain deadlines?", etc. The policy mechanism might be worth spinning out into its own agent to work out the details, similar to how we did autoscaler.
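That separation could be modeled as two small interfaces: a trigger that says "this node wants to drain" (file watcher, spot-instance notice, FleetLock request, ...) and a policy that decides whether it may proceed right now. The names here are illustrative, not an actual Nomad API:

```go
package main

import "fmt"

// Trigger reports that a node wants to drain; the mechanism
// (file watcher, cloud event, HTTP request) is hidden behind it.
type Trigger interface {
	DrainWanted() bool
}

// Policy decides whether a given node may drain now. This is where
// "how many at once?" and drain-deadline questions would live.
type Policy interface {
	MayDrain(node string) bool
}

// maxConcurrent is a toy policy allowing at most n nodes to drain at a time.
type maxConcurrent struct {
	n, draining int
}

func (p *maxConcurrent) MayDrain(string) bool {
	if p.draining >= p.n {
		return false
	}
	p.draining++
	return true
}

// fileTrigger stands in for the reboot-required file watcher.
type fileTrigger struct{ wanted bool }

func (t fileTrigger) DrainWanted() bool { return t.wanted }

func main() {
	pol := &maxConcurrent{n: 1}
	trig := fileTrigger{wanted: true}
	for _, node := range []string{"client-a", "client-b"} {
		if trig.DrainWanted() && pol.MayDrain(node) {
			fmt.Println("draining", node)
		} else {
			fmt.Println("deferring", node)
		}
	}
}
```

With this split, swapping the sentinel-file trigger for a FleetLock endpoint or a cloud maintenance event would not touch the policy side at all.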

Does it make sense to extend Autopilot to clients?

Addressing this question specifically after a little reflection: almost certainly not. Autopilot relies heavily on serf gossip. The clients don't participate in the gossip swarm and as a result cluster administrators don't need to have network connectivity between clients. This is hugely valuable for some of our flagship use cases like edge computing. So I think trying to shoehorn client updates into Autopilot would be a mistake.

radriaanse commented 1 year ago

The reasoning about Autopilot makes a lot of sense. Since you mentioned the autoscaler, I looked into how that's set up a bit; having this auto drain/reboot as a separate agent feels like a bit much, considering it's so small compared to the full autoscaling topic.

But if it's preferred to keep this out of Nomad core, maybe the autoscaler itself is a decent fit for the feature? Draining/rebooting seems related enough that it might just work, even though it's a bit awkward given the autoscaler is modeled around metrics:

draft policy:

apm "file" {
  driver = "file"
  config = {
    path = "/var/run/reboot-required"
  }
}

strategy "target-value" {
  driver = "target-value"
}

target "local" {
  driver = "local"
  config = {
    command = "systemctl reboot"
  }
}

scaling "auto_reboot" {
  enabled = true

  policy {
    check "ready_for_reboot" {
      source = "file"
      query  = "exists"

      strategy "target-value" {
        target = 1
      }
    }

    target "local" {}
  }
}

BTW, I'd be willing to contribute once there's a final design that you're happy with, in Nomad or in the Autoscaler! (At least for the client-node feature, given the server story will probably be Enterprise-only?)

radriaanse commented 1 year ago

Something I realized just now is that the Autoscaler usually doesn't run on every node, but I think multiple Autoscalers can run just fine in a single cluster? I.e. scheduling the "auto reboot" Autoscaler as a system job alongside a service job for the other Autoscaler loops.