hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Client status pending on down node. #2576

Open · justenwalker opened this issue 7 years ago

justenwalker commented 7 years ago

Nomad version

Nomad v0.5.6

Operating system and Environment details

3-node server cluster:

$ uname -a
Linux nomad-server-01 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"

Issue

I've shut down about 20 Nomad client nodes. Most of them were cleaned up by the garbage collector, but a couple of them are not going away. I tried to force a GC with /v1/system/gc, but they still won't go away.
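
For reference, the force-GC call looks roughly like this; a minimal sketch assuming a server listening on the default local HTTP address:

$ curl -X PUT http://127.0.0.1:4646/v1/system/gc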

The client nodes were running Windows Server 2012 R2 Datacenter on the same Nomad version.

Reproduction steps

This is how it happened for me, but it may not happen every time:

  1. Create a large-ish cluster (~30 nodes)
  2. Have a bunch of tasks running on the cluster, mostly allocated (>60%)
  3. Have Hashi-UI running against the cluster (unsure if important)
  4. Start to drain tasks off half of the nodes (a rough command sketch follows this list)
  5. Shut the drained nodes down
  6. Force a GC with /v1/system/gc (optional)
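
For step 4, a rough sketch of draining a node from the CLI; the node ID is a placeholder, and the exact node-drain flags on this Nomad version are an assumption:

$ nomad node-drain -enable <node-id>    # mark the node as draining so its allocations are migrated or stopped
$ nomad node-status <node-id>           # confirm the drain flag and watch the remaining allocations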

Nomad Server logs (if appropriate)

nomad.log.zip

justenwalker commented 7 years ago

Work-around

I was finally able to GC these nodes.

There were a few allocations pending on those down nodes, and apparently the scheduler was not re-allocating them. Those pending allocations were /periodic-<TS> children of a batch-type job.

First, I tried stopping the periodic jobs directly with the /v1/job/<ID> endpoint. This didn't do anything, but it did complete successfully.
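
A sketch of that stop call (the job ID is a placeholder; stopping a job goes through a DELETE on its job endpoint):

$ curl -X DELETE http://127.0.0.1:4646/v1/job/<job-id>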

Next, I tried forcing an evaluation of the node with the /v1/node/<ID>/evaluate endpoint, which seems to have completed the pending allocation and allowed the node to be GC'd.
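
Roughly, that call looks like this (the node ID is a placeholder, and the use of PUT is an assumption; the endpoint may also accept POST):

$ curl -X PUT http://127.0.0.1:4646/v1/node/<node-id>/evaluate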

dadgar commented 7 years ago

@justenwalker Do you have any of the logs from the work-around step? The logs you gave in the first post don't seem to show anything about the GCs you ran.

dadgar commented 7 years ago

Why were the allocations on the nodes in pending status? Do you have client logs?

dadgar commented 7 years ago

@justenwalker Also, I don't think this is a bug. We only GC nodes once all allocations that were placed on the node are also garbage collected. With batch jobs, the allocations are GC'd in one shot because the scheduler needs to know about previous allocations. So until those jobs stop, the node will be kept around even if there is nothing running on it.
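
To see which allocations are still pinning a node, one can list them through the node allocations endpoint (the node ID is a placeholder):

$ curl http://127.0.0.1:4646/v1/node/<node-id>/allocations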

justenwalker commented 7 years ago

I don't have client logs; those nodes were destroyed along with their logs. The servers still had them marked as pending though, for whatever reason.

I don't think any allocation should be pending on a down node, though; that seems like a bug.

dadgar commented 7 years ago

Going to rename the issue.