hashicorp/nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

slow client GC prevents new allocs from running (periodic dispatch) #19917

Open aneustroev opened 7 months ago

aneustroev commented 7 months ago

Nomad version

Nomad v1.7.4

Operating system and Environment details

Ubuntu

Issue

If I run periodic jobs, the child allocations they spawn are not cleaned up by the GC; they accumulate until the count reaches gc_max_allocs, and then all jobs get stuck in the pending state, waiting for the children to be cleaned up.

I also can't set gc_max_allocs to a very large value, because each alloc creates two mounts, and with more than 30k mounts the OS becomes unstable.

Reproduction steps

1. Create many high-frequency periodic jobs (>100).
2. Wait until the number of allocs exceeds gc_max_allocs.
3. Observe that all new allocs are stuck in the pending state.

Expected Result

1. GC removes old allocs by TTL, or
2. GC unmounts the secrets and private mounts by TTL.

Actual Result

1. When gc_max_allocs is small, new allocs are created with a delay or never created at all.
2. When gc_max_allocs is large, the OS becomes unstable over time.

Job file (if appropriate)

job "hello-world" {
  # Specifies the datacenter where this job should be run
  # This can be omitted and it will default to ["*"]
  datacenters = ["*"]

  type = "batch"

  periodic {
    cron             = "*/1 * * * * *"
    prohibit_overlap = true
  }

  # A group defines a series of tasks that should be co-located
  # on the same client (host). All tasks within a group will be
  # placed on the same host.
  group "servers" {

    # Specifies the number of instances of this group that should be running.
    # Use this to scale or parallelize your job.
    # This can be omitted and it will default to 1.
    count = 1

    # Tasks are individual units of work that are run by Nomad.
    task "test" {
      # This task just echoes a short string inside a Docker container
      driver = "docker"

      config {
        image   = "alpine:latest"
        command = "/bin/sh"
        # The command must be a single string: with separate args,
        # `sh -c "echo" "123423543"` binds "123423543" to $0 and prints nothing.
        args    = ["-c", "echo 123423543"]
      }

      # Specify the maximum resources required to run the task
      resources {
        cpu    = 50
        memory = 64
      }
    }
  }
}

Nomad Server logs (if appropriate)

All is good.

Nomad Client logs (if appropriate)

Also, I saw strange log entries for dead allocs:

Feb 08 05:04:44 php-2-eu1.adsrv.wtf nomad[2626928]:     2024-02-08T05:04:44.540-0500 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=06aeb094-0272-1940-fde9-c3a841a50b00 task=php-ai-common-delayed-processing-queue-execute-rabbit-task type=Killing msg="Sent interrupt. Waiting 10s before force killing" failed=false

Why is Nomad killing the task if the allocation is already dead?

lgfa29 commented 7 months ago

Hi @aneustroev 👋

I'm not sure I fully understood the problem. Is the Nomad GC preventing new allocations from being placed until it completes?

Would you be able to share the output of the status for the periodic job and the allocations when they get into this state?

Thank you!

aneustroev commented 7 months ago

Is the Nomad GC preventing new allocations from being placed until it completes?

yes

(screenshot attached)

Why does Nomad take so long to remove old allocations?

aneustroev commented 7 months ago

Sometimes a task hangs in the queue for hours.

aneustroev commented 7 months ago

(screenshot attached)

lgfa29 commented 7 months ago

Does the task have any events? And have you noticed whether this happens on a particular client, or are all allocs pending regardless of where they are scheduled?

Would you be able to share logs from the Nomad client when the problem happens? You can email them to nomad-oss-debug@hashicorp.com with the issue number in the subject.

aneustroev commented 7 months ago

It doesn't depend on the client. When it happens, I see the following log messages.

nomad[3402844]:[INFO]  client.gc: garbage collecting allocation: alloc_id=1c459fd2-c0e8-d225-5781-96a33c10672d reason="new allocations and over max (200)"
nomad[3402844]:[INFO]  client.alloc_runner.task_runner: Task event: alloc_id=1c459fd2-c0e8-d225-5781-96a33c10672d task=<taskname> type=Killing msg="Sent interrupt. Waiting 10s before force killing" failed=false
nomad[3402844]:[INFO]  client.gc: marking allocation for GC: alloc_id=1c459fd2-c0e8-d225-5781-96a33c10672d

There are no other WARN or ERROR messages.

aneustroev commented 7 months ago

As long as the number of allocs stays below gc_max_allocs, everything works as expected. Maybe the GC is too slow, or doesn't clean up allocs as expected?

server GC settings

node_gc_threshold = "24h"
job_gc_interval = "5m"
job_gc_threshold = "10m"
eval_gc_threshold = "10m"
batch_eval_gc_threshold = "10m"

client GC settings

gc_interval = "1m"
gc_max_allocs = 200
gc_parallel_destroys = 10
gc_inode_usage_threshold = 80
gc_disk_usage_threshold = 90
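
For context, a minimal sketch of where these settings live in the agent configuration, with the same values as above (server GC options go in the server block, client GC options in the client block):

server {
  node_gc_threshold       = "24h"
  job_gc_interval         = "5m"
  job_gc_threshold        = "10m"
  eval_gc_threshold       = "10m"
  batch_eval_gc_threshold = "10m"
}

client {
  # With 100+ periodic jobs firing every second, 200 dead allocs
  # accumulate quickly, so this cap is hit on almost every GC interval.
  gc_interval              = "1m"
  gc_max_allocs            = 200
  gc_parallel_destroys     = 10
  gc_inode_usage_threshold = 80
  gc_disk_usage_threshold  = 90
}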
lgfa29 commented 7 months ago

I believe Nomad's client GC is non-blocking, so it shouldn't impact new allocations.

Would you be able to drop the client log level to TRACE and share the client logs with us when this problem happens again?
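
If it helps, a minimal sketch of that change in the client's agent configuration (log_level is a top-level agent option; nomad agent -log-level=TRACE works as well):

# Top-level agent configuration option controlling log verbosity.
log_level = "TRACE"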

aneustroev commented 7 months ago

2024-02-19T03:16:56.261-0500 [TRACE] client.alloc_runner.task_runner: Kill requested: alloc_id=a21ae6da-a252-2f55-a52f-62669c90ca4e task=<taskname>
2024-02-19T03:16:56.261-0500 [TRACE] client.alloc_runner.task_runner: Kill event: alloc_id=a21ae6da-a252-2f55-a52f-62669c90ca4e task=<taskname> event_type=Killing event_reason=""
2024-02-19T03:16:56.261-0500 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=a21ae6da-a252-2f55-a52f-62669c90ca4e task=<taskname> type=Killing msg="Sent interrupt. Waiting 10s before force killing" failed=false
2024-02-19T03:16:56.262-0500 [INFO]  client.gc: marking allocation for GC: alloc_id=a21ae6da-a252-2f55-a52f-62669c90ca4e
2024-02-19T03:16:56.264-0500 [DEBUG] client.gc: alloc garbage collected: alloc_id=a21ae6da-a252-2f55-a52f-62669c90ca4e
2024-02-19T03:16:56.264-0500 [INFO]  client.gc: garbage collecting allocation: alloc_id=9d828c6a-9df0-50ce-dce0-a6fdcc5b6018 reason="new allocations and over max (100)"
2024-02-19T03:16:56.264-0500 [DEBUG] client.alloc_runner.runner_hook.group_services: delay before killing tasks: alloc_id=9d828c6a-9df0-50ce-dce0-a6fdcc5b6018 group=<taskname> shutdown_delay=30s
2024-02-19T03:17:26.266-0500 [TRACE] client.alloc_runner.task_runner: Kill requested: alloc_id=9d828c6a-9df0-50ce-dce0-a6fdcc5b6018 task=<taskname>
2024-02-19T03:17:26.266-0500 [TRACE] client.alloc_runner.task_runner: Kill event: alloc_id=9d828c6a-9df0-50ce-dce0-a6fdcc5b6018 task=<taskname> event_type=Killing event_reason=""
2024-02-19T03:17:26.266-0500 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=9d828c6a-9df0-50ce-dce0-a6fdcc5b6018 task=<taskname> type=Killing msg="Sent interrupt. Waiting 10s before force killing" failed=false
2024-02-19T03:17:26.267-0500 [INFO]  client.gc: marking allocation for GC: alloc_id=9d828c6a-9df0-50ce-dce0-a6fdcc5b6018
2024-02-19T03:17:26.269-0500 [DEBUG] client.gc: alloc garbage collected: alloc_id=9d828c6a-9df0-50ce-dce0-a6fdcc5b6018
2024-02-19T03:17:26.269-0500 [INFO]  client.gc: garbage collecting allocation: alloc_id=27225c79-23f9-8bbd-ec89-6015c77d2bbe reason="new allocations and over max (100)"
2024-02-19T03:17:26.269-0500 [DEBUG] client.alloc_runner.runner_hook.group_services: delay before killing tasks: alloc_id=27225c79-23f9-8bbd-ec89-6015c77d2bbe group=<taskname> shutdown_delay=30s

Why does this happen?

client.alloc_runner.runner_hook.group_services: delay before killing tasks: alloc_id=27225c79-23f9-8bbd-ec89-6015c77d2bbe group=<taskname> shutdown_delay=30s
cl-bvl commented 7 months ago

Some details here https://github.com/hashicorp/nomad/issues/7787#issuecomment-1966995078

aneustroev commented 5 months ago

Lowering shutdown_delay is a solution.
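
The TRACE log above shows why this helps: the client GC appears to destroy dead allocations one at a time, and each destroy waits out the group's full shutdown_delay (30s in the logs), so 200 queued-up dead allocs take roughly 200 × 30s ≈ 100 minutes to clear. A minimal sketch of the change, assuming the affected jobs set shutdown_delay at the group level as the logs suggest (the 1s value is just an example):

# Fragment of a job spec: keep shutdown_delay short so each
# garbage-collected allocation doesn't stall the client GC for 30s.
group "servers" {
  shutdown_delay = "1s"  # example value; previously "30s"

  # ... tasks unchanged ...
}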

tgross commented 3 months ago

Sorry about the delays in returning to this (and the seemingly related https://github.com/hashicorp/nomad/issues/7787). I'm going to mark this as accepted and for roadmapping.