Purging a parameterized job does not purge or unlink children jobs

DingoEatingFuzz commented 4 years ago

Nomad version

0.10.2

Operating system and Environment details

MacOS, dev agent

Issue

When purging a parameterized job, all child jobs of the parameterized job still maintain parent references to the now-purged job. This creates broken references, which complicate any kind of job traversal.

Reproduction steps

1. Run any ol' parameterized job.
2. Dispatch some instances (children) of the parameterized job.
3. Purge the parameterized job (nomad stop -purge my-parameterized-job).
4. Observe that the child job is still there in the CLI and API responses.

ID                                     Type                 Priority  Status 
geocoder                               batch/parameterized  50        running  
geocoder/dispatch-1579658083-5adaf751  batch                50        dead

becomes

ID                                     Type                 Priority  Status 
geocoder/dispatch-1579658083-5adaf751  batch                50        dead

with an API response including

  "ID": "geocoder/dispatch-1579658083-5adaf751",
  "ParentID": "geocoder",
  "Name": "geocoder/dispatch-1579658083-5adaf751",

What was expected

One of two things should have happened.

1. The child job should have also been purged

Since the job was already in a terminal state, this would have had the same effect as a GC and it would have kept the job graph tidy.

This gets more complicated when there are running instances of the parameterized job, but hey, purge means purge, right?

2. The child job should have been unreferenced from the parent

As part of the purge, the children of a job can be walked and unlinked from the parent. This is just a change in metadata. Child jobs are still just jobs as far as the scheduler is concerned, but in this way, the job graph isn't left in a broken state.
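
Until one of these behaviors exists, the broken references can at least be spotted from the CLI listing. The following is a minimal sketch, not part of Nomad: `orphaned_children` is a hypothetical helper name, and it assumes the table layout shown above (a header row, then the job ID in the first column) and that dispatched children are named `<parent>/dispatch-...` as in the example. It prints every child whose parent no longer appears in the listing.

```shell
#!/bin/bash
# orphaned_children (hypothetical helper): reads a `nomad status`-style table
# on stdin and prints dispatched child job IDs whose parent ID is not listed.
orphaned_children() {
  awk '
    NR == 1 { next }                      # assumption: first line is the header row
    NF == 0 { next }                      # ignore blank lines
    { ids[$1] = 1; order[++n] = $1 }      # remember every job ID, in input order
    END {
      for (i = 1; i <= n; i++) {
        id = order[i]
        pos = index(id, "/dispatch-")     # assumption: children are "<parent>/dispatch-..."
        if (pos > 1) {
          parent = substr(id, 1, pos - 1)
          if (!(parent in ids)) print id  # parent missing from the listing: orphan
        }
      }
    }
  '
}
```

With a running agent this would be used as `nomad status | orphaned_children`. Because it only compares IDs within a single listing, it would miss orphans whose names do not follow the `/dispatch-` pattern.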

mwantia commented 2 years ago

Was this forgotten? Are there any changes or plans for this bug? It honestly looks kind of embarrassing to suddenly see over 2000 dead jobs and have no way to remove them...

Edit: For anyone who might face the same problem and doesn't want to purge every job by hand, you should be able to purge all of them with this small script:

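#!/bin/bash
# Purge every job whose ID starts with the prefix given as the first argument.
prefix="${1:?usage: $0 <parent-job-name>}"
# index() does a literal prefix match on the ID column, so characters such as
# "/" or "." in the job name are not treated as regex syntax.
nomad status | awk -v prefix="$prefix" 'index($1, prefix) == 1 { print $1 }' | while read -r job
do
   nomad stop -purge "${job}"
done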

Save it as a file (for example purge-periodic-jobs.sh), make it executable, and pass the name of the parent job as the first argument. Example: ./purge-periodic-jobs.sh name-of-the-batch-job-to-purge

josegonzalez commented 2 years ago

I'm still seeing this in Nomad 1.2.8+ent. We use namespaced deploys to allow for review-app style testing and thus we have a ton of child jobs that we need to purge now.

Allan-Nava commented 6 months ago

Is it possible to prevent the purge of the job?
