Open DingoEatingFuzz opened 4 years ago
Was this forgotten, or are there any changes or plans for this bug? It honestly looks kind of embarrassing to suddenly see over 2000 dead jobs and have no way to remove them...
Edit: For anyone who might face the same problem and doesn't want to purge every job by hand, you should be able to purge all of them with this small script:
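#!/bin/bash
# Purge every job whose ID starts with the name given as the first argument.
prefix="${1:?usage: $0 <parent-job-name>}"
# Match the prefix literally (not as a regex) against the first column of `nomad status`.
nomad status | awk -v prefix="$prefix" 'index($1, prefix) == 1 { print $1 }' | while read -r job
do
  nomad stop -purge "${job}"
done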
Save it as a file (for example purge-periodic-jobs.sh), make it executable, and pass the name of the parent job as the first argument.
Example: ./purge-periodic-jobs.sh name-of-the-batch-job-to-purge
I'm still seeing this in Nomad 1.2.8+ent. We use namespaced deploys to allow for review-app style testing and thus we have a ton of child jobs that we need to purge now.
Is it possible to prevent the purge of the job?
Nomad version
0.10.2
Operating system and Environment details
MacOS, dev agent
Issue
When purging a parameterized job, all child jobs of the parameterized job still maintain parent references to the now-purged job. This creates broken references, which complicates any type of job traversal.
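To make the broken references concrete, here is a minimal sketch of how they could be detected from outside Nomad. It assumes the job list has already been flattened into `job_id<TAB>parent_id` lines; the function name `find_orphans` is made up for illustration and is not part of Nomad.

```shell
#!/bin/bash
# find_orphans (hypothetical helper): read "job_id<TAB>parent_id" lines on
# stdin and print the ID of every job whose parent_id is non-empty but does
# not match any job_id in the input, i.e. a dangling parent reference.
find_orphans() {
  awk -F '\t' '
    { ids[$1] = 1; parent[$1] = $2; order[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        id = order[i]
        if (parent[id] != "" && !(parent[id] in ids)) print id
      }
    }
  '
}
```

One way to produce the input, assuming `jq` is installed and that the job list endpoint returns `ID` and `ParentID` for each job, would be something like `curl -s "$NOMAD_ADDR/v1/jobs" | jq -r '.[] | [.ID, .ParentID] | @tsv' | find_orphans`.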
Reproduction steps
nomad stop -purge my-parameterized-job
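After the purge, the child job's view changes [screenshots omitted], with an API response that still includes the reference to the purged parent [response body omitted].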
What was expected
One of two things should have happened.
1. The child job should have also been purged
Since the job was already in a terminal state, this would have been the same effect as a GC and it would have kept the job graph tidy.
This gets more complicated when there are running instances of the parameterized job, but hey, purge means purge, right?
2. The child job should have been unreferenced from the parent
As part of the purge, the children of a job can be walked and unlinked from the parent. This is just a change in metadata. Child jobs are still just jobs as far as the scheduler is concerned, but in this way, the job graph isn't left in a broken state.
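Until one of these happens in Nomad itself, the first option can be approximated from the CLI by purging the children before the parent. The following is only a sketch: it assumes child job IDs have the form `<parent>/dispatch-...` or `<parent>/periodic-...`, and the function names `purge_order` and `purge_with_children` are invented for this example.

```shell
#!/bin/bash
# purge_order PARENT (hypothetical helper): read job IDs on stdin, one per
# line, and print the children of PARENT (IDs beginning with "PARENT/")
# followed by PARENT itself, so that children are purged first.
purge_order() {
  awk -v parent="$1" '
    index($0, parent "/") == 1 { print; next }   # child job: emit right away
    $0 == parent { found = 1 }                   # parent: remember, emit last
    END { if (found) print parent }
  '
}

# Hypothetical wiring: take the first column of `nomad status` (skipping the
# header row), order it with purge_order, and purge each ID in turn.
purge_with_children() {
  nomad status | awk 'NR > 1 { print $1 }' | purge_order "$1" |
    while read -r job; do nomad stop -purge "$job"; done
}
```

Matching on `PARENT/` rather than on the bare name means that an unrelated job such as `etl-v2` is left alone when purging `etl`. Note that this also stops any children that are still running, which is the "purge means purge" behaviour described in option 1.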