jaskaransarkaria opened 5 months ago
This job deletes all successful pods: https://github.com/ministryofjustice/cloud-platform-environments/blob/main/bin/delete_completed_jobs.rb. So can we add the failed jobs as well?
We should probably convert this ☝🏽 script to Bash or Go too (a rough sketch of what that could look like is below).
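A minimal Bash sketch of what the converted cleanup could look like, covering both Succeeded and Failed pods. This is an assumption about the shape of the script, not what delete_completed_jobs.rb currently does:

```bash
#!/usr/bin/env bash
# Sketch only: delete completed (Succeeded) and failed pods across all namespaces.
# Assumes kubectl is already configured against the target cluster.
set -euo pipefail

for phase in Succeeded Failed; do
  kubectl get pods --all-namespaces \
    --field-selector="status.phase=${phase}" --no-headers \
  | awk '{print $2 " --namespace=" $1}' \
  | xargs --no-run-if-empty -L1 kubectl delete pod
done
```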
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Can we create a Gatekeeper rule that requires the following parameters to be in place on CronJobs, so that their Jobs get cleared?
```yaml
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
```
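Before writing the policy, it might be worth auditing how many existing CronJobs would violate it. A possible one-liner, as a sketch: it assumes jq is installed and falls back to the Kubernetes defaults (3 successful, 1 failed) when the limits are unset:

```bash
# List CronJobs that don't set both history limits to 0 (sketch; assumes jq).
kubectl get cronjobs --all-namespaces -o json \
| jq -r '.items[]
    | select((.spec.successfulJobsHistoryLimit // 3) != 0
          or (.spec.failedJobsHistoryLimit // 1) != 0)
    | "\(.metadata.namespace)/\(.metadata.name)"'
```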
From a duplicate ticket:
These pods hog IPs and prevent nodes from being drained; clean them up to help keep the cluster in a good state.
https://mojdt.slack.com/archives/C514ETYJX/p1724835696695569
Create a new maintenance job that runs nightly and cleans up each of our clusters. The code below might be useful; you can swap parallel for xargs if preferred (an xargs version is sketched after it):
```bash
kubectl get pods --field-selector="status.phase=Failed" -A --no-headers \
| awk '{print $2 " -n " $1}' \
| parallel -j1 --will-cite kubectl delete pod "{= uq =}"
```
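For reference, the xargs variant mentioned above could look like this (a sketch, not tested against our clusters; assumes GNU xargs):

```bash
# Same clean-up as above, using xargs instead of GNU parallel.
kubectl get pods --field-selector="status.phase=Failed" -A --no-headers \
| awk '{print $2 " -n " $1}' \
| xargs --no-run-if-empty -L1 kubectl delete pod
```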
We should consider:
- treating errored and completed jobs differently (we need to make sure we aren't blasting genuinely errored jobs, so users have time to fix the errors)
- treating prod and non-prod differently (a rough sketch covering both points follows this list)
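One way both concerns could be handled, as a rough sketch only: the 3-day grace period and the "-prod" namespace suffix are assumptions rather than agreed conventions, and it relies on GNU date and jq being available:

```bash
#!/usr/bin/env bash
# Sketch: delete failed pods only after a grace period, and skip prod namespaces.
set -euo pipefail

cutoff="$(date -u -d '3 days ago' +%Y-%m-%dT%H:%M:%SZ)"   # GNU date

kubectl get pods --all-namespaces --field-selector="status.phase=Failed" -o json \
| jq -r --arg cutoff "$cutoff" '
    .items[]
    | select(.metadata.namespace | endswith("-prod") | not)   # assumed naming convention
    | select(.metadata.creationTimestamp < $cutoff)            # older than the grace period
    | "\(.metadata.name) -n \(.metadata.namespace)"' \
| xargs --no-run-if-empty -L1 kubectl delete pod
```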
Background
Errored jobs seem to stick around and slow down nodes being drained. Write a small bash script to delete these jobs in errored states across the cluster (nightly or weekly), e.g.:
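A minimal sketch of such a script, assuming jq is available and targeting Jobs that report failed pods; the exact filter and schedule would need agreeing:

```bash
#!/usr/bin/env bash
# Sketch: delete Jobs that have recorded failed pods, across all namespaces.
set -euo pipefail

kubectl get jobs --all-namespaces -o json \
| jq -r '.items[]
    | select((.status.failed // 0) > 0)
    | "\(.metadata.name) -n \(.metadata.namespace)"' \
| xargs --no-run-if-empty -L1 kubectl delete job
```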
Definition of done
Reference
How to write good user stories