Closed: mikmatko closed this issue 2 weeks ago
This can help to identify completed jobs that have not been cleaned up:
kubectl get jobs --all-namespaces -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[].name,STATUS:.status.succeeded'
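A finished Job can also be garbage-collected by Kubernetes itself once it carries a TTL. As a stopgap until Fleet cleans these up on its own, a TTL could be patched onto the lingering jobs found above. This is only a sketch: the namespace and job name are placeholders, not values from this issue.

```shell
# Sketch (placeholder namespace/job name): give a finished job a 10-minute TTL
# so the TTL-after-finished controller deletes it and its completed pods.
kubectl -n fleet-local patch job my-gitrepo-abc123 \
  --type=merge -p '{"spec":{"ttlSecondsAfterFinished":600}}'
```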
Fleet is not deleting the jobs related to GitRepos.

We create a new job for every new commit we get in the git repository, which is a problem in systems with many GitRepos and many commits, because we could reach the etcd limits.
Test a few scenarios to cover all the possible cases:

- GitRepo that is successful: check that the job is created and deleted when the job succeeds.
- GitRepo that is successful: check that the job is created and deleted when the job succeeds. Then update the commit and check that another job is created and deleted after it succeeds.
- GitRepo that is successful: check that the job is created and deleted when the job succeeds. Then Force Update and check that another job is created and deleted after it succeeds.
- GitRepo that is successful: check that the job is created and deleted when the job succeeds. Then change the Spec of the GitRepo (for example, change the path) and check that another job is created and deleted after it succeeds.
- GitRepo that is not successful (for example, a bad path, a bad git URL, or anything else that makes the job fail): check that the job is not deleted and that we can see the error in the logs.
- GitRepo that creates a slow job, so we have time to Force Update before it is finished: check that the job is deleted and re-created.
- GitRepo that creates a slow job, so we have enough time to change the Spec (for example, the path): check that the job is deleted and re-created.

In any test, the job should only stay if it is not successful; otherwise it should be deleted.
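The success-path checks above could be scripted roughly as follows; the namespace, GitRepo name, and label selector are assumptions for illustration (Fleet's actual job labels may differ).

```shell
# Assumed setup: GitRepo "my-gitrepo" in namespace "fleet-local"; the label
# selector is an assumption about how Fleet tags its git jobs.
# After pushing a commit, a job should appear:
kubectl -n fleet-local get jobs -l fleet.cattle.io/repo-name=my-gitrepo
# ...and be deleted once it succeeds:
kubectl -n fleet-local wait --for=delete job \
  -l fleet.cattle.io/repo-name=my-gitrepo --timeout=300s
```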
Checked in v2.10.0-alpha5 with fleet:105.0.0+up0.11.0-beta.3.

Automated tests are in place, with success here checking the scenarios above.

Aside from this, checked that [fleet-cleanup-gitrepo-jobs] is set to @daily and can be run at any time.
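For reference, the schedule check and an ad-hoc run might look like this; the CronJob namespace (cattle-fleet-system) is an assumption.

```shell
# Inspect the cleanup CronJob's schedule (expected: @daily):
kubectl -n cattle-fleet-system get cronjob fleet-cleanup-gitrepo-jobs
# Trigger a one-off run from the CronJob template (job name is arbitrary):
kubectl -n cattle-fleet-system create job cleanup-now \
  --from=cronjob/fleet-cleanup-gitrepo-jobs
```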
Is there an existing issue for this?
Current Behavior

In the Rancher local cluster, for each commit/change in each GitRepo, there is a Job started by Fleet. There is nothing to clean up these Jobs, so you will quickly end up with hundreds of lingering Job objects and their completed Pods.

I didn't notice this behavior in Fleet 0.9.x, so I assume something in 0.10.x introduced these Jobs. I was assuming this is related to the automatic chart dependency update, but setting disableDependencyUpdate to true doesn't seem to have any effect.

Expected Behavior

Unnecessary Job objects are cleaned up, e.g. by setting some sane default for .spec.ttlSecondsAfterFinished: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/

Steps To Reproduce
Job objects

Environment
Logs
No response
Anything else?
No response