teamhephy / builder

Builder pods not removed after deploy #17

Open Cryptophobia opened 6 years ago

Cryptophobia commented 6 years ago

From @felixbuenemann on February 20, 2017 21:44

Currently (as of deis-builder v2.7.1) the slugbuild and dockerbuild pods are not deleted after a successful or failed build.

This means that the pod (e.g. slugbuild-example-e24fafeb-b31237bb) will continue to exist in the "Completed" or "Error" state, and the Docker container associated with the pod can never be garbage collected by Kubernetes, causing the node to quickly run out of disk space.

Example:

On a k8s node with an uptime of 43 days and 95 GB of disk storage for Docker, there were 249 completed (or errored) slugbuild and dockerbuild pods whose Docker images accounted for 80 GB of disk storage, while the deployed apps and deis services only required 15 GB.

Expected Behavior:

The expected behavior for the builder would be that it automatically deletes the build pod after it has completed or errored, so that the K8s garbage collection can remove the Docker containers and free the disk space allocated to them.
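Until then, the terminated build pods can be cleaned up manually. A rough sketch (assuming the deis namespace and the build- pod naming shown above; adjust to taste):

kubectl get pods --namespace deis --show-all \
  | awk '/build-/ && ($3=="Completed" || $3=="Error") {print $1}' \
  | xargs -r kubectl delete pods --namespace deis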

Copied from original issue: deis/builder#487

Cryptophobia commented 6 years ago

From @felixbuenemann on February 20, 2017 21:47

This behavior can easily be inspected with:

kubectl get --namespace deis --show-all pods | grep build-

The number of completed pods will increase by one for each build.
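To keep an eye on the pile-up over time, a simple count also works (same namespace assumption as above):

kubectl get --namespace deis --show-all pods | grep -c build-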

Cryptophobia commented 6 years ago

From @bacongobbler on February 21, 2017 1:15

related: https://github.com/deis/builder/issues/57

It seems like recent versions of k8s stopped cleaning up pods in the "success" state. Some research probably needs to be done on how to turn this functionality back on.

Cryptophobia commented 6 years ago

From @felixbuenemann on February 21, 2017 9:21

I'm running K8s 1.4.x if that matters.

Regarding the suggestion in #57 to use Jobs: neither Jobs nor Pods are removed automatically.

From the K8s Job docs:

When a Job completes, no more Pods are created, but the Pods are not deleted either. Since they are terminated, they don’t show up with kubectl get pods, but they will show up with kubectl get pods -a. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status. Delete the job with kubectl (e.g. kubectl delete jobs/pi or kubectl delete -f ./job.yaml). When you delete the job using kubectl, all the pods it created are deleted too.
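So even with Jobs, the cleanup would have to be an explicit delete. For example (the job name here is hypothetical):

kubectl delete jobs/slugbuild-example --namespace deis
# per the docs quoted above, deleting the job also deletes the pods it created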

Cryptophobia commented 6 years ago

From @felixbuenemann on February 21, 2017 10:34

Interestingly, the docs on Pod Lifecycle say:

In general, Pods do not disappear until someone destroys them. This might be a human or a controller. The only exception to this rule is that Pods with a phase of Succeeded or Failed for more than some duration (determined by the master) will expire and be automatically destroyed.

This seems to be in contrast to what I'm actually seeing…

Cryptophobia commented 6 years ago

From @felixbuenemann on February 21, 2017 11:18

I have opened kubernetes/kubernetes#41787 for clarification of the above statement from the docs.

Cryptophobia commented 6 years ago

From @felixbuenemann on February 27, 2017 22:27

I just got feedback on the kubernetes issue: it looks like, by default, completed or failed pods are only garbage collected once there are more than 12,500 terminated pods. Obviously that is not very helpful in this case, so automatic cleanup by the builder should be implemented.

Cryptophobia commented 6 years ago

From @felixbuenemann on March 6, 2017 11:08

Quoting here from the kube-controller-manager help on the --terminated-pod-gc-threshold <n> option:

Number of terminated pods that can exist before the terminated pod garbage collector starts deleting terminated pods. If <= 0, the terminated pod garbage collector is disabled. (default 12500)
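If cluster-wide garbage collection of terminated pods is acceptable, the threshold can be lowered by passing the flag to the controller manager. A sketch (how and where the flag is set depends on how the cluster was provisioned, e.g. a static pod manifest or a systemd unit):

# add to the existing kube-controller-manager flags (location depends on the installer)
--terminated-pod-gc-threshold=100
# note: this affects all terminated pods in the cluster, not just build pods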

Cryptophobia commented 6 years ago

From @kwent on March 20, 2017 17:34

Any progress on this? It sounds like a waste of resources and space for everyone.

Cryptophobia commented 6 years ago

From @pfeodrippe on April 6, 2017 14:46

Same here, it may be linked to an issue I opened last week.

$ kubectl get --namespace deis --show-all pods | grep build-
slugbuild-teslabit-web-production-d2fcd4c0-7e507178   0/1       Completed   0          1d

Cryptophobia commented 6 years ago

From @pfeodrippe on April 12, 2017 14:16

I'm using this tiny git pre-push hook for deletion: https://gist.github.com/pfeodrippe/116c8b570ee2ffcdce8aa15bbae5a22b.

It deletes the last slugbuild created for the app when you git push.
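For reference, a hook along the same lines (a sketch, not necessarily identical to the linked gist) could live in .git/hooks/pre-push:

#!/bin/sh
# Hypothetical pre-push hook: delete completed slugbuild pods for this app
# before pushing a new build. APP and the deis namespace are assumptions.
APP=example
kubectl get pods --namespace deis --show-all \
  | awk -v app="$APP" '$1 ~ ("slugbuild-" app) && $3 == "Completed" {print $1}' \
  | xargs -r kubectl delete pods --namespace deis
exit 0  # never block the push on cleanup failures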

Cryptophobia commented 6 years ago

From @davidlmorton on July 25, 2017 2:15

+1 This bit me after a couple of weeks of deploying applications to my deis cluster.