hjacobs / kube-janitor

Clean up (delete) Kubernetes resources after a configured TTL (time to live)
GNU General Public License v3.0

Pods are orphaned when Deployment deleted by kube-janitor #28

Closed: gree-gorey closed 5 years ago

gree-gorey commented 5 years ago

What happened:

I created a Deployment and annotated it:

kubectl run temp-nginx --image=nginx
kubectl annotate deploy temp-nginx janitor/ttl=1m

The pod had valid ownerReferences:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2019-03-27T14:02:19Z"
  generateName: temp-nginx-68498674c5-
  labels:
    pod-template-hash: 68498674c5
    run: temp-nginx
  name: temp-nginx-68498674c5-2mxfc
  namespace: dev
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: temp-nginx-68498674c5
    uid: ef5dbd98-5098-11e9-999b-02405219215c

As expected, the Deployment was deleted after 1m; here are the logs of kube-janitor:

2019-03-27 14:03:37,720 DEBUG: Deployment temp-nginx with 1m TTL is 1m18s old
2019-03-27 14:03:37,720 INFO: Deployment temp-nginx with 1m TTL is 1m18s old and will be deleted (annotation janitor/ttl is set)
2019-03-27 14:03:37,731 DEBUG: https://10.96.0.1:443 "POST /api/v1/namespaces/dev/events HTTP/1.1" 201 836
2019-03-27 14:03:37,732 INFO: Deleting Deployment dev/temp-nginx..
2019-03-27 14:03:37,738 DEBUG: https://10.96.0.1:443 "DELETE /apis/extensions/v1beta1/namespaces/dev/deployments/temp-nginx HTTP/1.1" 200 1712
2019-03-27 14:03:37,739 DEBUG: ReplicaSet temp-nginx-68498674c5 with 1m TTL is 1m18s old
2019-03-27 14:03:37,739 INFO: ReplicaSet temp-nginx-68498674c5 with 1m TTL is 1m18s old and will be deleted (annotation janitor/ttl is set)
2019-03-27 14:03:37,745 DEBUG: https://10.96.0.1:443 "POST /api/v1/namespaces/dev/events HTTP/1.1" 201 858
2019-03-27 14:03:37,745 INFO: Deleting ReplicaSet dev/temp-nginx-68498674c5..
2019-03-27 14:03:37,752 DEBUG: https://10.96.0.1:443 "DELETE /apis/extensions/v1beta1/namespaces/dev/replicasets/temp-nginx-68498674c5 HTTP/1.1" 200 1546
2019-03-27 14:03:37,754 INFO: Clean up run completed: resources-processed=472, deployments-with-ttl=1, deployments-deleted=1, replicasets-with-ttl=1, replicasets-deleted=1

But the Pod wasn't deleted and was orphaned; its ownerReferences were removed:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2019-03-27T14:02:19Z"
  generateName: temp-nginx-68498674c5-
  labels:
    pod-template-hash: 68498674c5
    run: temp-nginx
  name: temp-nginx-68498674c5-2mxfc
  namespace: dev
  resourceVersion: "66358563"
  selfLink: /api/v1/namespaces/dev/pods/temp-nginx-68498674c5-2mxfc
  uid: ef60a8b0-5098-11e9-999b-02405219215c

What I expected:

That deletion of resources cascades to dependents. Otherwise there is little sense in deleting Deployments and leaving all the Pods behind.
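
For reference, cascading deletion is requested at the API level by sending a DeleteOptions body with a propagationPolicy field; as far as I can tell, the extensions/v1beta1 endpoints shown in the log above default to orphaning dependents when no such body is sent. A rough sketch of an explicit cascading delete in Python with requests (the token and CA path are placeholders, and targeting apps/v1 is just my choice for the example):

import json
import requests

# Placeholder values for illustration; an in-cluster client would read
# these from the mounted service account token and KUBERNETES_SERVICE_HOST.
API_SERVER = "https://10.96.0.1:443"
TOKEN = "..."

# Without this body the server applies the API group's default policy;
# sending it explicitly asks for cascading deletion of dependents.
delete_options = {
    "apiVersion": "v1",
    "kind": "DeleteOptions",
    "propagationPolicy": "Foreground",  # or "Background"
}

resp = requests.delete(
    API_SERVER + "/apis/apps/v1/namespaces/dev/deployments/temp-nginx",
    headers={"Authorization": "Bearer " + TOKEN},
    json=delete_options,
    verify="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
)
resp.raise_for_status()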

Context

kubectl version:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

kube-janitor version and flags:

Janitor v0.6 started with debug=True, delete_notification=None, dry_run=False, exclude_namespaces=kube-system, exclude_resources=events,controllerrevisions, include_namespaces=all, include_resources=all, interval=30, once=False, rules_file=/config/rules.yaml

rules.yaml:

# example rules configuration to set TTL for arbitrary objects
# see https://github.com/hjacobs/kube-janitor for details
rules:
  - id: require-application-label
    # remove deployments and statefulsets without a label "application"
    resources:
      # resources are prefixed with "XXX" to make sure they are not active by accident
      # modify the rule as needed and remove the "XXX" prefix to activate
      - XXXdeployments
      - XXXstatefulsets
    # see http://jmespath.org/specification.html
    jmespath: "!(spec.template.metadata.labels.application)"
    ttl: 4d
  - id: temporary-pr-namespaces
    # delete all namespaces with a name starting with "pr-*"
    resources:
      # resources are prefixed with "XXX" to make sure they are not active by accident
      # modify the rule as needed and remove the "XXX" prefix to activate
      - XXXnamespaces
    # this uses JMESPath's built-in "starts_with" function
    # see http://jmespath.org/specification.html#starts-with
    jmespath: "starts_with(metadata.name, 'pr-')"
    ttl: 4h
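
As a side note, the JMESPath expressions above can be sanity-checked locally with the jmespath library; the objects below are made-up minimal fragments, not taken from the cluster:

import jmespath

# Made-up Deployment fragment without an "application" label:
deploy = {
    "metadata": {"name": "temp-nginx"},
    "spec": {"template": {"metadata": {"labels": {"run": "temp-nginx"}}}},
}
# Prints True: the missing label evaluates to null and !null is true,
# so the require-application-label rule would match this object.
print(jmespath.search("!(spec.template.metadata.labels.application)", deploy))

# Made-up Namespace fragment matching the temporary-pr-namespaces rule:
ns = {"metadata": {"name": "pr-123"}}
print(jmespath.search("starts_with(metadata.name, 'pr-')", ns))  # True
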
hjacobs commented 5 years ago

Thanks for reporting. I tested under the assumption that cascading deletion is the default; will investigate.

gree-gorey commented 5 years ago

Here are the open issues in pykube related to this: https://github.com/kelproject/pykube/issues/87 and https://github.com/kelproject/pykube/issues/105

I assumed from the Pipfile that you use this library and not your fork of it.

hjacobs commented 5 years ago

@gree-gorey it uses the fork (pykube-ng), see https://github.com/hjacobs/kube-janitor/blob/master/Pipfile#L8 --- but this does not change the situation :smile:
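
If it needs fixing on the kube-janitor side, something along these lines might work via pykube-ng's low-level client. This is an untested sketch; it assumes APIObject.api_kwargs forwards extra keyword arguments such as headers and data to the underlying requests session, the same way pykube's update() does:

import json
import pykube

def delete_cascading(obj, policy="Foreground"):
    # Send an explicit DeleteOptions body so the API server cascades
    # the delete to dependents instead of applying the group default.
    options = {
        "apiVersion": "v1",
        "kind": "DeleteOptions",
        "propagationPolicy": policy,
    }
    r = obj.api.delete(**obj.api_kwargs(
        headers={"Content-Type": "application/json"},
        data=json.dumps(options),
    ))
    obj.api.raise_for_status(r)

# Usage sketch, mirroring the resources from the report above:
api = pykube.HTTPClient(pykube.KubeConfig.from_file("~/.kube/config"))
deploy = pykube.Deployment.objects(api).filter(namespace="dev").get(name="temp-nginx")
delete_cascading(deploy)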

gree-gorey commented 5 years ago

@hjacobs right, I missed it. Btw, I can look into it and see if I can provide a PR if needed.