kubernetes / kubectl

Issue tracker and mirror of kubectl code
Apache License 2.0

`kubectl get sts` cannot show its status if it is in terminating #1444

Open pacoxu opened 1 year ago

pacoxu commented 1 year ago

The current behavior

[root@demo-dev-master-01 ~]# kubectl get sts
NAME                                                READY   AGE
prometheus-insight-agent-kube-prometh-prometheus    1/1     8d

If the StatefulSet is terminating and has a finalizer, deletion blocks until the finalizer is removed. However, an admin who debugs with `kubectl get` sees nothing that indicates the terminating state. (The only signal is that its `metadata.deletionTimestamp` is not nil.)
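To make that signal concrete, here is a minimal Python sketch that inspects an object shaped like the output of `kubectl get sts <name> -o json`. The helper name `termination_info`, the sample object, and the finalizer name are hypothetical; only the `metadata.deletionTimestamp` and `metadata.finalizers` fields come from the Kubernetes object model.

```python
import json

def termination_info(obj):
    """Return (is_terminating, finalizers) for a Kubernetes object dict."""
    meta = obj.get("metadata", {})
    # A set deletionTimestamp means deletion was requested; the object
    # stays visible until every finalizer has been removed.
    is_terminating = meta.get("deletionTimestamp") is not None
    return is_terminating, list(meta.get("finalizers") or [])

# Hypothetical sample, shaped like `kubectl get sts -o json` output.
sample = json.loads("""
{
  "kind": "StatefulSet",
  "metadata": {
    "name": "prometheus-insight-agent-kube-prometh-prometheus",
    "deletionTimestamp": "2023-06-01T08:00:00Z",
    "finalizers": ["example.com/hypothetical-finalizer"]
  }
}
""")

terminating, finalizers = termination_info(sample)
print(terminating, finalizers)
```

This is roughly the check an admin has to do by hand today, which is why a visible status would help.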

What would you like to be added: An easy-to-see indication that a sts/deployment is terminating.

some proposals:

  1. Append the status to the name. But this is a significant behavior change; some shell scripts will mistakenly take the status as part of the name.
    [root@demo-dev-master-01 ~]# kubectl get sts
    NAME                                                READY   AGE
    prometheus-insight-agent-kube-prometh-prometheus(Terminating)    0/1     8d
  2. Add a new status column to the default get table. (Or show the status only with -o wide.)
    [root@demo-dev-master-01 ~]# kubectl get sts
    NAME                                                READY   AGE   STATUS
    prometheus-insight-agent-kube-prometh-prometheus    0/1     8d    Terminating
  3. Add a warning event if a sts/deploy has not been deleted after being marked as Terminating for a long period (maybe 1h). Why 1h? The default event-ttl in kube-apiserver is 1h, so when the user describes the statefulset/deployment more than 1h after the deletion request, the deletion event will no longer be there and the admin will still be very confused about why the STS/deploy is not creating a new pod.

The event can simply carry a message saying that the object is stuck in termination.
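As a rough illustration of proposal 3, the Python sketch below decides whether an object has been terminating for longer than a threshold and builds a warning message. The function name `stuck_termination_warning`, the message wording, the sample object, and the finalizer name are all hypothetical; the 1h threshold is only the value suggested above, and nothing here is an existing controller or kubectl feature.

```python
from datetime import datetime, timedelta, timezone

# Assumption: threshold chosen to match the default kube-apiserver event-ttl (1h).
STUCK_THRESHOLD = timedelta(hours=1)

def stuck_termination_warning(obj, now, threshold=STUCK_THRESHOLD):
    """Return a warning string if obj has been terminating for at least
    `threshold`, otherwise None."""
    meta = obj.get("metadata", {})
    stamp = meta.get("deletionTimestamp")
    if stamp is None:
        return None  # deletion was never requested
    # Timestamps are serialized as RFC 3339 in UTC, e.g. 2023-06-01T08:00:00Z.
    deleted_at = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    waited = now - deleted_at
    if waited < threshold:
        return None
    finalizers = ", ".join(meta.get("finalizers") or []) or "none"
    return (f"{obj.get('kind', 'Object')}/{meta.get('name')} has been terminating "
            f"for {waited}; remaining finalizers: {finalizers}")

# Hypothetical sample object and clock value.
sts = {
    "kind": "StatefulSet",
    "metadata": {
        "name": "prometheus-insight-agent-kube-prometh-prometheus",
        "deletionTimestamp": "2023-06-01T08:00:00Z",
        "finalizers": ["example.com/hypothetical-finalizer"],
    },
}
now = datetime(2023, 6, 1, 10, 0, tzinfo=timezone.utc)
print(stuck_termination_warning(sts, now))
```

Listing the remaining finalizers in the message is a design choice of this sketch: it points the admin at whatever is blocking the deletion.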

Why is this needed: As more and more operators are deployed, finalizers are widely used, and objects stuck on them have become a big problem for Ops admins.

mpuckett159 commented 1 year ago

When I run get deployments I have a few more columns than are shown in the issue:

❯ kubectl get deployments --all-namespaces
NAMESPACE            NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
kube-system          coredns                  2/2     2            2           130d
local-path-storage   local-path-provisioner   1/1     1            1           130d

Do the READY, UP-TO-DATE, and AVAILABLE fields cover this?

cc @brianpursley

mpuckett159 commented 1 year ago

/triage accepted

brianpursley commented 1 year ago

I looked into this some today, and here is what I found...

The pod's "Status" table value appears to come from here: https://github.com/kubernetes/kubernetes/blob/421ca53be49c4bd64a0c5ce9ceb7c3e17e6e1d11/pkg/printers/internalversion/printers.go#L918-L922

It looks like pod, pv, and pvc are the only ones that indicate that they are terminating when DeletionTimestamp is set.

I like the idea of adding a status column, but we will need to figure out what it should say when it is not terminating. For example, if there is a deployment and 2/3 are Ready, what should it say for the status of the deployment in that case?
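To make the open question concrete, here is one possible rule, written as a Python sketch rather than a proposal for the actual printer code. The function name `status_column` and the labels `Ready` and `Progressing` are hypothetical; only `Terminating` (already used for pods), `metadata.deletionTimestamp`, `spec.replicas`, and `status.readyReplicas` come from Kubernetes itself.

```python
def status_column(obj):
    """Derive a single status word for a Deployment/StatefulSet-like dict."""
    meta = obj.get("metadata", {})
    # Deletion takes precedence over readiness, mirroring what the pod
    # printer does when DeletionTimestamp is set.
    if meta.get("deletionTimestamp") is not None:
        return "Terminating"
    desired = obj.get("spec", {}).get("replicas", 1)  # replicas defaults to 1
    ready = obj.get("status", {}).get("readyReplicas", 0)
    # Hypothetical labels for the non-terminating cases.
    return "Ready" if ready >= desired else "Progressing"

# Hypothetical objects: a 2/3 deployment and a terminating one.
partial = {"metadata": {"name": "web"}, "spec": {"replicas": 3},
           "status": {"readyReplicas": 2}}
deleting = {"metadata": {"name": "db", "deletionTimestamp": "2023-06-01T08:00:00Z"},
            "spec": {"replicas": 1}, "status": {"readyReplicas": 0}}

for obj in (partial, deleting):
    print(f"{obj['metadata']['name']:<8}{status_column(obj)}")
```

Under this rule the 2/3 deployment would show the placeholder `Progressing`; what that word should really be is exactly what still needs to be decided.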

I think we also need to make sure adding a new column is not considered a breaking change. I'm pretty sure we've said in the past that the table format is not considered an API and could be changed, so this should not be a problem.

Finally, what about describers? Should we update the output of kubectl describe to indicate that the resource is terminating when it has a DeletionTimestamp? It seems like it would also be helpful to see that there.

carlory commented 1 year ago

/assign

sftim commented 1 year ago

> Finally, what about describers? Should we update the output of kubectl describe to indicate that the resource is terminating when it has a DeletionTimestamp? It seems like it would also be helpful to see that there.

:+1:

k8s-triage-robot commented 2 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

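You can:

- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Close this issue with `/close`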

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted