Pod Readiness confusion in Troubleshooting Deployments guide

learnk8s / learnk8s.io

https://learnk8s.io


Pod Readiness confusion in Troubleshooting Deployments guide #259

Open nickperry opened 4 years ago

nickperry commented 4 years ago

Throughout https://learnk8s.io/troubleshooting-deployments there seems to be some confusion between the containers of a pod being ready and the pod itself being ready.

It is not possible to determine that a pod is ready from the default kubectl get pods output - only whether it is Running and how many of its containers are ready.

It is (unfortunately) possible for all of the containers in a pod to be Ready but the Pod itself not to be Ready.

This is an important distinction and alters the fault finding flow.

You can see if a pod is ready in the Conditions section of kubectl describe pods...

Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
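To make the distinction concrete, here is a minimal Python sketch that reads both signals from a pod's JSON, in the shape printed by kubectl get pod <name> -o json. The function name pod_readiness and the sample pod are made up for illustration; only the status.conditions and status.containerStatuses fields come from the Pod API.

```python
def pod_readiness(pod):
    """Summarise container-level and pod-level readiness for one pod dict."""
    status = pod.get("status", {})
    containers = status.get("containerStatuses", [])
    # Same information as the READY column of `kubectl get pods` (e.g. "1/1").
    ready_containers = sum(1 for c in containers if c.get("ready"))
    # Pod conditions are a list of {"type": ..., "status": "True"/"False"/"Unknown"}.
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    return {
        "containers": f"{ready_containers}/{len(containers)}",
        "containers_ready": conditions.get("ContainersReady") == "True",
        "pod_ready": conditions.get("Ready") == "True",
    }


# Hypothetical pod: its only container is ready, yet the Ready condition is False.
sample_pod = {
    "metadata": {"name": "example-pod"},
    "status": {
        "phase": "Running",
        "containerStatuses": [{"name": "app", "ready": True}],
        "conditions": [
            {"type": "Initialized", "status": "True"},
            {"type": "Ready", "status": "False"},
            {"type": "ContainersReady", "status": "True"},
            {"type": "PodScheduled", "status": "True"},
        ],
    },
}

print(pod_readiness(sample_pod))
```

In this sample the container count reads 1/1 while pod_ready is False, which is the case the default kubectl get pods output cannot reveal.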

danielepolencic commented 4 years ago

It is (unfortunately) possible for all of the containers in a pod to be Ready but the Pod itself not to be Ready.

Could you offer an example of this?

While I understand the points about multiple containers being (not) ready, I struggle to think of a scenario where all containers are Ready but the Pod isn't.

I think the diagram doesn't do a stellar job of explaining that you could have multiple containers inside a pod and some of them could be broken. In fact, we don't have:

kubectl logs -c to select a specific container or kubectl logs --all-containers
kubectl exec -c
init containers

Those were left out on purpose as the aim of the diagram was to target newcomers. However, we're not against including more branches.

nickperry commented 4 years ago

Sure. Here are a couple of examples of when you would have running pods with ready containers but the pods would not be marked as ready:

https://github.com/kubernetes/kubernetes/issues/80968
https://github.com/kubernetes/kubernetes/issues/84931

We had a serious production outage due to this a couple of weeks ago. Pods all running fine, but no endpoints for services.

I guess it depends if you want your fault finding flow to assume perfect control plane behaviour or not. Most monitoring products make the same assumption unfortunately.
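A check for this failure mode could look roughly like the sketch below. It assumes a pod list in the shape printed by kubectl get pods -o json (a top-level items array) and flags pods that are Running with every container ready while the pod's own Ready condition is not True. Services only route to pods whose Ready condition is True, so these are the pods that would be missing from endpoints. The function name and the sample data are hypothetical.

```python
def find_running_but_not_ready(pod_list):
    """Return names of Running pods whose containers are all ready
    but whose pod-level Ready condition is not True."""
    suspects = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        # Require at least one container so an empty status is not flagged.
        all_containers_ready = bool(containers) and all(
            c.get("ready") for c in containers
        )
        conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
        pod_ready = conditions.get("Ready") == "True"
        if status.get("phase") == "Running" and all_containers_ready and not pod_ready:
            suspects.append(pod["metadata"]["name"])
    return suspects


# Hypothetical data: web-1 is healthy, web-2 shows the mismatch.
sample_pods = {
    "items": [
        {
            "metadata": {"name": "web-1"},
            "status": {
                "phase": "Running",
                "containerStatuses": [{"name": "app", "ready": True}],
                "conditions": [{"type": "Ready", "status": "True"}],
            },
        },
        {
            "metadata": {"name": "web-2"},
            "status": {
                "phase": "Running",
                "containerStatuses": [{"name": "app", "ready": True}],
                "conditions": [{"type": "Ready", "status": "False"}],
            },
        },
    ]
}

print(find_running_but_not_ready(sample_pods))
```

Against a real cluster, the same function could be fed with json.load(sys.stdin) and the output of kubectl get pods -o json piped in.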

cjroebuck commented 4 years ago

Just ran into this issue too in production!