kubernetes / website

Kubernetes website and documentation repo:
https://kubernetes.io
Creative Commons Attribution 4.0 International

Improvement for k8s.io/docs/reference/networking/virtual-ips/ #41564

Open · dumlutimuralp opened this issue 1 year ago

dumlutimuralp commented 1 year ago

In Kubernetes, when a node fails, the node controller marks the node as unhealthy after 40 seconds (by default). The node controller then waits for the pod-eviction-timeout (5 minutes by default) and then updates the API server to set the pods to the Terminating state.
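
For reference, both timings are kube-controller-manager settings (--node-monitor-grace-period and --pod-eviction-timeout). A minimal sketch of how one might check them, assuming a kubeadm-style control plane where kube-controller-manager runs as a static pod in kube-system (other setups differ):

```shell
# Sketch: check the two timeouts mentioned above on a kubeadm-style control plane.
# If neither flag is set explicitly, the defaults apply (40s and 5m respectively).
kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml \
  | grep -E 'node-monitor-grace-period|pod-eviction-timeout'
```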

This page does not include info on when exactly kube-proxy reacts to remove the pods (on the failed node) from the endpoints list. Is it after the node is marked unhealthy, or after the pods get into the Terminating state?

Could you please clarify the behaviour described above in the appropriate chapter of the Kubernetes documentation?

Thanks in advance.

tamilselvan1102 commented 1 year ago

Ref page : https://kubernetes.io/docs/reference/networking/virtual-ips/

tamilselvan1102 commented 1 year ago

/language en
/sig docs

aojea commented 1 year ago

This page does not include info on when exactly kube-proxy reacts to remove the pods (on the failed node) from the endpoints list. Is it after the node is marked unhealthy, or after the pods get into the Terminating state?

kube-proxy only knows about Endpoints/EndpointSlices and their state, so it only forwards traffic to endpoints that are Ready or, in some special cases, to endpoints that are terminating:

https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#terminating

A more detailed explanation can be found at:

https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/
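
As a rough illustration of what kube-proxy sees, the per-endpoint conditions live on the EndpointSlice objects for the Service. A minimal sketch, assuming a hypothetical Service named my-service:

```shell
# Sketch: list the EndpointSlices backing a Service and inspect the per-endpoint
# conditions (ready, serving, terminating) under .endpoints[].conditions.
# "my-service" is a placeholder; substitute a real Service name.
kubectl get endpointslices -l kubernetes.io/service-name=my-service -o yaml
```

kube-proxy normally programs only the addresses whose ready condition is true; the serving and terminating conditions come into play in the special cases described in the blog post above.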

sftim commented 1 year ago

Let's add some of that detail to https://k8s.io/docs/reference/networking/virtual-ips/ - it would save readers a bunch of clicking.

/priority backlog
/triage accepted

dumlutimuralp commented 1 year ago

In case readers of this issue wonder what the exact behaviour is, it is as follows:
The pods that are running on the failed node get removed from the endpoints list once the node is marked as NotReady by the node controller.


@aojea neither of those links covers the node failure scenario. They both explain a rolling update scenario.

kube-proxy only knows about Endpoints/EndpointSlices and their state

Regarding the comment above: what I understand from the current Kubernetes documentation is that the node controller marks the node as unhealthy, not the pods. The difference I see between a node failure scenario and a rolling update scenario is that in a node failure scenario the "serving" condition will not be used at all.
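
One rough way to check this is to print each endpoint's conditions and compare what gets set during a node failure versus a rolling update. A sketch, again assuming a placeholder Service named my-service:

```shell
# Sketch: show each endpoint's addresses alongside its ready/serving/terminating
# conditions; compare the output during a node failure and a rolling update.
# "my-service" is a placeholder Service name.
kubectl get endpointslices -l kubernetes.io/service-name=my-service \
  -o custom-columns='ADDRESSES:.endpoints[*].addresses,READY:.endpoints[*].conditions.ready,SERVING:.endpoints[*].conditions.serving,TERMINATING:.endpoints[*].conditions.terminating'
```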

So far I have looked into the following chapters of the Kubernetes documentation.

All of the above pages talk about a controlled sequence of events, like rolling upgrades, cluster upgrades, or scale in/out events.

However, an infrastructure failure scenario is also possible, and I think it should be covered in at least one (or all) of the above Kubernetes pages. The failure could be as simple as a node having a hardware failure, or a node getting disconnected from the rest of the network.


The tests I carried out and the behaviour I observed are as follows (a rough sketch of the commands used is shown after the list):

  1. Isolated the worker node from the rest of the infrastructure by simulating a network failure (I used AWS security groups here). The node controller marked the respective node as NotReady since it did not receive any response from the node during the node-monitor-grace-period (40 seconds by default).
  2. I verified that once the node was marked as NotReady, all the pods running on that node were removed from the endpoints list (kubectl get endpoints <servicename>).
  3. The kubectl get endpointslices <slicename> output was still showing all the pods that are part of the service (including the pods on the isolated node).
  4. Once the pod-eviction-timeout (5 minutes by default) expired, the node controller set the pods running on the failed node to Terminating.
  5. All those pods got stuck in Terminating, and the kubectl get endpointslices ... output kept showing all the pods (including the ones in Terminating).
  6. Once I restored the connection of the failed node, the pods stuck in Terminating got finalized and deleted. The endpointslices output then also got updated.
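
A rough sketch of the watch commands behind these observations (not exact transcripts; my-service is a placeholder Service name, each command in its own terminal):

```shell
# Sketch of the commands used to observe the behaviour described above.
# "my-service" is a placeholder Service name.
kubectl get nodes -w                    # step 1: node goes NotReady after node-monitor-grace-period
kubectl get endpoints my-service -w     # step 2: addresses disappear once the node is NotReady
kubectl get endpointslices -l kubernetes.io/service-name=my-service -w   # steps 3 and 5: entries remain listed
kubectl get pods -o wide -w             # steps 4-6: pods move to Terminating after pod-eviction-timeout
```
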
dumlutimuralp commented 1 year ago

Let me know if I can contribute to the docs in any way.

sftim commented 1 year ago

Pull requests are welcome - see https://kubernetes.io/docs/contribute/ for tips.

chrismetz09 commented 1 year ago

Not specifically related to an answer, but it seems a sequence diagram could help explain the different scenarios. But maybe not.

Here is an example from the Diagram Guide using Mermaid.

[Image: example sequence diagram from the Diagram Guide]

k8s-triage-robot commented 3 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  - Confirm that this issue is still relevant with /triage accepted (org members only)
  - Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  - After 90d of inactivity, lifecycle/stale is applied
  - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  - Mark this issue as fresh with /remove-lifecycle stale
  - Close this issue with /close
  - Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale