kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Deploy kube-proxy as DaemonSet #6527

Open chrisz100 opened 5 years ago

chrisz100 commented 5 years ago

1. Describe IN DETAIL the feature/behavior/change you would like to see. Kubernetes best practices state that kube-proxy is best deployed as a DaemonSet. kops currently places the kube-proxy manifest into /etc/kubernetes/manifests as a static pod file, which goes against that best practice.

2. Feel free to provide a design supporting your feature request. We could deploy kube-proxy using the existing addon mechanism; a rough sketch of such a manifest is shown below.
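
As a hedged illustration only (the image tag, ConfigMap name, and flags below are placeholders rather than anything kops actually ships), a kube-proxy DaemonSet addon manifest could look roughly like this:

```yaml
# Illustrative sketch: a kube-proxy DaemonSet similar to what kubeadm-style clusters deploy.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-proxy
  namespace: kube-system
  labels:
    k8s-app: kube-proxy
spec:
  selector:
    matchLabels:
      k8s-app: kube-proxy
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: kube-proxy
    spec:
      priorityClassName: system-node-critical        # keeps kube-proxy admissible under resource pressure
      hostNetwork: true
      tolerations:
        - operator: Exists                            # run on every node, including tainted ones
      containers:
        - name: kube-proxy
          image: registry.k8s.io/kube-proxy:v1.28.0   # placeholder version
          command:
            - /usr/local/bin/kube-proxy
            - --config=/var/lib/kube-proxy/config.conf
          securityContext:
            privileged: true
          volumeMounts:
            - name: kube-proxy-config
              mountPath: /var/lib/kube-proxy
            - name: xtables-lock
              mountPath: /run/xtables.lock
            - name: lib-modules
              mountPath: /lib/modules
              readOnly: true
      volumes:
        - name: kube-proxy-config
          configMap:
            name: kube-proxy                          # assumes a ConfigMap carrying the kube-proxy configuration
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
        - name: lib-modules
          hostPath:
            path: /lib/modules
```

Unlike a static manifest, a pod created from this spec is owned by the DaemonSet controller, so a failed admission is retried rather than abandoned.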

chrisz100 commented 5 years ago

/area networking

stefan-mees commented 5 years ago

We are facing an issue with burst loads of pods spawning (for example with Airflow): nodes brought up by the autoscaler can end up not running kube-proxy because the node's max pod limit is already reached by the time kube-proxy tries to start.

This happens in combination with aws-vpc-cni

pracucci commented 5 years ago

To my understanding, kube-proxy is defined in a static manifest file and its pod spec carries the scheduler.alpha.kubernetes.io/critical-pod annotation. The kubelet uses this annotation to check whether the pod is critical and, if so, invokes the CriticalPodAdmissionHandler to try to make room for the pod when the node's resources are already fully allocated.

If HandleAdmissionFailure fails for any reason, the kube-proxy pod will never be started on the node, because there is no retry: pods defined in a manifest file run outside of a reconcile loop that would eventually retry them. Is my understanding correct?

I'm asking because we're investigating an issue we hit on a node where kube-proxy failed to start due to lack of CPU/memory resources, HandleAdmissionFailure was triggered, but it failed to kill one of the pods, thus permanently blocking admission of the kube-proxy pod.
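
For reference, a minimal sketch of what such a static manifest with that annotation might look like (simplified; paths and image tag are placeholders, not the exact file kops writes):

```yaml
# Hypothetical, simplified /etc/kubernetes/manifests/kube-proxy.manifest; not the exact kops output.
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""   # tells the kubelet this pod is critical
spec:
  hostNetwork: true
  containers:
    - name: kube-proxy
      image: registry.k8s.io/kube-proxy:v1.28.0      # placeholder version
      command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
      resources:
        requests:
          cpu: 100m    # if this request cannot be admitted, CriticalPodAdmissionHandler tries to evict other pods
      securityContext:
        privileged: true
```

Because the kubelet reads this file directly, there is no controller behind it to retry a failed admission.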

pracucci commented 5 years ago

I've also opened an issue on Kubernetes to gather more feedback on the topic. I would like to better understand how it works under the hood, in order to be able to actively work on a tentative solution: https://github.com/kubernetes/kubernetes/issues/78405

pracucci commented 5 years ago

@justinsb @mikesplain Do you have any thought or experience on this?

luanguimaraesla commented 5 years ago

We are facing the same issue with burst loads of pods spawning. In our case it happens in combination with Cilium, which depends on the kube-proxy pod being initialized, so we lose the entire node with this error:

Failed to admit pod kube-proxy-ip-172-16-123-131.ec2.internal_kube-system(031b2b73a3f2f24dc9e9188a18e86330) - Unexpected error while attempting to recover from admission failure: preemption: pod filebeat-nczgs_filebeat(0a23fbea-87c3-11e9-9f55-0e2b991d93be) failed to evict failed to "KillPodSandbox" for "0a23fbea-87c3-11e9-9f55-0e2b991d93be" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"filebeat-nczgs_filebeat\" network: failed to find plugin \"cilium-cni\" in path [/opt/cni/bin/]"

I'm following PR kubernetes/kubernetes#78493, which has already been approved, but we really need a stopgap solution to the problem in the meantime. Do you see any temporary fix?

pracucci commented 5 years ago

@luanguimaraesla I'm not aware of reliable workarounds to avoid it, but if you find any I would be very interested as well. Looking at the kubelet code, once a static pod fails to get admitted it will never be tried again, so a very hacky workaround could be to try to avoid that condition in the first place (i.e. make the node unschedulable until kube-proxy is ready, then make it schedulable again), but I haven't tried it and I'm not sure it would work.
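
For illustration only, the cordon part of that idea corresponds to toggling the Node object's spec.unschedulable field, which is what kubectl cordon/uncordon does; this is untested as a workaround here:

```yaml
# Sketch: patch applied to the Node object right after it registers, and reverted
# once the kube-proxy static pod is Running. Cordoning only keeps the scheduler from
# placing regular pods; the kubelet still creates static pods such as kube-proxy.
spec:
  unschedulable: true   # equivalent to `kubectl cordon <node>`; set back to false to uncordon
```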

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

pierluigilenoci commented 5 years ago

/remove-lifecycle stale

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

hakman commented 4 years ago

/remove-lifecycle rotten
/lifecycle frozen

hakman commented 4 years ago

Some CNI plugins like Cilium and Calico are adding eBPF support. Deploying kube-proxy as a DaemonSet would make it possible to disable it and enable eBPF support at the same time, without needing a rolling update or downtime.
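
For later readers, a hedged sketch of what that could look like in a kops Cluster spec (field names and availability depend on the kops and Cilium versions; check the kops docs before relying on this):

```yaml
# Illustrative excerpt of a kops Cluster spec for a kube-proxy-free Cilium setup.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local      # placeholder cluster name
spec:
  kubeProxy:
    enabled: false             # stop deploying kube-proxy
  networking:
    cilium:
      enableNodePort: true     # let Cilium's eBPF datapath take over service load-balancing
```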

sl1pm4t commented 1 year ago

Looks like this issue can be closed now. kube-proxy is now deployed as a DaemonSet via the addons mechanism.