Cluster state taking a long time to update...

derailed commented 7 years ago

Running k8s 1.4.1 on RPi with kubeadm-arm 1.5 on Hypriot 1.0.1. I've noticed that when I unplug the Ethernet cable on one of the minions, it takes a really long time for k8s to a) figure out the node is gone and b) notice that the pods on that node are no longer running and need to be started on another available node.

Does anyone else see this, or is something not configured right on my cluster?

For example, once the Ethernet cable is unplugged:

kubectl get node -> takes about 2 minutes to show the node as not ready
kubectl get pod -> pod(s) show Running status on the disconnected node for over 5 minutes, and only then does the replication take place

Not sure why there is such a big lag in the heartbeat protocol. I'm wondering if something is not quite configured right with etcd or flannel on my Pi cluster.

luxas commented 7 years ago

This is actually the default configuration for the controller-manager: http://kubernetes.io/docs/admin/kube-controller-manager/

--pod-eviction-timeout duration                                     The grace period for deleting pods on failed nodes. (default 5m0s)
--node-monitor-grace-period duration                                Amount of time which we allow running Node to be unresponsive before marking it unhealthy. Must be N times more than kubelet's nodeStatusUpdateFrequency, where N means number of retries allowed for kubelet to post node status. (default 40s)
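
To see how these two defaults add up, here is a rough back-of-the-envelope sketch in Python. It assumes the delay before pods are evicted is approximately the node monitor grace period plus the pod eviction timeout, and it ignores polling intervals and rescheduling time. The lowered values are purely illustrative, not recommendations.

```python
from datetime import timedelta

# Defaults quoted from the kube-controller-manager flag descriptions above.
DEFAULT_GRACE = timedelta(seconds=40)     # --node-monitor-grace-period
DEFAULT_EVICTION = timedelta(minutes=5)   # --pod-eviction-timeout


def eviction_delay(grace, eviction):
    """Rough estimate of the time between a node going silent and its pods
    being evicted: the node must first be marked unhealthy (grace), then
    the eviction timeout must elapse. This is an approximation."""
    return grace + eviction


default_delay = eviction_delay(DEFAULT_GRACE, DEFAULT_EVICTION)
print("default:", default_delay)  # default: 0:05:40

# Hypothetical lowered values for a demo cluster (assumed for illustration).
demo_delay = eviction_delay(timedelta(seconds=20), timedelta(seconds=30))
print("demo:", demo_delay)  # demo: 0:00:50
```

With the defaults this comes to roughly 5m40s, which is in line with pods showing as running on the unplugged node for over 5 minutes.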

This is more of a "feature" of k8s: by default it is not very aggressive in evicting pods from nodes. For demos, Raspberry Pi clusters, etc., it is really helpful to set these to lower values. You can just edit /etc/kubernetes/manifests/kube-controller-manager.json and add those flags with the values you'd like.
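
As a sketch of what that edit could look like, the Python snippet below adds the two flags to a manifest that has already been loaded as a dict. It assumes the flags live in the first container's "command" list under "spec"; check your own kube-controller-manager.json, since the layout may differ (for example, flags under "args"). The helper name add_flags, the stand-in manifest, and the 30s/20s values are all hypothetical.

```python
import json


def add_flags(manifest, flags):
    """Set each flag on the first container's command list, in place.

    `flags` maps flag names (e.g. "--pod-eviction-timeout") to values.
    An existing "name=value" entry for a flag is replaced, not duplicated.
    Assumes the manifest layout spec.containers[0].command.
    """
    command = manifest["spec"]["containers"][0]["command"]
    for name, value in flags.items():
        # Drop any earlier setting of this flag before appending the new one.
        command[:] = [arg for arg in command if not arg.startswith(name + "=")]
        command.append(f"{name}={value}")
    return manifest


# Stand-in for the parsed contents of kube-controller-manager.json (assumed layout).
example = {
    "spec": {
        "containers": [
            {
                "name": "kube-controller-manager",
                "command": ["kube-controller-manager", "--pod-eviction-timeout=5m0s"],
            }
        ]
    }
}

patched = add_flags(
    example,
    {"--pod-eviction-timeout": "30s", "--node-monitor-grace-period": "20s"},
)
print(json.dumps(patched["spec"]["containers"][0]["command"], indent=2))
```

To apply this to a real cluster you would read the file with json.load, call the helper, and write the result back with json.dump. Keep a backup of the original manifest first.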

Hope it helps

derailed commented 7 years ago

Right on point, exactly what I was looking for. Thanks for the insight Lucas!

(Sent by email in reply to Lucas Käldström's comment of Thu, Nov 17, 2016 at 7:40 AM.)

luxas / kubernetes-on-arm

Cluster state taking a long time to update... #144