Open Baughn opened 6 years ago
Where and how is it scheduled? I don't see it in the code.
Indeed, kube-dns is a tricky part; I have already had some trouble with it in production (mainly latency...).
For instance, OpenShift does some special tricks: it runs kube-dns as a DaemonSet and configures the pods to use the local DNS. We could make sure it has the right priority and is replicated.
But I'd say let's not over-engineer it: let's have sane defaults for beginners. Specialists will always have their own ways of doing things.
Then, from my beginner's perspective:
While doing something totally unrelated, and presumably due to rolling machine reboots, I had every instance of kube-dns de-schedule. Subsequently the cluster seemed unrecoverable, or at least would have been difficult enough to recover that I decided I might as well delete and rebuild it, since I didn't have anything important there yet.
I really think these particular services should be hard to kill.
As recommended here:
https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/kube-dns.yaml.sed
We could scale this to 2 replicas and add autoscaling; this would be a sane default, I think.
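A rough sketch of what that could look like, not a tested configuration. The first document is only a fragment of the kube-dns Deployment. The second assumes autoscaling is done with the cluster-proportional-autoscaler DNS addon, which this thread does not actually name; the ConfigMap name and the numbers are illustrative assumptions.

```yaml
# Fragment only: pin kube-dns to two replicas (assumes the Deployment
# is named kube-dns and lives in kube-system).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  replicas: 2
---
# Assumption: the cluster-proportional-autoscaler is deployed and reads
# this ConfigMap. "min": 2 keeps at least two replicas at all times.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  linear: |-
    {"coresPerReplica": 256, "nodesPerReplica": 16, "min": 2, "preventSinglePointFailure": true}
```

If the autoscaler is in use, it manages the replica count itself, so the `min` value is what effectively guarantees the two replicas.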
Here's a puzzler: What happens if, for whatever reason, kube-dns fails to schedule due to overall lack of CPU?
If your answer is "everything breaks", you'd be right. :)
There's a solution hinted at in https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/, namely enabling priorities and marking all the kube-system pods as critical, especially the ones required for the cluster to keep working. I don't think there's a way to mark an entire namespace as high-priority, but certainly these particular pods should be at maximum priority.
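As a minimal sketch of the priority idea, assuming pod priority is enabled on the cluster and the built-in `system-cluster-critical` class is available, the kube-dns Deployment could carry something like the fragment below. The Deployment name and namespace are assumptions, and this is not a complete manifest.

```yaml
# Fragment only: mark the kube-dns pods as cluster-critical so the
# scheduler can preempt lower-priority pods when CPU is short.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  template:
    spec:
      # Built-in priority class; assumed to exist on this cluster.
      priorityClassName: system-cluster-critical
```

The same field would have to be set on each of the other kube-system workloads individually, since priority is a per-pod setting rather than a per-namespace one.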