hammerlab / secotrec

Setup Coclobas/Ketrew Clusters
Apache License 2.0

The very first members of the cluster are suffering the most #71

Open armish opened 7 years ago

armish commented 7 years ago

Especially the kube head and the most senior GCE in the pool:

NAME             CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   10.67.240.1   <none>        443/TCP   26d

26 days of uptime! That is impressive, but on the other hand, the longer they stick around, the more likely their fluent instance is to fail, it seems (anecdotal, no definitive data). And there is also this:

(And do keep in mind that I had to keep restarting my rcc{0-5} DCs almost twice a week.) Maybe we can set a regular schedule to reset (down && up) our clusters to make them less error prone. As I was reading the kube documentation, I saw a relevant option mentioned here: https://cloud.google.com/container-engine/docs/node-auto-repair
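To make the "regular schedule" idea a bit more concrete, here is a minimal sketch of how one could flag long-lived resources from `kubectl get` output before deciding on a reset. It is an illustration only: the function names `age_to_seconds` and `stale_resources` and the 14-day default threshold are made up for this example and are not part of secotrec or kubectl. It assumes the AGE value is the last column and uses kubectl's compact duration format (e.g. `26d`, `3h15m`).

```python
import re

# Seconds per unit for the compact durations kubectl prints in the AGE column.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


def age_to_seconds(age: str) -> int:
    """Convert an AGE string such as '26d' or '3h15m' to seconds."""
    parts = re.findall(r"(\d+)([smhdy])", age)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def stale_resources(table: str, max_age_days: int = 14) -> list[str]:
    """Return the names of rows whose AGE (last column) exceeds max_age_days.

    The 14-day default is an arbitrary placeholder, not a recommendation.
    """
    limit = max_age_days * 86400
    stale = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and age_to_seconds(fields[-1]) > limit:
            stale.append(fields[0])
    return stale
```

Fed the output above, `stale_resources` would report `svc/kubernetes`, which could then be a hint that the cluster is due for a down && up.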

And as we discussed earlier, every single new kube feature adoption comes with its own problems, but if those old nodes keep getting stuck (see #70) :man_shrugging: