weaveworks / launcher

Weave Cloud Launcher
Apache License 2.0

Weave agents failing to connect for minikube started Kubernetes RBAC cluster #158

Closed lilic closed 6 years ago

lilic commented 6 years ago

This issue happens when a user selects the Kubernetes platform and the minikube environment on Weave Cloud, and their cluster was started with minikube with RBAC enabled.

If the correct roles are not set up, the DNS pods error out because of permission problems. This is expected behaviour from minikube and, from what I understand, will only be fixed for kubeadm starting in a few releases. On Weave Cloud, setup gets stuck on: Waiting for Weave Cloud agents to connect. Looking at the pod logs, the following error occurs:

time="2018-04-03T14:48:53Z" level=error msg="Failed to execute kubectl apply: Unable to connect to the server: dial tcp: i/o timeout\nFull output:\nUnable to connect to the server: dial tcp: i/o timeout"

The fix for this is to apply this RBAC manifest file, which grants the correct permissions so that the DNS pods come up and run. Because the weave-agent pod does not error out but simply logs the error, it also needs to be deleted/restarted for the Weave Cloud setup to succeed.
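For reference, the two manual steps can be sketched as a small shell helper. This is only an illustration, not something the launcher does today. It assumes the manifest is the minikube-rbac.yaml linked further down in this thread, and that the agent pod runs in a namespace called `weave` with `weave-agent` in its name; both the namespace and the function name `fix_minikube_rbac` are assumptions to adjust for your install.

```shell
#!/bin/sh
# Sketch of the manual fix: apply the RBAC manifest, then restart the agent.
# Assumption: manifest URL taken from the link posted later in this thread.
RBAC_MANIFEST="https://raw.githubusercontent.com/coreos/prometheus-operator/master/scripts/minikube-rbac.yaml"
# Assumption: namespace the Weave Cloud agent was installed into.
AGENT_NAMESPACE="weave"

# Hypothetical helper name, used only for this illustration.
fix_minikube_rbac() {
    # 1. Grant the permissions the DNS pods are missing.
    kubectl apply -f "$RBAC_MANIFEST" || return 1

    # 2. The agent only logs the error and keeps running, so find its pod(s)
    #    and delete them; the owning controller recreates them.
    agent_pods=$(kubectl get pods -n "$AGENT_NAMESPACE" -o name | grep weave-agent)
    if [ -z "$agent_pods" ]; then
        echo "no weave-agent pod found in namespace $AGENT_NAMESPACE" >&2
        return 1
    fi
    # Intentionally unquoted: one argument per pod name.
    kubectl delete -n "$AGENT_NAMESPACE" $agent_pods
}
```

Calling `fix_minikube_rbac` after sourcing the file runs both steps in order and stops early if the manifest cannot be applied.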

leth commented 6 years ago

How do you think we should solve this issue? Detect broken DNS from the bootstrap program and error out, or ask the user whether they'd like us to fix it for them?

lilic commented 6 years ago

@leth Yes, I would definitely do that in the bootstrap part. I'm not sure we should fix it behind the user's back. For now I think we can just check whether the DNS pods are up and running. If they error out, the bootstrap program should error out as well and suggest that the RBAC rules for the DNS pods may be misconfigured, along with the manifest that would need to be applied. But I would leave it up to the user to apply the RBAC manifest file, WDYT?
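A rough sketch of what such a check could look like, written as shell rather than as part of the bootstrap program itself. The function names are hypothetical, and it assumes the DNS pods live in `kube-system` with the label `k8s-app=kube-dns`. It looks at container readiness rather than pod phase, since a crash-looping pod can still report the phase Running.

```shell
#!/bin/sh
# Sketch of a bootstrap-time DNS check that nudges the user instead of
# changing their cluster. Hypothetical helpers, not the launcher's code.

# Returns 0 only if at least one DNS container exists and all are ready.
# Assumption: DNS pods are labelled k8s-app=kube-dns in kube-system.
dns_pods_healthy() {
    ready=$(kubectl get pods -n kube-system -l k8s-app=kube-dns \
        -o jsonpath='{.items[*].status.containerStatuses[*].ready}') || return 1
    # No container statuses at all counts as unhealthy.
    [ -n "$ready" ] || return 1
    for state in $ready; do
        [ "$state" = "true" ] || return 1
    done
    return 0
}

# Errors out with a suggestion and leaves the fix to the user.
check_dns_or_hint() {
    if dns_pods_healthy; then
        return 0
    fi
    cat >&2 <<'EOF'
The cluster DNS pods are not ready. If this cluster was started with RBAC
enabled, the DNS pods may be missing the required RBAC rules.
Apply a suitable RBAC manifest, then restart the weave-agent pod.
EOF
    return 1
}
```

A caller would run `check_dns_or_hint || exit 1` before waiting for the agents to connect, so the user sees the hint instead of an indefinite wait.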

rade commented 6 years ago

This "broken out of the box" behaviour of minikube --extra-config=apiserver.Authorization.Mode=RBAC surely is a bug. Is it recorded somewhere?

lilic commented 6 years ago

@rade More details in the following issues. https://github.com/kubernetes/minikube/issues/1734#issue-245035445 and https://github.com/kubernetes/minikube/issues/1722

rade commented 6 years ago

There's also kubernetes/minikube#2510. Looks like there's a fix that involves fiddling with roles rather than having to re-do the DNS config. This would match what we do on GKE.

rade commented 6 years ago

> Looks like there's a fix that involves fiddling with roles

i.e.

kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default

lilic commented 6 years ago

@rade Yes, that's more or less what the above RBAC manifest that I pasted does as well. https://raw.githubusercontent.com/coreos/prometheus-operator/master/scripts/minikube-rbac.yaml

rade commented 6 years ago

Right. I'd be reluctant to apply something as complicated as that to a user's cluster. But a one-liner would be fine. Indeed, as I mentioned, that's what we do on GKE. HOWEVER, the situation here is different since a) here the user made a conscious decision to enable RBAC, unlike on GKE where it's on by default, and b) here DNS is broken which has nothing to do with the Weave Cloud agent, whereas on GKE only the Weave Cloud agent is broken.

I am a bit puzzled why users are falling into this trap. Are there instructions out there that tell users to run minikube in this way but fail to mention the extra steps required to make DNS work?

lilic commented 6 years ago

Exactly, that's why I am reluctant to just "fix" this for the user behind their back. I think giving them an option/nudge might be better. WDYT?

> I am a bit puzzled why users are falling into this trap. Are there instructions out there that tell users to run minikube in this way but fail to mention the extra steps required to make DNS work?

The docs just tell you how to start with RBAC, and not much else at a quick glance: https://kubernetes.io/docs/getting-started-guides/minikube/ And TBH I have seen other devs fall into this same trap: at first they did not need DNS to work correctly, but later they spent some time trying to figure out why things had stopped working.

leth commented 6 years ago

> Exactly, that's why I am reluctant to just "fix" this for the user behind their back. I think giving them an option/nudge might be better. WDYT?

That sounds good; we can make the nudge message helpful :)

> I am a bit puzzled why users are falling into this trap. Are there instructions out there that tell users to run minikube in this way but fail to mention the extra steps required to make DNS work?

Did someone mention this happens with kubeadm too?