dlespiau closed this issue 6 years ago.
Ah! Aaron pointed out that we don't throw a hard error when we fail to create a cluster-admin binding:
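```go
if opts.GKE {
	err := createGKEClusterRoleBinding(kubectlClient)
	if err != nil {
		// Only a warning is printed; the installer carries on to the bootstrap step.
		fmt.Fprintln(os.Stderr, "WARNING: For GKE installations, a cluster-admin clusterrolebinding is required.")
		// No trailing newline, so the next message is printed on the same line.
		fmt.Fprintf(os.Stderr, "Could not create clusterrolebinding: %s", err)
	}
}
```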
So we then proceed with the bootstrapping, which errors out. The real reason may have been that we don't find gcloud, so we don't create the admin role binding and therefore can't create the weave-agent cluster role for the weave-agent service account.
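If the missing gcloud binary is the root cause, one option is a preflight check before any cluster changes are made. This is a minimal sketch assuming a hypothetical `requireBinary` helper; it uses the standard library's `exec.LookPath`, which searches PATH the same way the shell would.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requireBinary fails fast when a CLI the installer shells out to is missing,
// instead of letting a later step fail with a less obvious error.
func requireBinary(name, installURL string) error {
	if _, err := exec.LookPath(name); err != nil {
		return fmt.Errorf("could not find %s in PATH, please install it: %s", name, installURL)
	}
	return nil
}

func main() {
	if err := requireBinary("gcloud", "https://cloud.google.com/sdk/docs/"); err != nil {
		fmt.Println("preflight failed:", err)
		return
	}
	fmt.Println("gcloud found")
}
```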
This is what the full curl invocation looks like:
```
$ curl -Ls https://get.weave.works | sh -s -- --token=<redacted> --gke
Downloading the Weave Cloud installer...
Checking kubectl & kubernetes versions
Installing Weave Cloud agents on gke_sock-shop-staging_europe-west2-b_sock-shop at <redacted>
WARNING: For GKE installations, a cluster-admin clusterrolebinding is required.
Could not create clusterrolebinding: Could not find gcloud in PATH, please install it: https://cloud.google.com/sdk/docs/There was an error applying the agent: namespace "weave" configured
serviceaccount "weave-agent" configured
clusterrolebinding "weave-agent" configured
deployment "weave-agent" created
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-agent" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["*"], APIGroups:["*"], Verbs:["*"]} PolicyRule{NonResourceURLs:["*"], Verbs:["*"]}] user=&{damien@weave.works [system:authenticated] map[]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{Resources:["selfsubjectrulesreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]} PolicyRule{NonResourceURLs:["/swagger-2.0.0.pb-v1"], Verbs:["get"]} PolicyRule{NonResourceURLs:["/swagger.json"], Verbs:["get"]}] ruleResolutionErrors=[]
```
We don't know whether the lack of gcloud is the issue. Can we improve our error reporting to include this warning?
That's an acceptable step!
Has the improved error reporting thrown up any clues yet?
Actually, there has been only one bootstrap error that includes the added GKE error in the last 5 days (2 days ago):
```
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "cluster-admin-<REDACTED>" already exists
```
I thought I fixed that in #140. Why are we still seeing this?
As agreed offline, we should check whether the user has enough permissions to grant themselves a cluster-wide admin role on GKE, and error out if not: https://github.com/weaveworks/launcher/blob/master/bootstrap/main.go#L109
Possibly check by running something like this:
```
gcloud container clusters get-credentials
```
If the problem was anything other than this, we should still continue.
The PR that tells the user something went wrong was merged (https://github.com/weaveworks/launcher/pull/186). Not sure whether we can close this now, or is there anything else we can do here? Maybe give a better suggestion on how the user can fix it?
This error happens on GKE when users run the command and, for some reason, the authenticated user doesn't have the cluster-admin role bound. Users are supposed to call the command with --gke so that such a clusterrolebinding is created; if that fails, we error out. Current hypothesis: users either don't select the Kubernetes/GKE environment or don't copy the --gke option. I could reproduce this on sock-shop by deleting my admin binding: