weaveworks / launcher

Weave Cloud Launcher
Apache License 2.0

creation of clusterrolebinding on GKE broken #177

Closed by rade 6 years ago

rade commented 6 years ago

The bootstrap agent invokes the equivalent of the following to obtain the user account:

$ gcloud config get-value --quiet core/account --verbosity=none
Your active configuration is: [weaveworks]
my@email.address

Notice the first line. gcloud writes it to stderr, but we combine stdout and stderr when capturing the output, so it ends up in the value we treat as the account.

This means we end up executing something like

kubectl create clusterrolebinding cluster-admin-matthias --clusterrole cluster-admin --user Your active configuration is: [weaveworks]

I have confirmed this by strategic placement of printfs in the code.

Amazingly this doesn't actually error! So we end up with...

$ kubectl get -o yaml clusterrolebinding cluster-admin-matthias
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2018-04-15T17:16:01Z
  name: cluster-admin-matthias
  resourceVersion: "7663685"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/cluster-admin-matthias
  uid: <elided>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: 'Your active configuration is: [weaveworks]'

Ouch.
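A minimal Go sketch of the failure mode, assuming the agent shells out via os/exec (the actual launcher code may differ). The `fakeGcloud` script and both function names are hypothetical stand-ins, and a POSIX `sh` is assumed so the example runs without gcloud installed. It contrasts capturing combined output with capturing stdout only:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// fakeGcloud is a hypothetical stand-in for
// `gcloud config get-value --quiet core/account --verbosity=none`:
// it writes the configuration notice to stderr and the account to stdout.
const fakeGcloud = `echo "Your active configuration is: [weaveworks]" >&2; echo my@email.address`

// accountCombined captures stdout and stderr together, so the stderr
// notice becomes part of the returned value.
func accountCombined() (string, error) {
	out, err := exec.Command("sh", "-c", fakeGcloud).CombinedOutput()
	return strings.TrimSpace(string(out)), err
}

// accountStdoutOnly captures stdout only; the stderr notice is not
// part of the returned value.
func accountStdoutOnly() (string, error) {
	out, err := exec.Command("sh", "-c", fakeGcloud).Output()
	return strings.TrimSpace(string(out)), err
}

func main() {
	combined, err := accountCombined()
	if err != nil {
		fmt.Println("combined capture failed:", err)
		return
	}
	stdoutOnly, err := accountStdoutOnly()
	if err != nil {
		fmt.Println("stdout-only capture failed:", err)
		return
	}
	fmt.Printf("combined:    %q\n", combined)
	fmt.Printf("stdout only: %q\n", stdoutOnly)
}
```

With the combined capture the value is two lines, and the first of them is the notice. That matches the subject name seen in the clusterrolebinding above.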

Note that the "Your active configuration..." line does not appear in all circumstances - I've asked a user to check and they didn't see it. So this problem may be confined to specific gcloud versions or config settings. FWIW, my gcloud version is

$ gcloud --version
Google Cloud SDK 197.0.0
alpha 2018.04.06
beta 2018.04.06
bq 2.0.31
core 2018.04.06
gsutil 4.30

rade commented 6 years ago

I've just reproduced this in the GKE web cloud shell :(

$ gcloud config get-value --quiet core/account --verbosity=none
Your active configuration is: [cloudshell-1041]
my@email.address

$ gcloud version
Google Cloud SDK 197.0.0
alpha 2017.09.15
app-engine-go
app-engine-java 1.9.63
app-engine-php " "
app-engine-python 1.9.68
app-engine-python-extras 1.9.63
beta 2017.09.15
bq 2.0.31
cbt
cloud-datastore-emulator 1.4.1
container-builder-local
core 2018.04.06
datalab 20180213
docker-credential-gcr
gcd-emulator v1beta3-1.0.0
gsutil 4.30
kubectl
pubsub-emulator 2018.02.02

rade commented 6 years ago

I created a fresh GKE cluster, using all the default options, and then used the GKE web shell to install the agent, copying the command from the Weave Cloud instructions:

$ curl -Ls https://get.weave.works | sh -s -- --token=<elided> --gke
Downloading the Weave Cloud installer...
Preparing for Weave Cloud setup
Checking kubectl & kubernetes versions
Connecting cluster to "GKE cluster-4" (id: <elided>) on Weave Cloud
Installing Weave Cloud agents on gke_matthias-scratchpad_us-central1-a_cluster-4 at https://<elided>
Performing a check of the Kubernetes installation setup.
There was an error applying the agent: Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-agent" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["*"], APIGroups:["*"], Verbs:["*"]} PolicyRule{NonResourceURLs:["*"], Verbs:["*"]}] user=&{me@my.email  [system:authenticated] map[authenticator:[GKE]]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]
Full output:
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
namespace "weave" configured
serviceaccount "weave-agent" created
clusterrolebinding "weave-agent" created
deployment "weave-agent" created
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-agent" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["*"], APIGroups:["*"], Verbs:["*"]} PolicyRule{NonResourceURLs:["*"], Verbs:["*"]}] user=&{me@my.email  [system:authenticated] map[authenticator:[GKE]]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]
Rolling back cluster changes

And indeed the clusterrolebinding is set up incorrectly:

$ kubectl get -o yaml clusterrolebinding cluster-admin-matthias
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2018-04-15T18:33:08Z
  name: cluster-admin-matthias
  resourceVersion: "659"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/cluster-admin-matthias
  uid: <elided>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: 'Your active configuration is: [cloudshell-4322]'

rade commented 6 years ago

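To get things to work I ran the following. The shell's command substitution captures only stdout, so the stderr line is not picked up: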

kubectl delete clusterrolebinding cluster-admin-$USER
kubectl create clusterrolebinding cluster-admin-$USER --clusterrole cluster-admin --user $(gcloud config get-value --quiet core/account --verbosity=none)

and then re-ran the agent installation, i.e.

curl -Ls https://get.weave.works | sh -s -- --token=<elided> --gke

rade commented 6 years ago

AFAICT, gcloud config get-value has always produced that output. Googling shows examples going back a year.

The aforementioned user has just confirmed that they don't see the "Your active configuration is..." output when running on macOS, but do see it in the GKE web shell.

So there is a reasonable chance this has always been broken for anybody not running the install from a Mac.

crw commented 6 years ago

I had a user hit this issue today. Is there any automatic detection / recovery we can do from this issue?
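One possible form of detection, sketched in Go purely as an assumption (this is not known to be what the launcher does): sanity-check the captured account before passing it to `kubectl create clusterrolebinding`. The `plausibleAccount` helper is hypothetical and its rules are deliberately crude; it only rejects values that contain whitespace or lack a single `@`.

```go
package main

import (
	"fmt"
	"strings"
)

// plausibleAccount reports whether s could be an account identifier of
// the form local@domain. It is a rough heuristic, not an email validator:
// it rejects empty values, anything containing whitespace (which includes
// multi-line captures), and anything without exactly one "@" that has
// text on both sides.
func plausibleAccount(s string) bool {
	if s == "" || strings.ContainsAny(s, " \t\r\n") {
		return false
	}
	at := strings.Index(s, "@")
	return at > 0 && at == strings.LastIndex(s, "@") && at < len(s)-1
}

func main() {
	candidates := []string{
		"my@email.address",
		"Your active configuration is: [weaveworks]",
		"Your active configuration is: [weaveworks]\nmy@email.address",
	}
	for _, c := range candidates {
		fmt.Printf("%q -> %v\n", c, plausibleAccount(c))
	}
}
```

A check like this would let the installer stop with a clear error instead of silently creating a binding for a bogus user. Recovery would still mean deleting the bad clusterrolebinding and recreating it, as in the manual workaround above.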

bboreham commented 6 years ago

@crw you mean the change in #178 didn't work, or you have seen a similar symptom for different reasons? If the former, re-open this issue; if the latter, open a new issue.

crw commented 5 years ago

Going to assume that was some kind of one-off, and leave this issue closed for now.