kubernetes-retired / cluster-registry

[EOL] Cluster Registry API
https://kubernetes.github.io/cluster-registry/
Apache License 2.0

Get current clusters failed due to User `apiserver-client` permission #172

Closed patrickshan closed 6 years ago

patrickshan commented 6 years ago

/sig multicluster

What happened: After provisioning the cluster registry with crinit (it actually hangs at the last step, waiting for the API server to come up, even though the pods and service are up), a get clusters operation fails:

$ kubectl get clusters
Error from server (Forbidden): clusters.clusterregistry.k8s.io is forbidden: User "apiserver-client" cannot list clusters.clusterregistry.k8s.io at the cluster scope

What you expected to happen: It should return "No resources found"

How to reproduce it (as minimally and precisely as possible):

  1. use crinit to provision cluster registry inside an existing kube cluster:
    ./crinit aggregated init lab-cluster-registry --etcd-pv-storage-class ebs-volume --image 'gcr.io/crreleases/clusterregistry:latest_nightly' --host-cluster-context cluster-0.ap-southeast-2.lab
  2. the above command hangs at the last step, "Waiting for the cluster registry API server to come up...". When you check the pods, the etcd container is up while the clusterregistry container complains about a certificate error
  3. update the secret object lab-cluster-registry-apiserver-credentials, replacing ca.crt with the cluster's existing CA
  4. kill the existing pods so they pick up the new CA; the new pod then comes up and runs
    $ kubectl logs -f lab-cluster-registry-apiserver-847849b6b5-msl66 clusterregistry
    I0111 04:10:17.060912       1 serve.go:85] Serving securely on 0.0.0.0:8443
  5. run kubectl get clusters, which returns the error above
    $ kubectl get clusters
    Error from server (Forbidden): clusters.clusterregistry.k8s.io is forbidden: User "apiserver-client" cannot list clusters.clusterregistry.k8s.io at the cluster scope

Anything else we need to know?: I also tried updating the clusterregistry.k8s.io:apiserver ClusterRoleBinding to include the apiserver-client User, but that didn't solve the problem:

$ kubectl get clusterrolebindings clusterregistry.k8s.io:apiserver -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2018-01-11T04:09:18Z
  labels:
    app: clusterregistry
  name: clusterregistry.k8s.io:apiserver
  resourceVersion: "1034807"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/clusterregistry.k8s.io%3Aapiserver
  uid: 3197e988-f685-11e7-b176-0671bf07fdba
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: clusterregistry.k8s.io:apiserver
subjects:
- kind: ServiceAccount
  name: clusterregistry-k8s-io-apiserver
  namespace: clusterregistry
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: apiserver-client

Environment:

font commented 6 years ago

@patrickshan I think I see your problem. Since you are updating the CA in the secret object with your own, you will also need to update the apiservice caBundle field with your CA. This CA bundle is in PEM encoding. See the existing apiservice using kubectl get apiservice v1alpha1.clusterregistry.k8s.io -o yaml and you'll see the caBundle field that needs to be updated. It is used by the k8s aggregator to validate the cluster registry API server's serving certificate. This will be easier once #171 is resolved.
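As a rough illustration of the caBundle update described above, the sketch below builds a JSON merge patch from a PEM CA file. The helper name ca_bundle_patch is hypothetical and not part of crinit; the encoding step mirrors the `cat ca.crt | base64 | tr -d '\n'` pipeline quoted elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of crinit): print a JSON merge patch that sets
# spec.caBundle to the base64 encoding of the PEM CA file given as $1.
ca_bundle_patch() {
  # base64-encode the PEM file and strip newlines so the value fits on one line.
  printf '{"spec":{"caBundle":"%s"}}\n' "$(base64 < "$1" | tr -d '\n')"
}
```

Assuming the APIService name shown in this thread, the patch could then be applied with something like `kubectl patch apiservice v1alpha1.clusterregistry.k8s.io --type=merge -p "$(ca_bundle_patch ca.crt)"`.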

font commented 6 years ago

@patrickshan When you say you updated the secret object, are you ONLY updating the ca.crt? If so, I don't think that will work, as the generated CA cert and key are used to sign the cluster registry API server certificate server.crt as well as the client certificate (client-certificate-data field) used to update the local kubeconfig.

You probably don't care about the local kubeconfig in aggregated mode, but the server cert is important. So additionally, you would probably need to update the secret object with your own server certificate along with the private key and CA used to sign it.

patrickshan commented 6 years ago

@font so I updated the caBundle inside v1alpha1.clusterregistry.k8s.io apiservice to use our own CA and also used the server.key/server.crt certs signed by our own CA in the credentials secret object. But I still get the same result:

$ kubectl get clusters
Error from server (Forbidden): clusters.clusterregistry.k8s.io is forbidden: User "apiserver-client" cannot list clusters.clusterregistry.k8s.io at the cluster scope

This is my understanding of those certs; correct me if I am wrong. The apiserver credentials object contains three cert files: ca.crt, server.crt and server.key. ca.crt is used by the APIService pod to validate client certs in incoming requests, such as requests from kube-apiserver; server.crt/server.key are used as the HTTPS server certs for the APIService.

The caBundle inside the APIService, on the other hand, is used by kube-apiserver to validate the APIService server certs.

In my original setup, I only changed ca.crt inside the apiserver credentials object to our custom CA, so the APIService could use it to verify the client certs in requests from kube-apiserver, which are signed by our custom CA. kube-apiserver could still use the crinit-generated CA stored in the APIService caBundle to verify the server certs (server.crt/server.key) for the APIService.

As you suggested, I updated both caBundle and server.crt/server.key to use our custom CA and certs. kube-apiserver now uses our custom CA to verify the server certs signed by it, which still works as before.

I noticed that the User 'apiserver-client' happens to be the CN of the kube-apiserver cert specified by the --proxy-client-cert-file/--proxy-client-key-file parameters. Do I need to use a different CN for this? I tried adding it to --requestheader-allowed-names of clusterregistry, but that doesn't seem to work either.

Also after adding --v=3 for clusterregistry, I can see lots of 403s inside the log:

I0112 01:31:22.143568       1 wrap.go:42] GET /apis/clusterregistry.k8s.io/v1alpha1/clusters?resourceVersion=0: (772.441µs) 403 [[hyperkube/v1.8.6 (linux/amd64) kubernetes/6260bb0/system:serviceaccount:kube-system:generic-garbage-collector] 10.252.12.0:60098]
I0112 01:31:23.151961       1 wrap.go:42] GET /apis/clusterregistry.k8s.io/v1alpha1/clusters?resourceVersion=0: (5.182633ms) 403 [[hyperkube/v1.8.6 (linux/amd64) kubernetes/6260bb0/system:serviceaccount:kube-system:generic-garbage-collector] 10.252.12.0:60098]
font commented 6 years ago

@patrickshan You're right, you may not need to have both CAs match. So I guess just keep it in mind as you work through debugging this.

Some questions:

patrickshan commented 6 years ago

@font

  1. we deploy k8s clusters on AWS using terraform based on our cluster configuration.

  2. currently we only have these three arguments configured for kube-apiserver:

    - --requestheader-client-ca-file=/etc/ssl/kubernetes/ca.pem
    - --proxy-client-cert-file=/etc/ssl/kubernetes/apiserver-client.crt
    - --proxy-client-key-file=/etc/ssl/kubernetes/apiserver-client.key

    these are some other arguments we are using at the moment:

    - --allow-privileged=true
    - --secure-port=443
    - --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,PodSecurityPolicy,GenericAdmissionWebhook
    - --anonymous-auth=true
    - --tls-cert-file=/etc/ssl/kubernetes/apiserver.pem
    - --tls-private-key-file=/etc/ssl/kubernetes/apiserver-key.pem
    - --client-ca-file=/etc/ssl/kubernetes/ca.pem
    - --service-account-key-file=/etc/ssl/kubernetes/serviceaccount-key.pem
    - --runtime-config=extensions/v1beta1/networkpolicies=true,rbac.authorization.k8s.io/v1alpha1,batch/v2alpha1=true,extensions/v1beta1/podsecuritypolicy=true,admissionregistration.k8s.io/v1alpha1=true
    - --requestheader-client-ca-file=/etc/ssl/kubernetes/ca.pem
    - --proxy-client-cert-file=/etc/ssl/kubernetes/apiserver-client.crt
    - --proxy-client-key-file=/etc/ssl/kubernetes/apiserver-client.key
    - --cloud-provider=aws
    - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
    - --authorization-mode=RBAC
    - --authorization-rbac-super-user=kube-admin

    Actually I just rebuilt the controllers to add the other four arguments you just mentioned:

     --requestheader-username-headers=X-Remote-User
     --requestheader-group-headers=X-Remote-Group
     --requestheader-extra-headers-prefix=X-Remote-Extra-
     --requestheader-allowed-names=system:auth-proxy

    but I still get the same error as before.

  3. this is the result for that command:

    $ kubectl get --raw /apis/clusterregistry.k8s.io/v1alpha1
    {"kind":"APIResourceList","apiVersion":"v1","groupVersion":"clusterregistry.k8s.io/v1alpha1","resources":[{"name":"clusters","singularName":"","namespaced":false,"kind":"Cluster","verbs":["create","delete","deletecollection","get","list","patch","update","watch"]}]}
font commented 6 years ago

What does kubectl describe apiservice v1alpha1.clusterregistry.k8s.io show?

patrickshan commented 6 years ago

@font this is the result and I only replaced the caBundle string:

$ kubectl describe apiservice v1alpha1.clusterregistry.k8s.io
Name:         v1alpha1.clusterregistry.k8s.io
Namespace:    
Labels:       app=clusterregistry
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1beta1
Kind:         APIService
Metadata:
  Creation Timestamp:  2018-01-12T04:10:11Z
  Resource Version:    7079
  Self Link:           /apis/apiregistration.k8s.io/v1beta1/apiservices/v1alpha1.clusterregistry.k8s.io
  UID:                 7bdf38c5-f74e-11e7-a043-0ab13d5f34ae
Spec:
  Ca Bundle:               <ca.crt-base64-encoded-string-created-through-`cat ca.crt | base64  | tr -d '\n'`>
  Group:                   clusterregistry.k8s.io
  Group Priority Minimum:  10000
  Service:
    Name:            lab-cluster-registry
    Namespace:       clusterregistry
  Version:           v1alpha1
  Version Priority:  20
Status:
  Conditions:
    Last Transition Time:  2018-01-12T04:10:11Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>
patrickshan commented 6 years ago

@font I just found the problem, and it's related to those requestheader parameters you mentioned earlier.

I need to either add 'apiserver-client' to the --requestheader-allowed-names list (along with other names such as system:nodes and system:kube-scheduler, since we use different certs for different modules), or remove that flag entirely, which allows all valid certs signed by the CA specified by --requestheader-client-ca-file.

The cluster endpoint is working now and thanks for looking into this issue.
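The rule behind this fix can be sketched as a small shell function. This is a simplified model of how --requestheader-allowed-names is documented to behave, not the actual Kubernetes implementation, and the function name cn_allowed is made up for illustration: an empty list accepts any client cert signed by the request-header CA, otherwise the cert's CN must appear in the list.

```shell
#!/bin/sh
# Simplified model (an assumption for illustration, not Kubernetes code) of the
# --requestheader-allowed-names check.
# $1: CN of the proxy client certificate
# $2: comma-separated allowed names (empty means "allow any CN")
cn_allowed() {
  # Empty list: every cert validated by --requestheader-client-ca-file passes.
  if [ -z "$2" ]; then
    return 0
  fi
  # Otherwise require an exact match against one entry of the list.
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# The combination from this thread: CN apiserver-client, list system:auth-proxy.
cn_allowed "apiserver-client" "system:auth-proxy" || echo "rejected"
# Removing the flag (empty list) accepts the same CN.
cn_allowed "apiserver-client" "" && echo "accepted"
```

To find the CN to compare against, a command such as `openssl x509 -noout -subject -in /etc/ssl/kubernetes/apiserver-client.crt` prints the subject of the proxy client cert.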

font commented 6 years ago

@patrickshan That's great! I suspected we were getting closer to discovering something in the core k8s apiserver setup.

Now that it's working for you, let us know if you have any recommendations for fixing or improving the problem you were facing. Thanks!

patrickshan commented 6 years ago

@font I just found this page, which includes an "enable apiserver flags" section: https://kubernetes.io/docs/tasks/access-kubernetes-api/configure-aggregation-layer/#enable-apiserver-flags . I think it would be helpful to refer to that page in the documentation and ask people to double-check it before running the cluster registry in aggregated mode. It's also worth mentioning that --requestheader-allowed-names is not necessary, especially when you have different apps using different CNs to talk to kube-apiserver.

perotinus commented 6 years ago

@patrickshan Thank you for working through this with @font! Can we close this issue now? Is there any other follow-up that you think would be useful for us to do? @font filed #175 to add more info about aggregation to our docs.

patrickshan commented 6 years ago

Thanks @perotinus and @font. This issue can be closed.