tkestack / tke

Native Kubernetes container management platform supporting multi-tenant and multi-cluster

cluster admin can't get individual cluster due to invalid tenantID #1076

Closed huxiaoliang closed 3 years ago

huxiaoliang commented 3 years ago
  1. Get a cluster admin token with the steps below (from https://superuser.com/questions/1394619/how-to-get-the-admin-user-token-from-kubectl):
Create an admin service account called k8sadmin
sudo kubectl create serviceaccount k8sadmin -n kube-system

Give the user admin privileges
sudo kubectl create clusterrolebinding k8sadmin --clusterrole=cluster-admin --serviceaccount=kube-system:k8sadmin

Get the token
sudo kubectl -n kube-system describe secret $(sudo kubectl -n kube-system get secret | (grep k8sadmin || echo "$_") | awk '{print $1}') | grep token: | awk '{print $2}'
  2. Create a kubeconfig file with the admin token above:
root@VM-0-80-ubuntu:~# cat sa3.kubeconfig.bak 

apiVersion: v1
kind: Config
clusters:
- name: default-cluster
  cluster:
    insecure-skip-tls-verify: true
    server: https://127.0.0.1:6443
contexts:
- name: default-context
  context:
    cluster: default-cluster
    user: default-user
current-context: default-context
users:
- name: default-user
  user:
    token:  <xxxx>
  3. Getting the individual cluster cls-cnkq8l9x with that admin token fails:
    root@VM-0-80-ubuntu:~# kubectl --kubeconfig ./sa3.kubeconfig.bak get cluster cls-cnkq8l9x
    Error from server (Forbidden): cluster:cls-cnkq8l9x.platform.tkestack.io "cls-cnkq8l9x" is forbidden: User "system:serviceaccount:kube-system:k8sadmin" cannot getCluster resource "cluster:cls-cnkq8l9x" in API group "platform.tkestack.io" at the cluster scope: cluster: cls-cnkq8l9x has invalid tenantID
    root@VM-0-80-ubuntu:~#
  4. Only the static token works:
    
    root@VM-0-80-ubuntu:~# kubectl --kubeconfig ./sa2.kubeconfig get cluster cls-cnkq8l9x
    NAME           TYPE        VERSION   STATUS    AGE
    cls-cnkq8l9x   Baremetal   1.18.3    Running   31d
    root@VM-0-80-ubuntu:~# cat sa2.kubeconfig

    apiVersion: v1
    kind: Config
    clusters:

    root@VM-0-80-ubuntu:~# cat /etc/kubernetes/known_tokens.csv
    1m6CJoJ1BQZcQMQOKdlwbPnjS2W,admin,admin,system:masters
    root@VM-0-80-ubuntu:~#


  5. Listing clusters (kubectl get cluster) works in both cases.

This issue may be caused by commit https://github.com/tkestack/tke/commit/4a52d812665da3c8ff359d7488a40891e5a33af6. @wangao1236, could you please prioritize this issue, since it blocks the tkestack upgrade and the multi-cloud case? Thanks in advance.
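
The user named in the Forbidden error above, system:serviceaccount:kube-system:k8sadmin, is the identity carried inside the service account token itself. A minimal sketch for checking which identity a token carries before putting it into a kubeconfig, assuming the token is a secret-based service account JWT with a `sub` claim. The function names are illustrative and the signature is deliberately not verified:

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def service_account_identity(token: str) -> str:
    """Return the `sub` claim, e.g. system:serviceaccount:<namespace>:<name>."""
    return decode_jwt_payload(token)["sub"]
```

Calling `service_account_identity(token)` on the output of the "Get the token" command should give the same user string that appears in the error message, which helps rule out a wrong or stale token as the cause.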
huxiaoliang commented 3 years ago

@wangao1236 we have to fix this issue ASAP; it also affects the tkestack release e2e test cases, thanks. @JiaYongfei, please add more details about your case.

JiaYongfei commented 3 years ago

The same problem exists when creating these resources: machine/cronhpa/ipam/tappcontroller

`machine:.platform.tkestack.io \"\" is forbidden: User \"system:serviceaccount:kube-system:k8sadmin\" cannot createMachine resource \"machine:*\" in API group \"platform.tkestack.io\" at the cluster scope: cluster: global has invalid tenantID

cronhpa:.platform.tkestack.io \"\" is forbidden: User \"system:serviceaccount:kube-system:k8sadmin\" cannot createCronhpa resource \"cronhpa:*\" in API group \"platform.tkestack.io\" at the cluster scope: cluster: global has invalid tenantID

ipam:.platform.tkestack.io \"\" is forbidden: User \"system:serviceaccount:kube-system:k8sadmin\" cannot createIpam resource \"ipam:*\" in API group \"platform.tkestack.io\" at the cluster scope: cluster: global has invalid tenantID

tappcontroller:.platform.tkestack.io \"\" is forbidden: User \"system:serviceaccount:kube-system:k8sadmin\" cannot createTappcontroller resource \"tappcontroller:*\" in API group \"platform.tkestack.io\" at the cluster scope: cluster: global has invalid tenantID`

wangao1236 commented 3 years ago

Which clusterRole is bound to this serviceAccount?

JiaYongfei commented 3 years ago

cluster-admin

huxiaoliang commented 3 years ago

@wangao1236 reopened, since it causes the build to fail:

Step 1/5 : FROM alpine:3.10
3.10: Pulling from library/alpine
Digest: sha256:f0e9534a598e501320957059cb2a23774b4d4072e37c7b2cf7e95b241f019e35
Status: Downloaded newer image for alpine:3.10
 ---> 536a684cf733
Step 2/5 : RUN echo "hosts: files dns" >> /etc/nsswitch.conf
 ---> Running in f3cf55eb5623
standard_init_linux.go:211: exec user process caused "exec format error"
The command '/bin/sh -c echo "hosts: files dns" >> /etc/nsswitch.conf' returned a non-zero code: 1
make[2]: *** [image.build.linux_arm64.tke-audit-api] Error 1
make[1]: *** [push.multiarch] Error 2
make: *** [release.build] Error 2
build/lib/image.mk:73: recipe for target 'image.build.linux_arm64.tke-audit-api' failed
huxiaoliang commented 3 years ago

The service account token only works in the kube-system namespace, so I am reopening this. @wangao1236

leoryu commented 3 years ago

tke-installer build for testing the RBAC issue:

version=revert-rbac && wget https://tke-release-1251707795.cos.ap-guangzhou.myqcloud.com/tke-installer-linux-amd64-$version.run{,.sha256} && sha256sum --check --status tke-installer-linux-amd64-$version.run.sha256 && chmod +x tke-installer-linux-amd64-$version.run && ./tke-installer-linux-amd64-$version.run
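
The `sha256sum --check --status` part of the command above refuses to run the installer unless its digest matches the downloaded .sha256 file. A rough Python equivalent of that check, as a sketch only: it assumes the checksum file uses the usual `<hex digest>  <file name>` line format, and it resolves file names next to the checksum file rather than against the current directory. The function names are illustrative:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large installers are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(checksum_file: Path) -> bool:
    """Return True only if every listed file matches its recorded digest."""
    checked = 0
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        if sha256_of(checksum_file.parent / name) != expected.lower():
            return False
        checked += 1
    return checked > 0  # an empty checksum file verifies nothing
```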

revert commits: https://github.com/tkestack/tke/commit/644dec56dc57970d425b6deef94e3b638657ff2d https://github.com/tkestack/tke/commit/d656744b5ffc3b9b40f2f7a05f0fa0054763637f https://github.com/tkestack/tke/commit/ab387453407f802eb12cdafe465714f9a65b20b5

Test steps:

  1. Create the service account: kubectl create serviceaccount cls-user -n kube-public

  2. Create the ClusterRole: kubectl apply -f clsrole.yaml

    cat clsrole.yaml

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: cls-reader
    rules:

  3. Create the ClusterRoleBinding: kubectl apply -f clsrolebind.yaml

    cat clsrolebind.yaml

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: cls-user:cls-reader
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cls-reader
    subjects:

  4. Get the token: kubectl -n kube-public describe secret $(sudo kubectl -n kube-public get secret | (grep cls-user || echo "$_") | awk '{print $1}') | grep token: | awk '{print $2}'

  5. Create a kubeconfig file with the token above:

    cat cls-user.kubeconfig

    apiVersion: v1
    kind: Config
    clusters:

  6. Edit the webhook ABAC policy with kubectl edit -n tke cm tke-auth-api, replacing

    {"apiVersion":"abac.authorization.kubernetes.io/v1beta1","kind":"Policy","spec":{"user":"system:*","namespace":"*", "resource":"*","apiGroup":"*", "group": "*", "nonResourcePath":"*"}}

    to

    {"apiVersion":"abac.authorization.kubernetes.io/v1beta1","kind":"Policy","spec":{"user":"system:kube-*|system:serviceaccount:kube-system:*","namespace":"*", "resource":"*","apiGroup":"*tkestack.io", "group": "*", "nonResourcePath":"*"}}
    {"apiVersion":"abac.authorization.kubernetes.io/v1beta1","kind":"Policy","spec":{"user":"^system:serviceaccount:tke:default$","namespace":"*", "resource":"*","apiGroup":"*", "group": "*", "nonResourcePath":"*"}}

    and then recreate the tke-auth-api pod.

  7. Test this kubeconfig: kubectl --kubeconfig ./cls-user.kubeconfig get cluster

    Error from server (Forbidden): cluster:*.platform.tkestack.io "*" is forbidden: User "system:serviceaccount:kube-public:cls-user" cannot listClusters resource "cluster:*" in API group "platform.tkestack.io" at the cluster scope: permission for listClusters on cluster:* not verify

    RBAC doesn't work.

  8. Edit the platform configmap with kubectl edit -n tke cm tke-platform-api, and remove

    [authorization]
    mode = "Webhook"
    webhook_config_file = "/app/conf/tke-authz-webhook.yaml"

    and then recreate the tke-platform-api pod.

  9. Try again: kubectl --kubeconfig ./cls-user.kubeconfig get cluster

    NAME           CREATED AT
    global         2021-08-18T08:08:43Z

RBAC works, and I have not found any functionality that is blocked so far.

Why RBAC works: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/#authentication-flow

If we set Webhook mode on the extension apiserver, authorization no longer follows the flow described at the link above.

TKEStack configures kube-apiserver authorization with the Node, RBAC and Webhook modes, so the default flow already includes Webhook mode, and there is no need to set Webhook on the extension apiserver as well.
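
To illustrate the reasoning above: when several authorization modes are configured, kube-apiserver asks each in order and the first one that allows or denies decides; if none has an opinion, the request is denied. The sketch below models only that chaining. The three authorizers are hypothetical stand-ins, not TKE or Kubernetes code, and the webhook stand-in simply denies everything, to mimic the Forbidden result seen for cls-user:

```python
from enum import Enum
from typing import Callable, Sequence


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NO_OPINION = "no opinion"


# An authorizer maps (user, verb, resource) to a Decision.
Authorizer = Callable[[str, str, str], Decision]


def authorize(chain: Sequence[Authorizer], user: str, verb: str, resource: str) -> bool:
    """Ask each authorizer in order; the first ALLOW or DENY is final."""
    for authorizer in chain:
        decision = authorizer(user, verb, resource)
        if decision is not Decision.NO_OPINION:
            return decision is Decision.ALLOW
    return False  # nobody had an opinion: the request is denied


def node(user: str, verb: str, resource: str) -> Decision:
    # Hypothetical stand-in: only has an opinion about kubelet identities.
    return Decision.ALLOW if user.startswith("system:node:") else Decision.NO_OPINION


def rbac(user: str, verb: str, resource: str) -> Decision:
    # Hypothetical stand-in for the cls-reader binding; RBAC only grants, it never denies.
    grants = {("system:serviceaccount:kube-public:cls-user", "list", "clusters")}
    return Decision.ALLOW if (user, verb, resource) in grants else Decision.NO_OPINION


def tke_webhook(user: str, verb: str, resource: str) -> Decision:
    # Hypothetical stand-in: denies identities it has no policy for (here: everyone).
    return Decision.DENY


CLS_USER = "system:serviceaccount:kube-public:cls-user"

# Delegating to kube-apiserver (Node, RBAC, Webhook): RBAC answers first.
delegated = authorize([node, rbac, tke_webhook], CLS_USER, "list", "clusters")

# Webhook configured directly on the extension apiserver: RBAC is never consulted.
webhook_only = authorize([tke_webhook], CLS_USER, "list", "clusters")
```

Under these assumptions `delegated` is allowed and `webhook_only` is denied, which matches the behaviour before and after removing the [authorization] section from tke-platform-api.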