vmware-tanzu / community-edition

VMware Tanzu Community Edition is no longer an actively maintained project. Code is available for historical purposes only.
https://tanzucommunityedition.io/

Dex console not launching when tanzu cluster kubeconfig get called with Pinniped configured #1635

Closed · cormachogan closed this issue 3 years ago

cormachogan commented 3 years ago

Bug Report

Deployed a TKG management cluster on vSphere with NSX ALB providing the load balancer service, and a TKG workload cluster on vSphere, also using NSX ALB.

When trying to get the kubeconfig for the workload cluster as a non-admin user, I get the following error:

% tanzu cluster kubeconfig get workload
Error: failed to get cluster-info from cluster: failed to get cluster-info from the end-point: Get "https://:0/api/v1/namespaces/kube-public/configmaps/cluster-info": dial tcp :0: connect: can't assign requested address
Usage:
  tanzu cluster kubeconfig get CLUSTER_NAME [flags]

Examples:

    # Get workload cluster kubeconfig
    tanzu cluster kubeconfig get CLUSTER_NAME

    # Get workload cluster admin kubeconfig
    tanzu cluster kubeconfig get CLUSTER_NAME --admin

Flags:
      --admin                Get admin kubeconfig of the workload cluster
      --export-file string   File path to export a standalone kubeconfig for workload cluster
  -h, --help                 help for get
  -n, --namespace string     The namespace where the workload cluster was created. Assumes 'default' if not specified.

Global Flags:
      --log-file string   Log file path
  -v, --verbose int32     Number for the log level verbosity(0-9)

Error: exit status 1

✖  exit status 1

Interestingly, this is the same error I get when I try to get the kubeconfig as a non-admin when Pinniped is not configured. See #1614.

A kubeconfig get with the --admin option works.
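For what it's worth, the empty host and port in that URL ("https://:0") suggest the CLI never discovered the workload cluster's public endpoint. A quick check would be to look at what is actually published in kube-public (a diagnostic sketch I did not capture for this report; the pinniped-info ConfigMap name is my assumption based on how TKG publishes Pinniped details):

% kubectl get configmap cluster-info -n kube-public -o yaml
% kubectl get configmap pinniped-info -n kube-public -o yaml   # assumed name; should hold the supervisor endpoint and CA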

Expected Behavior

In the past (TKG v1.3.1), the kubeconfig get command succeeded as a non-admin user. An attempt to query the workload cluster would then launch the Dex console, where the developer could enter their AD credentials. An admin could then grant that user access to the workload cluster through a ClusterRoleBinding, and the user/developer would be able to query the workload cluster without admin privileges. I am not able to repeat these steps in this version.
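For reference, the kind of ClusterRoleBinding the admin would create in that old flow looks roughly like this (a sketch only; the binding name, user name, and the view ClusterRole are placeholders, not values from my environment):

% kubectl create clusterrolebinding developer-view \
    --clusterrole=view \
    --user=developer@mydomain.local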

Steps to Reproduce the Bug

Build a TKG management and workload cluster with NSX ALB and Pinniped on vSphere:
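The clusters were brought up along these lines (a sketch; the cluster configuration file names are placeholders, not taken from this report):

% tanzu management-cluster create --file mgmt-cluster-config.yaml
% tanzu cluster create workload --file workload-cluster-config.yaml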

MGMT

% kubectl get nodes -o wide
NAME                         STATUS   ROLES                  AGE    VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                 KERNEL-VERSION   CONTAINER-RUNTIME
mgmt-control-plane-6bvkx     Ready    control-plane,master   121m   v1.21.2+vmware.1   10.27.51.29   10.27.51.29   VMware Photon OS/Linux   4.19.198-1.ph3   containerd://1.4.6
mgmt-md-0-77589686bc-5l25q   Ready    <none>                 119m   v1.21.2+vmware.1   10.27.51.31   10.27.51.31   VMware Photon OS/Linux   4.19.198-1.ph3   containerd://1.4.6

% kubectl get apps -A
NAMESPACE    NAME                                DESCRIPTION           SINCE-DEPLOY   AGE
default      workload-kapp-controller            Reconcile succeeded   39s            72m
tkg-system   ako-operator                        Reconcile succeeded   29s            109m
tkg-system   antrea                              Reconcile succeeded   59s            109m
tkg-system   load-balancer-and-ingress-service   Reconcile succeeded   5m31s          102m
tkg-system   metrics-server                      Reconcile succeeded   111s           109m
tkg-system   pinniped                            Reconcile succeeded   88s            109m
tkg-system   tanzu-addons-manager                Reconcile succeeded   102s           116m
tkg-system   vsphere-cpi                         Reconcile succeeded   17s            109m
tkg-system   vsphere-csi                         Reconcile succeeded   116s           109m

% kubectl get pods -A
NAMESPACE                           NAME                                                             READY   STATUS      RESTARTS   AGE
avi-system                          ako-0                                                            1/1     Running     0          102m
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-84c75dd587-9grfs       2/2     Running     0          114m
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-756f646c68-wkdkn   2/2     Running     0          114m
capi-system                         capi-controller-manager-5468bf8995-t8hxr                         2/2     Running     0          114m
capi-webhook-system                 capi-controller-manager-b6f878dd8-sfww5                          2/2     Running     0          114m
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-67cf557cc6-d9t87       2/2     Running     0          114m
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-798bb98b65-nflr4   2/2     Running     0          114m
capi-webhook-system                 capv-controller-manager-8464948ff-wznwn                          2/2     Running     0          114m
capv-system                         capv-controller-manager-76b5574bd9-nxsk2                         2/2     Running     0          114m
cert-manager                        cert-manager-cainjector-59d5cd55f5-5gq97                         1/1     Running     0          120m
cert-manager                        cert-manager-fcbbdd748-5c5tl                                     1/1     Running     0          120m
cert-manager                        cert-manager-webhook-5cd7cf5fbb-m6fhg                            1/1     Running     0          120m
kube-system                         antrea-agent-c8lzv                                               2/2     Running     0          108m
kube-system                         antrea-agent-v99pw                                               2/2     Running     0          108m
kube-system                         antrea-controller-7bcdf898b4-tdlz4                               1/1     Running     0          108m
kube-system                         coredns-8dcb5c56b-cb8bs                                          1/1     Running     0          121m
kube-system                         coredns-8dcb5c56b-rh2vg                                          1/1     Running     0          121m
kube-system                         etcd-mgmt-control-plane-6bvkx                                    1/1     Running     0          121m
kube-system                         kube-apiserver-mgmt-control-plane-6bvkx                          1/1     Running     0          121m
kube-system                         kube-controller-manager-mgmt-control-plane-6bvkx                 1/1     Running     0          121m
kube-system                         kube-proxy-2hqf7                                                 1/1     Running     0          119m
kube-system                         kube-proxy-x7mbd                                                 1/1     Running     0          121m
kube-system                         kube-scheduler-mgmt-control-plane-6bvkx                          1/1     Running     0          121m
kube-system                         metrics-server-5d86c68978-7dngt                                  1/1     Running     0          107m
kube-system                         vsphere-cloud-controller-manager-5gl8n                           1/1     Running     0          107m
kube-system                         vsphere-csi-controller-6b87bd7755-82wbr                          6/6     Running     0          109m
kube-system                         vsphere-csi-node-75svb                                           3/3     Running     0          109m
kube-system                         vsphere-csi-node-pb6vc                                           3/3     Running     0          109m
pinniped-concierge                  pinniped-concierge-d878bc656-hpb45                               1/1     Running     0          108m
pinniped-concierge                  pinniped-concierge-d878bc656-qbd6s                               1/1     Running     0          108m
pinniped-concierge                  pinniped-concierge-kube-cert-agent-08f94512                      1/1     Running     0          106m
pinniped-supervisor                 pinniped-post-deploy-job-gqsc5                                   0/1     Error       0          104m
pinniped-supervisor                 pinniped-post-deploy-job-p5pp9                                   0/1     Error       0          108m
pinniped-supervisor                 pinniped-post-deploy-job-vtpdk                                   0/1     Completed   0          103m
pinniped-supervisor                 pinniped-supervisor-587648d967-9mjf7                             1/1     Running     0          103m
pinniped-supervisor                 pinniped-supervisor-587648d967-hvnkg                             1/1     Running     0          103m
tanzu-system-auth                   dex-69b944c54f-9zh2s                                             1/1     Running     0          103m
tkg-system-networking               ako-operator-controller-manager-78c48bb754-4vxb9                 2/2     Running     0          108m
tkg-system                          kapp-controller-764fc6c69f-xkj78                                 1/1     Running     0          120m
tkg-system                          tanzu-addons-controller-manager-8547c867b4-n8zqp                 1/1     Running     0          116m
tkg-system                          tanzu-capabilities-controller-manager-69f58566d9-47kdr           1/1     Running     0          121m
tkr-system                          tkr-controller-manager-cc88b6968-bgzwc                           1/1     Running     0          121m
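(For completeness: the two pinniped-post-deploy-job pods in the Error state above could be inspected with something like the following; I did not capture that output here.)

% kubectl logs -n pinniped-supervisor pinniped-post-deploy-job-gqsc5
% kubectl logs -n pinniped-supervisor pinniped-post-deploy-job-p5pp9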

Workload

% kubectl config get-contexts
CURRENT   NAME                          CLUSTER          AUTHINFO                                         NAMESPACE
          10.202.112.152                10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local
          cormac-new-ns                 10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local   cormac-new-ns
          kubernetes-admin@kubernetes   kubernetes       kubernetes-admin
*         mgmt-admin@mgmt               mgmt             mgmt-admin
          workload-admin@workload       workload         workload-admin

% kubectl config use-context workload-admin@workload
Switched to context "workload-admin@workload".

% kubectl get nodes -o wide
NAME                             STATUS   ROLES                  AGE   VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                 KERNEL-VERSION   CONTAINER-RUNTIME
workload-control-plane-7dlts     Ready    control-plane,master   73m   v1.21.2+vmware.1   10.27.51.39   10.27.51.39   VMware Photon OS/Linux   4.19.198-1.ph3   containerd://1.4.6
workload-md-0-6555d876c9-scvlq   Ready    <none>                 72m   v1.21.2+vmware.1   10.27.51.41   10.27.51.41   VMware Photon OS/Linux   4.19.198-1.ph3   containerd://1.4.6

% kubectl get apps -A
NAMESPACE    NAME                                DESCRIPTION           SINCE-DEPLOY   AGE
tkg-system   antrea                              Reconcile succeeded   2m40s          68m
tkg-system   load-balancer-and-ingress-service   Reconcile succeeded   3m13s          68m
tkg-system   metrics-server                      Reconcile succeeded   3m39s          68m
tkg-system   vsphere-cpi                         Reconcile succeeded   11s            68m
tkg-system   vsphere-csi                         Reconcile succeeded   3m18s          68m

% kubectl get pods -A
NAMESPACE     NAME                                                     READY   STATUS    RESTARTS   AGE
avi-system    ako-0                                                    1/1     Running   0          67m
kube-system   antrea-agent-mkpqt                                       2/2     Running   0          68m
kube-system   antrea-agent-wtlbg                                       2/2     Running   0          68m
kube-system   antrea-controller-6795bcf6f-xpk7d                        1/1     Running   0          68m
kube-system   coredns-8dcb5c56b-2k4mq                                  1/1     Running   0          74m
kube-system   coredns-8dcb5c56b-4lvz9                                  1/1     Running   0          74m
kube-system   etcd-workload-control-plane-7dlts                        1/1     Running   0          73m
kube-system   kube-apiserver-workload-control-plane-7dlts              1/1     Running   0          73m
kube-system   kube-controller-manager-workload-control-plane-7dlts     1/1     Running   0          73m
kube-system   kube-proxy-rt9vg                                         1/1     Running   0          74m
kube-system   kube-proxy-xkdpv                                         1/1     Running   0          72m
kube-system   kube-scheduler-workload-control-plane-7dlts              1/1     Running   0          73m
kube-system   metrics-server-5df75d7cdf-8vnks                          1/1     Running   0          68m
kube-system   vsphere-cloud-controller-manager-47466                   1/1     Running   0          68m
kube-system   vsphere-csi-controller-649d8fb87b-mb6zb                  6/6     Running   0          68m
kube-system   vsphere-csi-node-7cx9n                                   3/3     Running   0          68m
kube-system   vsphere-csi-node-7dj77                                   3/3     Running   0          68m
tkg-system    kapp-controller-78cc98dbdc-2xl6s                         1/1     Running   0          74m
tkg-system    tanzu-capabilities-controller-manager-69f58566d9-bcsvz   1/1     Running   0          74m

I also tried to connect to Dex manually (https://10.27.62.16:30167), but it did not work. I could not see any errors in the Dex logs either.
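(A connectivity check along these lines is what I mean by connecting manually; the OIDC discovery path is the standard one Dex serves and is included here only as a sketch.)

% curl -vk https://10.27.62.16:30167/
% curl -vk https://10.27.62.16:30167/.well-known/openid-configuration   # path assumed from standard OIDC discovery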

Environment Details

cormachogan commented 3 years ago

This seems to be a known issue. After following the workaround instructions (adding load balancer services for Dex and Pinniped), I was able to use an AD user to access the management cluster.
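(Roughly, the workaround switches the Dex and Pinniped supervisor services to type LoadBalancer so NSX ALB gives them reachable addresses. A sketch of the idea, with the service names and namespaces being assumptions rather than copied from my clusters, and noting that the documented workaround may use a ytt overlay instead of a direct patch:)

% kubectl patch svc dexsvc -n tanzu-system-auth -p '{"spec": {"type": "LoadBalancer"}}'
% kubectl patch svc pinniped-supervisor -n pinniped-supervisor -p '{"spec": {"type": "LoadBalancer"}}'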

However, the issue is still present on the workload cluster. I will discuss with @stuclem tomorrow.

cormachogan commented 3 years ago

On the off chance that this was because my workload cluster was created before the workaround was applied to the management cluster, I deployed a new workload cluster after the management cluster was set up. No dice! On the new workload cluster, I still hit the "can't assign requested address" error:

% tanzu cluster kubeconfig get workload2
Error: failed to get cluster-info from cluster: failed to get cluster-info from the end-point: Get "https://:0/api/v1/namespaces/kube-public/configmaps/cluster-info": dial tcp :0: connect: can't assign requested address

It only works with the --admin option:

% tanzu cluster kubeconfig get workload2 --admin
Credentials of cluster 'workload2' have been saved
You can now access the cluster by running 'kubectl config use-context workload2-admin@workload2'

cormachogan commented 3 years ago

This issue is addressed in a later build (i.e. TKG v1.4), but was not addressed in TCE v0.7.0. I will close this issue and retry the procedure with TCE v0.8.0-rc2.