lensapp / lens

Lens - The way the world runs Kubernetes
https://k8slens.dev/
MIT License
22.51k stars 1.47k forks

OpenLens 6.3.0 on macOS does not show pods and config/network/storage sections are empty, all work fine in 6.2.6. #6858

Closed dmarjanovic closed 1 year ago

dmarjanovic commented 1 year ago

Describe the bug

OpenLens 6.3.0 on macOS does not show pods and the config/network/storage sections are empty; all work fine in 6.2.6.

To Reproduce

Steps to reproduce the behavior:

  1. git checkout v6.3.0 # or master d6531f2
  2. make clean && make dev
  3. When the window appears, connect to a namespace (there's already a single namespace configured in Namespace/Accessible Namespaces)
  4. Expand 'Workloads'
  5. Do not see Pods, but can see Deployments, DaemonSets, StatefulSets, ReplicaSets, Jobs, CronJobs
  6. Scroll down to 'Config' and expand
  7. Do not see ConfigMaps, Secrets, Resource Quotas, Limit Ranges, HPA, Pod Disruption Budgets
  8. Scroll down to 'Network' and expand
  9. Do not see Services, Endpoints, Ingresses, Network Policies but can only see Port Forwarding
  10. Scroll down to 'Storage'
  11. Can't expand as it's empty (can't see Persistent Volume Claims)

Expected behavior

  1. I see Pods in 'Workloads' section
  2. I see ConfigMaps, Secrets, Resource Quotas, Limit Ranges, HPA, Pod Disruption Budgets in 'Config' section
  3. I see Services, Endpoints, Ingresses, Network Policies in 'Network' section
  4. I see Persistent Volume Claims in 'Storage' section

Screenshots

Wrong one, from OpenLens 6.3.0: Screenshot 2023-01-02 at 16 21 36

Correct one from OpenLens 6.2.6: Screenshot 2023-01-02 at 16 14 32

Environment (please complete the following information):

Kubeconfig:

apiVersion: v1
clusters:
- cluster:
    server: https://api-kube.***
  name: cluster1
...
contexts:
- context:
    cluster: cluster1
    namespace: ns1
    user: some-user
  name: cluster1-ns1
...
current-context: cluster1-ns1
kind: Config
preferences: {}
users:
- name: some-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - oidc-login
      - get-token
      - --oidc-issuer-url=https://***
      - --oidc-client-id=api-kube
      - --oidc-client-secret=***
      - --oidc-extra-scope=groups
      command: kubectl
      env: null
      interactiveMode: IfAvailable
      provideClusterInfo: false

Nokel81 commented 1 year ago

What sort of kube distro is this cluster for?

Fantaztig commented 1 year ago

Same (or even worse) issue for me. In my case I cannot see any resources besides CRDs, which work fine.

We're running on VMware Tanzu, Kubernetes version 1.22.9.

I have cluster-admin permissions inside the cluster and can get/describe all resources via the CLI.

I installed Lens 6.2.6 and was upgraded to 6.3.0; in the old version everything worked as expected.

Nokel81 commented 1 year ago

Fixed by https://github.com/lensapp/lens/pull/6880

dmarjanovic commented 1 year ago

@Nokel81

What sort of kube distro is this cluster for?

We're on AWS EKS, if that helps.

Fixed by #6880

The #6880 contribution feels much more stable (faster) than 6.3.0, but the problem (and screenshots) described in this issue is still not solved with it. I've verified with the latest commit d34a13fad2 on the farodin91:set-correct-group-in-rbac branch.

Any other ideas? Thanks

Nokel81 commented 1 year ago

It certainly feels like an RBAC related issue. What is the response for the following command?

kubectl create -f - -o yaml << EOF
apiVersion: authorization.k8s.io/v1
kind: SelfSubjectRulesReview
spec:
  namespace: default
EOF

dmarjanovic commented 1 year ago

@Nokel81 I can't reveal config details, sorry. Is there something specific you're looking for? Or let me know if there's another way to help diagnose this issue. Thank you

Nokel81 commented 1 year ago

@dmarjanovic Okay fair enough. How about this...

If you run kubectl get --raw /api you should get back a JSON object that has a field called versions. What is its value? On GKE and minikube it is ["v1"].

gfarcas commented 1 year ago

I have the same problem, 6.2.5 or k9s works though

Nokel81 commented 1 year ago

@gfarcas Can you run the command that I asked @dmarjanovic to run?

HuguesJ commented 1 year ago

Same issue on Windows, with an AKS cluster. I ran the command from @dmarjanovic's thread and got the same result as him.

Nokel81 commented 1 year ago

Can you please also run kubectl get --raw /api/v1 then? Do you get a list that includes pods?

HuguesJ commented 1 year ago

@Nokel81 Not sure what you mean by pods. kubectl get --raw /api/v1 returns a JSON object with some elements with pods in their name, like pods, pods/attach, pods/binding; it doesn't return the list of all pods.

Nokel81 commented 1 year ago

No that is what I meant and expected you to get. Hmmm.... 🤔

HuguesJ commented 1 year ago

@Nokel81 Interestingly enough, this problem happens only in one cluster (AKS) out of 4 (AKS and GKE).

Nokel81 commented 1 year ago

@HuguesJ Thanks for the info, will investigate against an AKS cluster

Nokel81 commented 1 year ago

Here is some more debugging that would be helpful:

  1. Does kubectl get namespaces succeed? (I assume it does)
  2. Running the following
kubectl create -f - -o yaml << EOF
apiVersion: authorization.k8s.io/v1
kind: SelfSubjectRulesReview
spec:
  namespace: default
EOF

both succeeds and returns something like:

apiVersion: authorization.k8s.io/v1
kind: SelfSubjectRulesReview
metadata:
  creationTimestamp: null
spec: {}
status:
  incomplete: false
  nonResourceRules:
  - nonResourceURLs:
    - /healthz
    - /livez
    - /readyz
    - /version
    - /version/
    verbs:
    - get
  - nonResourceURLs:
    - '*'
    verbs:
    - '*'
  - nonResourceURLs:
    - /api
    - /api/*
    - /apis
    - /apis/*
    - /healthz
    - /livez
    - /openapi
    - /openapi/*
    - /readyz
    - /version
    - /version/
    verbs:
    - get
  resourceRules:
  - apiGroups:
    - authorization.k8s.io
    resources:
    - selfsubjectaccessreviews
    - selfsubjectrulesreviews
    verbs:
    - create
  - apiGroups:
    - '*'
    resources:
    - '*'
    verbs:
    - '*'

Note: the thing I am interested in is the .status.resourceRules part. Is there an entry similar to:

- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'

If not, are the fields apiGroups and resources empty, or are they perhaps missing entirely?
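For readers following along, the wildcard matching being asked about can be sketched like this (an illustrative assumption about how a client such as Lens might interpret these rules, not its actual implementation):

```python
# Illustrative sketch only: deciding from SelfSubjectRulesReview resourceRules
# whether a verb is allowed on a resource. In RBAC, '*' is a wildcard and the
# core API group (pods, services, ...) is the empty string "".

def rule_allows(rule, api_group, resource, verb):
    """Return True if a single resourceRule grants `verb` on `resource`."""
    def matches(values, wanted):
        return "*" in values or wanted in values
    return (matches(rule.get("apiGroups", []), api_group)
            and matches(rule.get("resources", []), resource)
            and matches(rule.get("verbs", []), verb))

# A rule set like the ones reported in this thread: no entry mentions pods...
rules = [{"apiGroups": [""], "resources": ["nodes", "namespaces"],
          "verbs": ["get", "list", "watch"]}]
assert not any(rule_allows(r, "", "pods", "list") for r in rules)

# ...while a cluster-admin style wildcard rule grants everything.
wildcard = {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}
assert rule_allows(wildcard, "", "pods", "list")
```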

dmarjanovic commented 1 year ago

@Nokel81 sorry for the late answer, I'm unable to iterate quickly.

Re.

If you run kubectl get --raw /api you should get back a JSON object that has a field called versions. What is its value? On GKE and minikube it is ["v1"].

$ kubectl get --raw /api
{"kind":"APIVersions","versions":["v1"]...

Re.

  1. Does kubectl get namespaces succeed? (I assume it does)

Yes, there are multiple namespaces and the same regression is present in all.

  2. Running the following kubectl create -f - -o yaml << EOF...

In .status.resourceRules in the output, the part below that you expect to be shown is not present:

- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'

Instead, I can see this part repeated 20 times (among a few other entries that I had to remove; by the way, not sure why it's redundant):

- apiGroups:
  - ""
  resources:
  - nodes
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - metrics.k8s.io
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - list
  - watch

In addition, I see:

- apiGroups:
  - policy
  resourceNames:
  - eks.privileged
  resources:
  - podsecuritypolicies
  verbs:
  - use

which is probably the idiomatic way to configure this in AWS EKS, but I am not aware of the technical details except that this part is also documented in https://docs.aws.amazon.com/eks/latest/userguide/pod-security-policy.html

Anyway, I would expect that some OpenLens code relevant to k8s resource discovery changed in 6.3.0 compared to 6.2.6, though I'm not aware of the diff. Any other hints on how to diagnose this further? Thank you for looking into this.

Nokel81 commented 1 year ago

Yes there was a bug fix that did change the related code.

In the list is there something like:

  - apiGroups:
    - ""
    resources:
    - pods
    verbs:
    - get
    - list
    - watch

?

dmarjanovic commented 1 year ago

@Nokel81 no, there's no such part. I assume EKS does it differently, as mentioned with the policy from my previous comment, but I'm not sure.

dmarjanovic commented 1 year ago

@Nokel81 By the way, there's one detail related to resource discovery that may or may not be connected. Even with older (working) OpenLens versions we noticed similar buggy behaviour of not seeing the resources mentioned in this ticket (including pods) when connecting to some context for the first time. Only after the namespace name was explicitly set (see screenshot) would a disconnect + connect show all the resources in that namespace. This workaround worked fine until 6.2.6 but not now in 6.3.0; not sure if it's connected though. We have the permission scheme configured per namespace, not per cluster.

Screenshot 2023-01-10 at 14 09 04

Nokel81 commented 1 year ago

So that list is for when the user does not have permission to list namespaces. We do still read that list, but the change present in 6.3.0 attempts to fix the bug where only the first 10 namespaces were checked for these permissions.
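The 6.3.0 change described here can be sketched roughly as follows (an illustrative assumption, not Lens's actual code): a resource is shown if it is listable in at least one of the inspected namespaces, and earlier builds inspected only the first 10.

```python
# Illustrative sketch only (not Lens's actual code): show a resource if it is
# listable in at least one namespace. The pre-fix bug corresponds to passing
# limit=10, i.e. only the first 10 namespaces were ever inspected.

def visible_resources(namespaces, listable_in, limit=None):
    """Union the listable resources over the inspected namespaces.

    listable_in(ns) -> iterable of resource names the user may list in ns.
    """
    checked = namespaces if limit is None else namespaces[:limit]
    visible = set()
    for ns in checked:
        visible |= set(listable_in(ns))
    return visible

# Hypothetical permissions: only the 11th namespace grants access to pods.
perms = {f"ns{i}": set() for i in range(11)}
perms["ns10"] = {"pods"}
names = list(perms)

assert "pods" not in visible_resources(names, perms.__getitem__, limit=10)  # old, buggy
assert "pods" in visible_resources(names, perms.__getitem__)                # fixed
```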

dmarjanovic commented 1 year ago

@Nokel81 is it possible to pinpoint the specific PR causing this behaviour, maybe #6657?

Nokel81 commented 1 year ago

That PR fixes issues with https://github.com/lensapp/lens/pull/6614 to support "incomplete" responses like those which GKE returns.

kienhoefr commented 1 year ago

Yes there was a bug fix that did change the related code.

In the list is there something like:

  - apiGroups:
    - ""
    resources:
    - pods
    verbs:
    - get
    - list
    - watch

?

@Nokel81 I'm working with AWS EKS (using OpenLens 6.3.0) and I have the same issue. When I do a SelfSubjectRulesReview I get (among other things):

- apiGroups:
    - ""
    resources:
    [...]
    - pods
    [...]
    - services
    [...]
    verbs:
    - get
    - list
    - watch

Hope this helps.

Nokel81 commented 1 year ago

Yes that helps a lot, thanks

dmarjanovic commented 1 year ago

@Nokel81 thank you for the fix.

By the way, #6900 only partially fixes this issue for us: after the fix is applied, the issue is no longer reproducible for some namespaces but still persists for the others. To be more precise: which namespaces are "working" and "non-working" changes after an OpenLens app restart, meaning on the 1st start namespace A is "working" and B is "non-working", but on the 2nd run A may become "non-working" and B "working", etc. Weird behaviour. Also, I didn't see anything unusual (or different) in the logs compared to v6.2.6.

Nokel81 commented 1 year ago

By "working", what do you mean exactly? We attempt to list all the namespaces and then compute which resources are allowed to be listed in at least one of those namespaces. Which namespace you select in the UI shouldn't matter (except for the "Accessible Namespaces" setting, which just overrides the list-namespaces step above).

dmarjanovic commented 1 year ago

@Nokel81 sorry for using confusing terminology. I built and ran the app from master c361852dd2 (6.4.0-alpha.3). All the screens below were from a single run; no restarts performed.

We use a single ~/.kube/config file with a single EKS cluster, many namespaces, and 1 context per namespace. It appears in Lens Catalog/Clusters as 11 "clusters" but basically matches 11 contexts. By the way, it says Distro "eks" but not for all contexts; not sure why.

02_openlens_6 4 0-alpha 3_11_clusters_connected

Steps I did: I was clicking from the top, starting with the 1st pinned "cluster" (context), then clicked on the 2nd, and so on until the last, 11th pinned "cluster". Some are "working", meaning pods are shown, and the others are "failed", meaning no pods are shown. See the first 11 screenshots:

03_1 cluster_fails 04_2 cluster_ok 05_3 cluster_fails 06_4 cluster_ok 07_5 cluster_fails 07_6 cluster_ok 08_7 cluster_fails 09_8 cluster_fails 10_9 cluster_fails 11_10 cluster_ok 12_11 cluster_ok

I then started clicking on other pinned "clusters" that were "working" before, but now they were changing from "working" to "failed", or from "working" to "failed" to "working"; even those that were "failed" then "working" started going to "failed", then some of them to "working" again, without any meaningful pattern.

17_4 cluster_fails 19_8 cluster_fails 20_10 cluster_fails 21_9 cluster_fails 22_6 cluster_fails 23_5 cluster_fails 24_1 cluster_fails 25_8 cluster_ok 26_9 cluster_ok 27_10 cluster_ok

At this point I was clicking on all the 11 pinned "clusters" but their state was not changing any more.

Then I restarted and repeated clicking on the 11 pinned "clusters", and now the "working" and "failed" ones were different than in the previous run. Totally random behaviour compared to the previous run; no pattern found. It looked to me like there was some race condition going on. While taking screenshots I did breaks to work on other things; not sure if that influenced the behaviour too.

Note that all these namespaces (the 11 contexts) defined in ~/.kube/config do have pods, and they are shown correctly by all the other tools, including k9s, kubectl, and OpenLens 6.2.6.

Nokel81 commented 1 year ago

Okay thanks.

One question: do you have list-namespace permissions for this Kube cluster?

dmarjanovic commented 1 year ago

@Nokel81 yes, I can list namespaces everywhere.

dmarjanovic commented 1 year ago

@Nokel81 not sure if anyone is still looking into this issue, but shouldn't it be reopened, as it's a regression in 6.3.0 and still not fully resolved in 6.4.0-alpha.3 compared to 6.2.6? Thanks

Nokel81 commented 1 year ago

@dmarjanovic for 6.4.0-alpha.3 do you have "accessible namespaces" configured for those? That might cause this, though it should be stable. We plan to make this more resilient against network failures.

dmarjanovic commented 1 year ago

@Nokel81 yes, "accessible namespaces" was indeed configured for every "cluster". So the workaround is to remove "accessible namespaces" from every "cluster"; then the issue is no longer reproducible for me. Tested with 6.4.0-alpha.3. The workaround also kind of "works" with 6.3.0, but that version is much less stable.

Should I report a separate issue for "accessible namespaces", or do you have it on your radar? Thank you

Nokel81 commented 1 year ago

So that list is really only supposed to be for when a "cluster" (or more specifically, the user associated with the context within a kubeconfig file) does not have LIST namespace permissions.

Namely, if you configure those, then we will only look at those namespaces when determining the "resources to show". So it would only be a bug if the same cluster changes which resources it shows when reconnected to.

IIRC that is indeed what you said was happening, right?
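The override described here can be sketched roughly like this (illustrative only; the function and parameter names are my own, not Lens's API):

```python
# Illustrative sketch only: the "Accessible Namespaces" setting replaces the
# LIST-namespaces step that otherwise feeds the permission computation.

def namespaces_to_check(accessible_namespaces, list_namespaces):
    """Pick which namespaces to inspect for per-namespace permissions."""
    if accessible_namespaces:        # explicit user setting wins
        return list(accessible_namespaces)
    try:
        return list_namespaces()     # requires LIST namespace permission
    except PermissionError:
        return []                    # cannot discover namespaces at all

assert namespaces_to_check(["ns1"], lambda: ["a", "b"]) == ["ns1"]
assert namespaces_to_check([], lambda: ["a", "b"]) == ["a", "b"]
```

Under this model, a configured "Accessible Namespaces" list would pin the inspected set regardless of what the cluster would return, which matches the workaround reported above.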

djarami726 commented 1 year ago

Same issue on Windows; it doesn't work on 6.3.0 or 6.4.0-alpha.3. I was not able to find the 6.2.6 installer; I only found 6.2.5, and it didn't work with that one either. I don't see the cluster or nodes options, and no items are listed under overview. This problem happens on one of my AKS clusters; I have several others where these items show up fine. I'm able to do get pods -A and get ns -A.

If you want me to try 6.2.6, please share a link for Windows that I can try.

dmarjanovic commented 1 year ago

So that list is really only supposed to be for when a "cluster" (or more specifically, the user associated with the context within a kubeconfig file) does not have LIST namespace permissions.

@Nokel81 I see, thanks for the explanation. To keep it sane, my colleagues and I will do more tests when the next iteration is available via brew. I will then also re-check with the colleagues who do not have the LIST namespace permission and see if it's blocking them or they experience the issue (since for me, not using "accessible namespaces" works just fine for now). Thank you

vbabenkoru commented 1 year ago

I have the same issue; 6.2.* is working fine. We're running EKS.

tintranvan commented 1 year ago

I am using EKS. For the service account role, I added one more ClusterRole and ClusterRoleBinding to allow the account to get namespaces:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-list-role
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: namespace-list-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: namespace-list-role
subjects: # the group my account belongs to
- kind: Group
  name: ns-fullaccess-group
  apiGroup: rbac.authorization.k8s.io
Besides this namespace permission, the EKS accounts keep their per-namespace Role and RoleBinding. It works well for me.