kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

node authorizer may have a delay in updating the graph #128910

Open · chaunceyctx97 opened this issue 3 days ago

chaunceyctx97 commented 3 days ago

What happened?

In a k8s 1.29 cluster, I created 100 pods, each of which mounted 50 secrets. I then deleted all of the pods and created 100 pods again, repeating this process. I noticed warning events saying `failed to sync secret cache: timed out waiting for the condition`. After checking the apiserver logs, I found that the response code of the corresponding request was 403 (Forbidden). Could the node authorizer have a delay in updating its graph?

```console
I1008 09:55:48.683488   13 httplog.go:131] "HTTP" verb="GET" URI="/api/v1/namespaces/default/secrets?fieldSelector=metadata.name%3Dsecret-kms-2&limit=500&resourceVersion=0" latency="1.324644ms" userAgent="kubelet/v1.29.2 (Linux/amd64) kubernetes/3ffab2c" audit-ID="d73de627-ff5b-3412-4416-97bada04664c" srcIP="192.168.121.154:44700" apf_pl="system" apf_fs="system-nodes" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_execution_time="139.57us" resp=403
```

/sig auth

What did you expect to happen?

No warning events should appear.

How can we reproduce it (as minimally and precisely as possible)?

In a k8s 1.29 cluster:

  1. Create 100 pods, each of which mounts 50 secrets.
  2. Delete all of the pods.

Repeat the above process (see the sketch below).
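
Below is a minimal, hypothetical client-go sketch of that loop. The namespace, image, and the `stress-pod-N` / `stress-secret-N` names are illustrative assumptions, not taken from the reporter's cluster:

```go
// Hypothetical reproduction sketch using client-go; names are illustrative.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

const (
	ns         = "default"
	numPods    = 100
	numSecrets = 50
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Create the 50 shared secrets once.
	for i := 0; i < numSecrets; i++ {
		secret := &corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("stress-secret-%d", i)},
			StringData: map[string]string{"key": "value"},
		}
		if _, err := client.CoreV1().Secrets(ns).Create(ctx, secret, metav1.CreateOptions{}); err != nil {
			log.Printf("create secret: %v", err)
		}
	}

	// Repeatedly create and delete 100 pods that each mount all 50 secrets,
	// while watching for "failed to sync secret cache" warning events.
	for {
		for i := 0; i < numPods; i++ {
			pod := podWithSecretMounts(fmt.Sprintf("stress-pod-%d", i))
			if _, err := client.CoreV1().Pods(ns).Create(ctx, pod, metav1.CreateOptions{}); err != nil {
				log.Printf("create pod: %v", err)
			}
		}
		time.Sleep(2 * time.Minute) // crude stand-in for waiting until the pods run

		// Careful: this deletes every pod in the namespace.
		if err := client.CoreV1().Pods(ns).DeleteCollection(ctx, metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
			log.Printf("delete pods: %v", err)
		}
		time.Sleep(time.Minute) // let the deletions finish before the next round
	}
}

// podWithSecretMounts builds a pause pod that mounts all 50 secrets as volumes.
func podWithSecretMounts(name string) *corev1.Pod {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "pause",
				Image: "registry.k8s.io/pause:3.9",
			}},
		},
	}
	for i := 0; i < numSecrets; i++ {
		volName := fmt.Sprintf("sec-%d", i)
		pod.Spec.Volumes = append(pod.Spec.Volumes, corev1.Volume{
			Name: volName,
			VolumeSource: corev1.VolumeSource{
				Secret: &corev1.SecretVolumeSource{SecretName: fmt.Sprintf("stress-secret-%d", i)},
			},
		})
		pod.Spec.Containers[0].VolumeMounts = append(pod.Spec.Containers[0].VolumeMounts,
			corev1.VolumeMount{Name: volName, MountPath: "/secrets/" + volName})
	}
	return pod
}
```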

Anything else we need to know?

No response

Kubernetes version

```console
$ kubectl version
# paste output here
```

Cloud provider

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

k8s-ci-robot commented 3 days ago

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

SuQiucheng commented 2 days ago

In plugin/pkg/auth/authorizer/node/graph_populator.go, AddGraphEventHandlers registers the informer handlers: when a pod is added, the graph first adds the pod and then adds its secrets and configmaps. If the kubelet's request arrives before the graph has been updated, the apiserver denies it. I think that may be why the response code of the corresponding request was 403, but I don't know how to fix it.
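
To make the suspected window concrete, here is a toy, self-contained Go model of the race. It is not the real node authorizer code (the real graph lives in plugin/pkg/auth/authorizer/node and is fed by informers); it only illustrates how an authorization check that races ahead of the graph update gets denied:

```go
// Toy model of the suspected race; NOT the actual node authorizer code.
package main

import (
	"fmt"
	"sync"
	"time"
)

// graph is a stand-in for the node authorizer's pod -> secret edges.
type graph struct {
	mu    sync.RWMutex
	edges map[string]bool // "node/secret" -> kubelet on node may read secret
}

// addPod mirrors the informer handler: the edge appears only after the
// pod-add event has been processed.
func (g *graph) addPod(node, secret string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.edges[node+"/"+secret] = true
}

// authorize is what runs when the kubelet's GET arrives.
func (g *graph) authorize(node, secret string) bool {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.edges[node+"/"+secret]
}

func main() {
	g := &graph{edges: map[string]bool{}}

	// The informer delivers the pod-add event with some lag.
	go func() {
		time.Sleep(50 * time.Millisecond) // simulated informer delay
		g.addPod("node-1", "secret-kms-2")
	}()

	// The kubelet fetches the secret as soon as the pod is scheduled,
	// possibly before the graph has the edge.
	if !g.authorize("node-1", "secret-kms-2") {
		fmt.Println("403 Forbidden: edge not yet in graph")
	}

	time.Sleep(100 * time.Millisecond)
	if g.authorize("node-1", "secret-kms-2") {
		fmt.Println("200 OK: graph caught up")
	}
}
```

If this is the mechanism, the 403s should be transient and clear once the authorizer's informers catch up; the kubelet's retries would then surface them as the `failed to sync secret cache` timeout rather than as a permanent failure.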

liggitt commented 8 hours ago

Did you force delete the original pods (grace period of 0)?

Were the forbidden messages seen between when the old pods were deleted and the new ones created, or after the new ones were created? If after, how long after they were scheduled to the node?

Did the forbidden error resolve itself, or did it block the pods from starting?