kubernetes-sigs / aws-efs-csi-driver

CSI Driver for Amazon EFS https://aws.amazon.com/efs/
Apache License 2.0
699 stars 528 forks

EFS startupTaint does not work correctly #1273

Closed: Mieszko96 closed this issue 3 days ago

Mieszko96 commented 5 months ago

/kind bug

What happened?

I saw that a new feature was requested in this ticket: https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/1069

My infrastructure uses Karpenter as the autoscaling tool, and we have a problem installing apps that use EFS on brand-new nodes. I added a startupTaint to prevent this, but from my point of view it looks like the taint is being removed too fast.

I described the steps I used to test this in more detail in the Karpenter ticket: https://github.com/aws/karpenter-provider-aws/issues/5691

However, it seems to be a problem on the EFS side.

What you expected to happen?

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

Environment

Please also attach debug logs to help us better diagnose

steved commented 4 months ago

I ran into this as well. It looks like the EBS driver may not be encountering this because of some delays added: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/1949.
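
To make the suspected race concrete, here is a minimal Python sketch, not the driver's actual code: a single check at startup finds no taint and gives up, while a check that retries with backoff still catches a taint that is applied a few seconds late. `remove_startup_taint` and `FakeNode` are hypothetical names, and the in-memory node stands in for the Kubernetes API; only the taint key comes from this thread.

```python
import time
from typing import Callable, Dict, List

# Taint key as it appears in the audit log in this thread.
AGENT_NOT_READY_KEY = "efs.csi.aws.com/agent-not-ready"

Taint = Dict[str, str]


def remove_startup_taint(
    get_taints: Callable[[], List[Taint]],
    set_taints: Callable[[List[Taint]], None],
    attempts: int = 5,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll for the startup taint and remove it once it appears.

    Returns True if the taint was found and removed, False if it never
    showed up within the allowed number of attempts.
    """
    delay = initial_delay
    for attempt in range(attempts):
        taints = get_taints()
        remaining = [t for t in taints if t.get("key") != AGENT_NOT_READY_KEY]
        if len(remaining) != len(taints):
            set_taints(remaining)
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # exponential backoff between checks
    return False


class FakeNode:
    """Hypothetical in-memory node whose taint is applied late."""

    def __init__(self, taint_after_reads: int) -> None:
        self.taints: List[Taint] = []
        self._reads = 0
        self._taint_after_reads = taint_after_reads

    def get_taints(self) -> List[Taint]:
        self._reads += 1
        if self._reads == self._taint_after_reads:
            # Simulates another actor tainting the node after startup.
            self.taints = self.taints + [
                {"key": AGENT_NOT_READY_KEY, "value": "true", "effect": "NoExecute"}
            ]
        return list(self.taints)

    def set_taints(self, taints: List[Taint]) -> None:
        self.taints = list(taints)


if __name__ == "__main__":
    no_sleep = lambda _seconds: None  # skip real waiting in the demo

    # One check only: the taint is not there yet, so nothing is removed.
    late_node = FakeNode(taint_after_reads=3)
    print(remove_startup_taint(late_node.get_taints, late_node.set_taints,
                               attempts=1, sleep=no_sleep))  # False

    # Retrying: the third read sees the taint and removes it.
    late_node = FakeNode(taint_after_reads=3)
    print(remove_startup_taint(late_node.get_taints, late_node.set_taints,
                               sleep=no_sleep))  # True
```

Whether to give up after a fixed number of attempts or keep watching the node is a design choice this sketch does not settle.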

Specifically, I see eks:node-manager tainting the node after the driver has started:

I0326 02:14:25.282728       1 round_trippers.go:463] GET https://172.20.0.1:443/api/v1/nodes/ip-10-0-39-238.us-west-2.compute.internal
I0326 02:14:25.282736       1 round_trippers.go:469] Request Headers:
I0326 02:14:25.282742       1 round_trippers.go:473]     Accept: application/json, */*
I0326 02:14:25.282747       1 round_trippers.go:473]     User-Agent: aws-efs-csi-driver/v0.0.0 (linux/amd64) kubernetes/$Format
I0326 02:14:25.282753       1 round_trippers.go:473]     Authorization: Bearer <masked>
I0326 02:14:25.298894       1 round_trippers.go:574] Response Status: 200 OK in 16 milliseconds
I0326 02:14:25.299687       1 node.go:486] "No taints to remove on node, skipping taint removal"
I0326 02:14:25.299702       1 driver.go:137] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0326 02:14:28.015603       1 node.go:311] NodeGetInfo: called with args

$ jq -r '.[0]["@message"] | .user.username, .objectRef, .requestObject, .requestReceivedTimestamp' logs-insights-results.json
eks:node-manager
{
  "resource": "nodes",
  "name": "ip-10-0-39-238.us-west-2.compute.internal",
  "apiVersion": "v1"
}
{
  "spec": {
    "taints": [
      {
        "effect": "NoExecute",
        "key": "efs.csi.aws.com/agent-not-ready",
        "value": "true"
      },
      {
        "effect": "NoExecute",
        "key": "ebs.csi.aws.com/agent-not-ready",
        "value": "true"
      }
    ]
  }
}
2024-03-26T02:14:31.296272Z
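
For reference, the gap between the driver's "No taints to remove" log line and the taint request can be worked out from the two timestamps above. This is a rough sketch: klog lines carry no year or time zone, so the year 2024 and UTC are assumptions chosen to match the audit-log timestamp.

```python
from datetime import datetime, timezone

# Driver log line: "I0326 02:14:25.299687 ... No taints to remove on node".
# Assumption: the year is 2024 and the pod logs in UTC.
driver_check = datetime.strptime(
    "2024 0326 02:14:25.299687", "%Y %m%d %H:%M:%S.%f"
).replace(tzinfo=timezone.utc)

# Audit log requestReceivedTimestamp for the eks:node-manager taint request.
taint_applied = datetime.strptime(
    "2024-03-26T02:14:31.296272Z", "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

gap = (taint_applied - driver_check).total_seconds()
print(f"taint applied {gap:.3f}s after the driver's check")  # 5.997s
```

Under those assumptions the taint arrives roughly six seconds after the driver has already decided there is nothing to remove.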

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 5 days ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

mskanth972 commented 3 days ago

We released a new version (2.0.6) to GitHub, which includes the fix and the related PR. The ECD for the add-ons is 07/31.