uswitch / kiam

Integrate AWS IAM with Kubernetes
Apache License 2.0

NoCredentialsFound #497

Closed: droctothorpe closed this issue 2 years ago

droctothorpe commented 3 years ago

I'm using KIAM on a kops-provisioned K8s cluster (v1.19.7) with Dask Gateway (https://gateway.dask.org/).

KIAM works great for us until we do an S3 read/write across a distributed Dask cluster with dozens of workers. During large-scale distributed operations like this, one or more of the Dask workers report a NoCredentialsFound error.

Neither the KIAM agent nor the server logs anything when this happens.

I'm wondering if the KIAM agents can't keep up when many Dask workers request credentials simultaneously.

Any insight / input would be greatly appreciated.

droctothorpe commented 2 years ago

We added an init container that looks something like this to get around this issue:

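    # Init container that holds back worker startup until credentials are available.
    c.KubeClusterConfig.worker_extra_pod_config["initContainers"] = [{
        "name": "wait-for-kiam",
        "image": "",
        "command": [
            "sh",
            "-c",
            # Try `aws sts get-caller-identity` up to 12 times, sleeping 5 s between
            # attempts, then exit with the status of the last attempt.
            "for i in $(seq 1 12); do [ $i -gt 1 ] && sleep 5; aws sts get-caller-identity && s=0 && break || s=$?; done; (exit $s)"
        ],
        # Reuse the worker container's environment variables, if any are set.
        "env": c.KubeClusterConfig.worker_extra_container_config.get("env", [])
    }]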

Janky af but it worked iirc.
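
For anyone who finds the shell one-liner hard to read, here is roughly the same retry loop as a Python sketch. It is not what we ran. `wait_for_credentials` and `sts_check` are hypothetical names made up for illustration, and it assumes the AWS CLI is on the PATH of whatever image runs it. The defaults mirror the loop above: 12 attempts, 5 seconds between them.

```python
import subprocess
import time


def wait_for_credentials(check, attempts=12, delay=5.0, sleep=time.sleep):
    """Call check() until it returns 0 or attempts run out.

    Hypothetical helper mirroring the shell loop: returns the exit
    status of the last attempt (0 on success).
    """
    status = 1  # reported if attempts is 0 and check() never runs
    for attempt in range(1, attempts + 1):
        if attempt > 1:
            sleep(delay)  # pause between attempts, not before the first one
        status = check()
        if status == 0:
            break  # credentials are available, stop retrying
    return status


def sts_check():
    """Exit status of `aws sts get-caller-identity` (assumes the AWS CLI is installed)."""
    return subprocess.run(["aws", "sts", "get-caller-identity"]).returncode


# Usage in an init container entrypoint (sketch):
#     raise SystemExit(wait_for_credentials(sts_check))
```

Passing the check in as a callable keeps the retry logic separate from the AWS CLI call, so the loop can be exercised without a cluster.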