uswitch / kiam

Integrate AWS IAM with Kubernetes
Apache License 2.0
unable to assign pod the iam role #46

Closed tasdikrahman closed 6 years ago

tasdikrahman commented 6 years ago

I checked the kiam-agent logs on the node where the pod (the one that should have been assigned the IAM role) was scheduled. They look like this:

{"addr":"10.2.40.2:34938","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:32Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34940","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34940","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34942","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:33Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34944","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34944","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34946","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:33Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34948","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34948","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}

We are running uswitch/kiam:v2.4 on both the agent and the server.

The namespace where the pod is scheduled has the annotation

  annotations:
    iam.amazonaws.com/permitted: .*

as stated in the docs

along with

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

in the trust relationship of the IAM role picked up by the node where the pods are being scheduled.
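For readers unfamiliar with the namespace annotation above: its value is a regular expression that a requested role name is matched against. The following is only a minimal sketch of that idea, not kiam's actual code. The function name `rolePermitted` is hypothetical, and the sketch makes no claim about whether kiam anchors the expression or how it treats a missing annotation.

```go
package main

import (
	"fmt"
	"regexp"
)

// rolePermitted is a hypothetical helper: it reports whether role matches the
// regular expression taken from a namespace's iam.amazonaws.com/permitted
// annotation. Assumption for this sketch: an empty annotation permits nothing.
func rolePermitted(annotation, role string) (bool, error) {
	if annotation == "" {
		return false, nil
	}
	re, err := regexp.Compile(annotation)
	if err != nil {
		return false, fmt.Errorf("invalid permitted expression %q: %w", annotation, err)
	}
	return re.MatchString(role), nil
}

func main() {
	// ".*" is the annotation value used in this issue; it matches any role name.
	ok, err := rolePermitted(".*", "my-app-iam-role")
	fmt.Println(ok, err)
}
```

With `.*` every role name matches, so the annotation is unlikely to be the cause of the `pod not found` errors shown in the logs.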

Not sure if it's related, but we recently upgraded k8s from 1.8.4 to 1.8.9. I guess that shouldn't be the problem, though.

tasdikrahman commented 6 years ago

Strange: killing the kiam agent/server pods and letting new ones come up fixed the issue. Any ideas why?

pingles commented 6 years ago

Not off the top of my head. Do you have the log data from the server processes? From the errors you forwarded, it sounds most likely to be an issue inside the server process.

tasdikrahman commented 6 years ago

Ah, too bad: I didn't capture the logs from the server pods before deleting them. :/

pingles commented 6 years ago

Yeah, sorry, without the server logs it's difficult to know what the problem is. I'm going to close this for now, but please reopen if it happens again, with as much log data as you can capture.

Thanks!

tasdikrahman commented 6 years ago

Hey, thanks @pingles, I will post here again if I face the issue again. Thanks for your time.

pingles commented 6 years ago

No problem, thanks for reporting an issue.

tasdikrahman commented 6 years ago

Hey @pingles, whilst upgrading one of our clusters, we faced the above issue again.

Logs from one of the client-agents

{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.24:54024","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.24:54024","headers":{"Content-Type":["application/json"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-iam-role","status":200,"time":"2018-03-21T14:23:40Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"addr":"10.1.54.111:37730","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.39:35978","level":"error","method":"GET","msg":"error processing request: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.39:35978","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.1.54.111:37770","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-03-21T14:23:43Z"}

Logs from one of the server agents


ERROR: logging before flag.Parse: E0321 09:56:13.663622       1 reflector.go:315] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to watch *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=26477282&timeoutSeconds=501&watch=true: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:13.663639       1 reflector.go:315] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to watch *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=26255952&timeoutSeconds=503&watch=true: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:14.664819       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:14.665933       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:15.665916       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:15.666963       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:16.667094       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:16.667965       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
{"level":"info","msg":"found role","pod.iam.role":"my-iam-role","pod.ip":"10.2.2.193","time":"2018-03-21T09:56:17Z"}
{"level":"info","msg":"requesting credentials","pod.iam.role":"my-iam-role","time":"2018-03-21T09:56:17Z"}
{"level":"info","msg":"found role","pod.iam.role":"my-iam-role","pod.ip":"10.2.3.25","time":"2018-03-21T09:56:17Z"}
ERROR: logging before flag.Parse: E0321 09:56:17.668059       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:17.669509       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
...
...
{"credentials.access.key":"ASIAIKN7LBPV3EDBRAHA","credentials.expiration":"2018-03-21T10:06:38Z","credentials.role":"my-second-iam-role","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"prod-harvester","pod.name":"pod1-76b6d9746-l8w2f","pod.namespace":"ns1","pod.status.ip":"10.2.11.145","pod.status.phase":"Running","resource.version":"26310469","time":"2018-03-21T09:56:18Z"}
{"credentials.access.key":"ASIAIPZZL4EO5OJJTDQA","credentials.expiration":"2018-03-21T10:07:37Z","credentials.role":"my-third-iam-role","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"prod-euler","pod.name":"pod2-844d859d8-p4jnt","pod.namespace":"ns2","pod.status.ip":"10.2.10.13","pod.status.phase":"Running","resource.version":"22112994","time":"2018-03-21T09:56:18Z"}
ERROR: logging before flag.Parse: E0321 09:56:18.838270       1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"cache.DeletedFinalStateUnknown", assertedString:"*v1.Pod", missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod)
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:509
/usr/local/go/src/runtime/panic.go:491
/usr/local/go/src/runtime/iface.go:172
/go/src/github.com/uswitch/kiam/pkg/k8s/pod_cache.go:126
/go/src/github.com/uswitch/kiam/pkg/k8s/pod_cache.go:214
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:451
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:150
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:124
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:124
/usr/local/go/src/runtime/asm_amd64.s:2337

Please let me know if you want something else from the logs. Thanks

pingles commented 6 years ago

Interesting, definitely looks like something's wrong there. I've reopened.

pingles commented 6 years ago

So kiam definitely assumes that the deltas delivered by the k8s client are only for *v1.Pod.

Had a quick search for the error and this seems the same: https://github.com/kubernetes/kubernetes/commit/1c65d1df86e773ce9e1496163e724fa52a6dd864

Should be relatively easy to fix. I'll try and do it asap unless someone else beats me to it!

pingles commented 6 years ago

And more relevant docs: https://github.com/kubernetes/client-go/blob/master/tools/cache/delta_fifo.go#L656
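To illustrate the panic in the server log: a delete handler can receive either the pod itself or a `cache.DeletedFinalStateUnknown` tombstone wrapping the last known state, so a bare `obj.(*v1.Pod)` assertion fails on the tombstone. Below is a minimal sketch of handling both cases. It is not the actual kiam fix: `Pod` and `DeletedFinalStateUnknown` are local stand-ins (the latter mirroring the `Key`/`Obj` shape of the client-go type) so the example compiles without client-go, and `podFromDelta` is a hypothetical name.

```go
package main

import "fmt"

// Pod is a local stand-in for *v1.Pod so the sketch has no external dependencies.
type Pod struct {
	Name string
	IP   string
}

// DeletedFinalStateUnknown is a stand-in mirroring the shape of client-go's
// cache.DeletedFinalStateUnknown: the key of the deleted object plus its
// last known (possibly stale) state.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// podFromDelta (hypothetical name) extracts a pod from an object delivered by
// the delta queue. It uses a type switch and the two-value assertion form, so
// an unexpected type yields ok == false instead of a panic.
func podFromDelta(obj interface{}) (*Pod, bool) {
	switch v := obj.(type) {
	case *Pod:
		return v, v != nil
	case DeletedFinalStateUnknown:
		// Unwrap the tombstone; the wrapped object may itself be missing.
		pod, ok := v.Obj.(*Pod)
		return pod, ok && pod != nil
	default:
		return nil, false
	}
}

func main() {
	live := &Pod{Name: "pod-a", IP: "10.2.3.39"}
	tombstone := DeletedFinalStateUnknown{Key: "ns1/pod-a", Obj: live}

	for _, obj := range []interface{}{live, tombstone, "not a pod"} {
		if pod, ok := podFromDelta(obj); ok {
			fmt.Println("handling delete for", pod.Name)
		} else {
			fmt.Println("ignoring object of unexpected type")
		}
	}
}
```

The design choice here is to treat an unrecognised object as something to skip and log rather than something to crash on, since the watch can deliver tombstones whenever the connection to the API server was interrupted.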

tasdikrahman commented 6 years ago

Thanks for your time, appreciate it :)

pingles commented 6 years ago

I committed a fix for this earlier, but I've also just changed it again to remove some of the pod cache internals.

This should also remove the runtime.TypeAssertionError for cache.DeletedFinalStateUnknown.

The latest PR that addresses this (#51) also changes the server boot process so that the pod and namespace caches must Sync before the gRPC listener starts. Hopefully both of those should mean the server process behaves better for you.
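The boot ordering described above can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `syncFunc`, `allSynced`, `waitForSync` and `listenWhenSynced` are hypothetical names, the sync checks stand in for whatever the pod and namespace caches expose, and a plain TCP listener stands in for the gRPC one.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// syncFunc reports whether a cache has finished its initial sync.
// Hypothetical stand-in for the pod and namespace cache checks.
type syncFunc func() bool

// allSynced returns true only when every check reports true.
func allSynced(checks ...syncFunc) bool {
	for _, check := range checks {
		if !check() {
			return false
		}
	}
	return true
}

// waitForSync polls the checks until all pass or ctx is cancelled.
func waitForSync(ctx context.Context, interval time.Duration, checks ...syncFunc) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for !allSynced(checks...) {
		select {
		case <-ctx.Done():
			return fmt.Errorf("caches did not sync: %w", ctx.Err())
		case <-ticker.C:
		}
	}
	return nil
}

// listenWhenSynced opens the listener only after every cache has synced, so
// no request can be served from an empty cache.
func listenWhenSynced(ctx context.Context, addr string, checks ...syncFunc) (net.Listener, error) {
	if err := waitForSync(ctx, 10*time.Millisecond, checks...); err != nil {
		return nil, err
	}
	return net.Listen("tcp", addr)
}

func main() {
	start := time.Now()
	// Simulated caches: the pod cache needs 50ms, the namespace cache is ready.
	podsSynced := func() bool { return time.Since(start) > 50*time.Millisecond }
	namespacesSynced := func() bool { return true }

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	lis, err := listenWhenSynced(ctx, "127.0.0.1:0", podsSynced, namespacesSynced)
	if err != nil {
		fmt.Println("not serving:", err)
		return
	}
	defer lis.Close()
	fmt.Println("caches synced, listening on", lis.Addr())
}
```

Starting the listener last means an agent that connects early gets a connection error it can retry, rather than a `pod not found` answer from a cache that has not been populated yet.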

@tasdikrahman I'm going to close this issue for now (relating to the type error). If after updating Kiam (sorry, you'll need to use latest or the SHA for now) you see the erroneous behaviour again (pod not found errors) please re-open.

tasdikrahman commented 6 years ago

> I'm going to close this issue for now (relating to the type error). If after updating Kiam (sorry, you'll need to use latest or the SHA for now) you see the erroneous behaviour again (pod not found errors) please re-open.

No problem at all. I will update this issue if I see the error again. Thanks a lot! Just for my sanity, I was curious whether a release after 2.6 is scheduled 😀

pingles commented 6 years ago

Yep- we’ll probably do a release soon. I’d like to get better Prometheus metrics exported first (which should be quite quick) then do a release so perhaps within a few days/week?

tasdikrahman commented 6 years ago

> Yep- we’ll probably do a release soon. I’d like to get better Prometheus metrics exported first (which should be quite quick) then do a release so perhaps within a few days/week?

Sounds good, thanks for the help! :)

pingles commented 6 years ago

I’ll try and do a release today; that’ll pull in the error handling fix. I’ll push the Prometheus changes to the next release.
