Closed: tasdikrahman closed this issue 6 years ago.
strange, killing the kiam agent/server pods and letting the new ones come up fixed the issue. Any ideas on it?
Not off the top of my head- do you have the log data from the server processes? From the errors you forwarded before it sounds most likely an issue inside the server process.
Ah too bad, I didn't get the logs from the server agent before deleting them. :/
Yeah sorry, without the server logs it's difficult to know what the problem is. I'm going to close this for now, but please reopen if it happens again, with as much log data as you can capture.
Thanks!
Hey thanks @pingles, will post here again if I face the issue again. Thanks for your time.
No problem, thanks for reporting an issue.
Hey @pingles, whilst upgrading a cluster of ours we faced the above issue again.
Logs from one of the client agents:
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.24:54024","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.24:54024","headers":{"Content-Type":["application/json"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-iam-role","status":200,"time":"2018-03-21T14:23:40Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"addr":"10.1.54.111:37730","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.39:35978","level":"error","method":"GET","msg":"error processing request: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.39:35978","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.1.54.111:37770","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-03-21T14:23:43Z"}
Logs from one of the server agents:
ERROR: logging before flag.Parse: E0321 09:56:13.663622 1 reflector.go:315] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to watch *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=26477282&timeoutSeconds=501&watch=true: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:13.663639 1 reflector.go:315] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to watch *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=26255952&timeoutSeconds=503&watch=true: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:14.664819 1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:14.665933 1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:15.665916 1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:15.666963 1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:16.667094 1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:16.667965 1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
{"level":"info","msg":"found role","pod.iam.role":"my-iam-role","pod.ip":"10.2.2.193","time":"2018-03-21T09:56:17Z"}
{"level":"info","msg":"requesting credentials","pod.iam.role":"my-iam-role","time":"2018-03-21T09:56:17Z"}
{"level":"info","msg":"found role","pod.iam.role":"my-iam-role","pod.ip":"10.2.3.25","time":"2018-03-21T09:56:17Z"}
ERROR: logging before flag.Parse: E0321 09:56:17.668059 1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:17.669509 1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
...
...
{"credentials.access.key":"ASIAIKN7LBPV3EDBRAHA","credentials.expiration":"2018-03-21T10:06:38Z","credentials.role":"my-second-iam-role","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"prod-harvester","pod.name":"pod1-76b6d9746-l8w2f","pod.namespace":"ns1","pod.status.ip":"10.2.11.145","pod.status.phase":"Running","resource.version":"26310469","time":"2018-03-21T09:56:18Z"}
{"credentials.access.key":"ASIAIPZZL4EO5OJJTDQA","credentials.expiration":"2018-03-21T10:07:37Z","credentials.role":"my-third-iam-role","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"prod-euler","pod.name":"pod2-844d859d8-p4jnt","pod.namespace":"ns2","pod.status.ip":"10.2.10.13","pod.status.phase":"Running","resource.version":"22112994","time":"2018-03-21T09:56:18Z"}
ERROR: logging before flag.Parse: E0321 09:56:18.838270 1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"cache.DeletedFinalStateUnknown", assertedString:"*v1.Pod", missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod)
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:509
/usr/local/go/src/runtime/panic.go:491
/usr/local/go/src/runtime/iface.go:172
/go/src/github.com/uswitch/kiam/pkg/k8s/pod_cache.go:126
/go/src/github.com/uswitch/kiam/pkg/k8s/pod_cache.go:214
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:451
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:150
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:124
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:124
/usr/local/go/src/runtime/asm_amd64.s:2337
Please let me know if you want something else from the logs. Thanks
Interesting, definitely looks like something's wrong there. I've reopened.
So kiam definitely assumes that the deltas delivered by the k8s client are only for *v1.Pod.
Had a quick search for the error and this seems the same: https://github.com/kubernetes/kubernetes/commit/1c65d1df86e773ce9e1496163e724fa52a6dd864
Should be relatively easy to fix. I'll try and do it asap unless someone else beats me to it!
And more relevant docs: https://github.com/kubernetes/client-go/blob/master/tools/cache/delta_fifo.go#L656
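For reference, the usual way a client-go delete handler copes with this is to unwrap the tombstone before asserting the concrete type. Below is a minimal sketch only, not kiam's actual handler; the package and function names are made up for the example.

```go
// Illustrative sketch of a client-go delete handler that tolerates
// cache.DeletedFinalStateUnknown tombstones. Not the actual kiam code;
// the package and function names here are hypothetical.
package example

import (
	"log"

	"k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

func onPodDelete(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		// When a watch is interrupted and the cache re-lists, deletions can be
		// delivered wrapped in a DeletedFinalStateUnknown tombstone rather than
		// as a *v1.Pod, so unwrap it before asserting the type.
		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
		if !ok {
			log.Printf("unexpected object in delete handler: %T", obj)
			return
		}
		pod, ok = tombstone.Obj.(*v1.Pod)
		if !ok {
			log.Printf("tombstone contained unexpected object: %T", tombstone.Obj)
			return
		}
	}
	log.Printf("observed delete for pod %s/%s", pod.Namespace, pod.Name)
}
```

This mirrors the pattern used by upstream controllers in the kubernetes commit linked above.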
Thanks for your time, appreciate it :)
I committed a fix earlier for this but I've also just changed again to remove some of the pod cache internals.
This should also remove the runtime.TypeAssertionError for cache.DeletedFinalStateUnknown.
The latest PR that addresses this (#51) also changes the server boot process so that the pod and namespace caches must Sync
before the gRPC listener starts. Hopefully both of those should mean the server process behaves better for you.
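To make the boot-order change concrete, the rough shape is to block on cache sync before opening the listener. The following is only a sketch under assumptions (the serve function, its arguments, and the service registration are placeholders), not the code from #51:

```go
// Illustrative sketch of starting the gRPC listener only after the pod and
// namespace caches have synced. Not the code from the kiam PR; names and
// wiring here are placeholders.
package example

import (
	"fmt"
	"log"
	"net"

	"google.golang.org/grpc"
	"k8s.io/client-go/tools/cache"
)

func serve(addr string, podInformer, namespaceInformer cache.Controller, stopCh <-chan struct{}) error {
	go podInformer.Run(stopCh)
	go namespaceInformer.Run(stopCh)

	// Until the caches have filled from the API server, lookups would return
	// "pod not found" for pods the server simply hasn't observed yet.
	if !cache.WaitForCacheSync(stopCh, podInformer.HasSynced, namespaceInformer.HasSynced) {
		return fmt.Errorf("timed out waiting for caches to sync")
	}

	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	srv := grpc.NewServer()
	// ...register the credentials service on srv here...
	log.Printf("caches synced, serving gRPC on %s", addr)
	return srv.Serve(lis)
}
```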
@tasdikrahman I'm going to close this issue for now (relating to the type error). If after updating Kiam (sorry, you'll need to use latest or the SHA for now) you see the erroneous behaviour again (pod not found errors) please re-open.
No problem at all. Will update on this issue if I see the error again. Thanks a lot! Just for my sanity, was curious whether a release is scheduled after 2.6 😀
Yep- we’ll probably do a release soon. I’d like to get better Prometheus metrics exported first (which should be quite quick) then do a release so perhaps within a few days/week?
Sounds good, thanks for the help! :)
I’ll try and do a release today- that’ll pull in the error handling fix. I’ll push the Prometheus changes to the next release.
Checked the kiam-agent logs on the node where the pod (which was to be assigned the IAM role) was scheduled, which look like the ones above. We're running uswitch/kiam:v2.4 on both the agent and the server. The namespace where the pod is scheduled has the annotation as stated in the docs, along with the required entry in the trust relationships on the IAM role picked up by the node where the pods are being scheduled.
Not sure if it's related, but we recently upgraded k8s from 1.8.4 to 1.8.9; I guess that shouldn't be the problem though.