It is hard to say without a full log, but I think:
- The connection resets can probably be ignored, I guess? They most likely just happen as the connections are recreated.
- The thread-blocked warning suggests that some blocking operation is taking longer than expected. In this case it looks like it is decoding/encoding some JSON. I have never seen that myself. I wonder if the 100m CPU has something to do with it (i.e. it is too small). But we do not seem to have anything for it in your files as a default, so TBH I'm not sure what a better value would be. It could also suggest it is being fed some malformed input, but that seems unlikely given the input should come from Kubernetes.
I can post the full log, but the connection resets are pretty random over that interval. Could just be Azure API things.
I haven't seen any CPU throttling with that, but it might be too small to catch on our monitoring. I'll increase it and see if it changes (roughly along the lines of the sketch below). Thanks for the insight.
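For reference, this is roughly the kind of override I'm planning to try; the exact key path depends on the chart's values layout, so treat it as a sketch rather than the chart's documented interface:

```yaml
# Hypothetical values.yaml override; the exact key path depends on the chart.
# The idea is to raise the CPU request/limit above 100m so the JSON
# encode/decode step is less likely to be starved.
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi
```

If throttling really is the culprit, the cAdvisor throttling metrics (e.g. container_cpu_cfs_throttled_seconds_total) should show it once the limit is actually being hit.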
I removed the resource limits entirely and I'm still seeing the log messages, aside from the thread-blocking one. Will continue to monitor.
Yeah, I think that makes sense: removing the limits removed the CPU bottleneck, so the blocking operation now finishes faster.
The connection resets are IMHO not related to CPU. That looks more like the connection to the Kube API being terminated. Recreating the connections is not unusual (re-authentication etc.), and I think that would happen with some randomness at < 10-minute intervals. But it should normally not trigger errors, so I'm not sure why you see them.
Unfortunately I do not see these errors in my own installations, so I have no idea why they show up for you.
Thanks for the insight. I'm going to close this issue and explore the API from the Azure side; I assume there's some difference in timings or proxy shenanigans with how they do things. Still works, though, which is all that matters!
Things to note:
- Using AKS on Kubernetes v1.26.6
- Helm chart version 1.0.0
- Deployed using the Helm chart with the following values: (we are pre-creating the namespace due to annotations for specific nodes)

cert-manager successfully issues certificates and the pods start without issue. They do appear to work, but I wanted to validate whether these errors are expected.
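For context on the pre-created namespace, here is roughly what it looks like; the name and annotation key below are placeholders rather than our exact manifest:

```yaml
# Illustrative sketch of the pre-created namespace.
# The name and annotation key are placeholders, not the exact ones we use.
apiVersion: v1
kind: Namespace
metadata:
  name: my-operator-namespace   # placeholder name
  annotations:
    # Example of a node-targeting annotation (requires the PodNodeSelector
    # admission plugin); our real annotations differ.
    scheduler.alpha.kubernetes.io/node-selector: "agentpool=workload"
```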
Log messages we are seeing:
- Startup logs
- Random connection resets
- Thread block logs
- Watch termination failure logs
Just want to verify I'm not missing anything here. I don't see any errors in the Kubernetes event log, so I'm unsure what's happening. Walking through the bindings, everything seems to align. Happy to provide more info as needed!