Open dfsdevops opened 1 week ago
This sounds like a problem in how Containerd merges configuration. Alternatively I have missed documentation about how overrides work. Using sed is not a recommended method as the EKS AMI can change their Containerd configuration at any time. We worked with them to get the imports working for this reason.
First step is to confirm that Containerd is omitting the original configuration when imports are used. After that we can move forward with figuring out how to solve this upstream if that is the case.
I certainly empathize with the fact that this was a carefully considered solution and trying to avoid issues with upstream changes. I agree it would be ideal to ensure containerd properly merges the TOML rather than hack around it with sed. It looks like they have an open issue for this https://github.com/containerd/containerd/issues/5837
Maybe I can use the crictl command to do a before/after to see if the actual running config difference is identifiable. I didn't know about that originally.
Thanks for linking the issue. I understand the problem a lot better now. So the problem is that the merging is not done at a deeper level than per section. Which explains why some of the original configuration is disappearing.
Could one option be to include the original configuration in the import file? This way we include all configuration again making sure that nothing disappears. Hopefully this is something that we can fix in Containerd v2.
We could update the docs if this temporary solution works for you.
Spegel version
v0.0.25
Kubernetes distribution
EKS
Kubernetes version
v1.29.8-eks-a737599
CNI
Amazon VPC CNI
Describe the bug
I have been using Karpenter (as well as terraform,) to update the user data field when launching new nodes with the following spec:
I'm basing this off these instructions https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md#eks
there seems to be an issue in that all of our Otel metrics for memory usage from the kubeletstats receiver are coming back as
0
rather than the number of bytes. I suspect that the patch is clobbering a containerd config dictionary that has an effect on kubelet metrics, as per this comment, a potential reason: https://github.com/awslabs/amazon-eks-ami/issues/1628#issuecomment-2324774464might be related to the
default_runtime_name
key thats being used the table here:My suggestion is that the documentation switch to using sed to override the template, as this seems to avoid the issue. I confirmed that after switching to this in my
userData:
field in Karpenter, I begin to see memory metrics in Otel again.