Open pelzerim opened 4 months ago
This issue is currently awaiting triage.
If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Have you configured metadataOptions on your EC2NodeClass? This gives you the option to enable the IPv6 endpoint for your instances; it is disabled by default. Can you share what your EC2NodeClass looks like?
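For illustration, enabling the IPv6 instance metadata endpoint would look roughly like this on an EC2NodeClass; everything other than httpProtocolIPv6 is shown with illustrative values, not taken from this issue:

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: example
spec:
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: enabled   # disabled by default
    httpPutResponseHopLimit: 2
    httpTokens: required
```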
@jigisha620 Thanks for the prompt response! We do not have that option enabled:
```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: ${name}
spec:
  amiFamily: AL2
  role: ${role}
  subnetSelectorTerms:
  %{ for subnet_id in subnet_ids }
    - id: "${subnet_id}"
  %{ endfor }
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
  tags:
    karpenter.sh/discovery: ${cluster_name}
  amiSelectorTerms:
    - id: ${ami_id}
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeType: gp3
        volumeSize: ${disk_size_gi}Gi
        deleteOnTermination: true
        encrypted: true
```
Will the change to the metadata service affect the instance kubelet configuration? The corresponding code seems to use the ClusterDNS IP: https://github.com/aws/karpenter-provider-aws/blob/e8a345723c8db785bd07b8595c395edbdfb9255b/pkg/providers/amifamily/bootstrap/eksbootstrap.go#L122
Hi @pelzerim,
You are right, I misunderstood. Wondering if you have specified ClusterDNS via spec.kubeletConfiguration, since we rely on clusterDNS to pass --ip-family ipv6. Can you also share your Karpenter controller logs from the time when this happened? Wondering if there was something that prevented Karpenter from discovering the clusterDNS.
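For context, pinning ClusterDNS explicitly is done on the NodePool template in the v1beta1 API; a minimal excerpt (the address below is illustrative, use your cluster's CoreDNS service address) looks roughly like:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: example
spec:
  template:
    spec:
      kubelet:
        clusterDNS: ["fd00:10:96::a"]  # illustrative IPv6 CoreDNS service address
```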
@jigisha620 Sadly we could not find relevant logs (errors) from Karpenter from around the time of the incident.
However, we have had Kubernetes API throttling issues in the past with that cluster, and I strongly suspect something in that direction. I've stumbled upon this FlowSchema here, and we indeed do not run Karpenter in the kube-system namespace. I've added the FlowSchema manually now; let's see if this goes away.
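A FlowSchema along these lines gives the Karpenter service account the API priority treatment it would otherwise only benefit from in kube-system. The service account name, namespace, and priority level below are assumptions and need to be adapted to the actual install:

```yaml
# Sketch of a FlowSchema that keeps Karpenter requests out of the catch-all
# priority level. The service account name/namespace and the chosen priority
# level ("workload-high") are assumptions, not part of the original report.
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: karpenter
spec:
  priorityLevelConfiguration:
    name: workload-high          # built-in priority level
  matchingPrecedence: 1000
  distinguisherMethod:
    type: ByUser
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: karpenter      # assumed service account name
            namespace: karpenter # assumed namespace (not kube-system)
      resourceRules:
        - apiGroups: ["*"]
          resources: ["*"]
          verbs: ["*"]
          clusterScope: true
          namespaces: ["*"]
      nonResourceRules:
        - nonResourceURLs: ["*"]
          verbs: ["*"]
```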
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Hello @pelzerim, I have the same problem here.
Description
Observed Behavior:
Karpenter created IPv4 nodes in an IPv6 EKS cluster.
The NodeClaims were unable to get into status Initialized: the cloud controller was unable to provide node information.
Inspecting the user data of the created EC2 instances reveals that the instances are missing the IPv6 bootstrap flags (which are present for older nodes).
Killing the Karpenter pods and removing all stuck NodeClaims by hand did resolve the issue.
This AWS EKS cluster is known to have performance issues with the control plane. We have already seen this issue, and we are working with AWS Support to get it fixed.
A bit of digging reveals that the decision to add the flags is dynamic: Karpenter relies on the ClusterDNS address it discovers from the cluster to decide whether to pass --ip-family ipv6 (see the sketch below).
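The following is a minimal sketch, not the actual Karpenter source (which lives in the eksbootstrap.go file linked above), assuming the IP family is inferred from the first discovered ClusterDNS address. It illustrates why a failed or throttled ClusterDNS lookup silently falls back to IPv4 bootstrap flags:

```go
package main

import (
	"fmt"
	"net"
)

// ipFamilyFromClusterDNS is a hypothetical helper: it returns "ipv6" only when
// the first discovered cluster DNS address parses as an IPv6 address. If
// discovery returned nothing (e.g. because the control plane was throttled),
// it falls back to "ipv4", which matches the behavior observed in this issue.
func ipFamilyFromClusterDNS(clusterDNS []string) string {
	if len(clusterDNS) == 0 {
		return "ipv4" // nothing discovered: the --ip-family ipv6 flag is never added
	}
	ip := net.ParseIP(clusterDNS[0])
	if ip != nil && ip.To4() == nil {
		return "ipv6"
	}
	return "ipv4"
}

func main() {
	fmt.Println(ipFamilyFromClusterDNS([]string{"fd00:10:100::a"})) // ipv6
	fmt.Println(ipFamilyFromClusterDNS(nil))                        // ipv4 (the failure mode)
}
```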
Expected Behavior:
Karpenter should be able to provision IPv6 nodes even if the control plane is temporarily unavailable or throttled.
Having direct control over this setting would also be very useful.
Versions:
Karpenter CRD Chart Version: 0.37.0
Karpenter Chart version: 0.37.0
Kubernetes Version (kubectl version): v1.30.0-eks-036c24b
AWS terraform-aws-modules/eks/aws//modules/karpenter version: 20.8.5
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment