Closed james-callahan closed 1 month ago
This issue is currently awaiting triage.
If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
AWS CCM has been patching in both IPv6 and IPv4 IPs for quite some time. You just have to set NodeIPFamilies to something like ipv6 and ipv4.
See https://github.com/kubernetes/cloud-provider-aws/blob/master/pkg/providers/v1/aws.go#L1599
I'm using v2, not v1.
As found in https://github.com/kubernetes/cloud-provider-aws/issues/677 I'm using v1 after all.
I gave this another attempt, setting the feature gate CloudDualStackNodeIPs=true, and the cloud provider failed with e.g.:
I1017 02:43:36.977098 1 node_controller.go:431] Initializing node i-02de3f9b2d02feaa7 with cloud provider
E1017 02:43:37.264596 1 node_controller.go:240] error syncing 'i-02de3f9b2d02feaa7': failed to get node modifiers from cloud provider: provided node ip for node "i-02de3f9b2d02feaa7" is not valid: failed to get node address from cloud provider that matches ip: 2600:1f10:45a5:a918:5d99:c7b9:243:210f, requeuing
I realised that NodeIPFamilies defaults to only ipv4, so I added ipv6 to my cloudconfig:
[Global]
NodeIPFamilies=ipv4,ipv6
Which I can verify works via the log line:
I1017 02:58:51.340872 1 aws.go:1433] The following IP families will be added to nodes: [ipv4,ipv6]
The controller is now failing with e.g.:
I1017 03:04:58.888797 1 node_controller.go:431] Initializing node i-083e6ed22b10ddf06 with cloud provider
E1017 03:04:59.302680 1 node_controller.go:240] error syncing 'i-083e6ed22b10ddf06': failed to get node modifiers from cloud provider: provided node ip for node "i-083e6ed22b10ddf06" is not valid: failed to get node address from cloud provider that matches ip: 10.24.152.220, requeuing
I1017 03:04:59.302717 1 node_controller.go:431] Initializing node i-083e6ed22b10ddf06 with cloud provider
E1017 03:04:59.548721 1 node_controller.go:240] error syncing 'i-083e6ed22b10ddf06': failed to get node modifiers from cloud provider: provided node ip for node "i-083e6ed22b10ddf06" is not valid: failed to get node address from cloud provider that matches ip: 10.24.152.220, requeuing
I1017 03:05:01.368647 1 node_controller.go:431] Initializing node i-083e6ed22b10ddf06 with cloud provider
E1017 03:05:01.690156 1 node_controller.go:240] error syncing 'i-083e6ed22b10ddf06': failed to get node modifiers from cloud provider: provided node ip for node "i-083e6ed22b10ddf06" is not valid: failed to get node address from cloud provider that matches ip: 10.24.152.220, requeuing
I1017 03:05:05.698132 1 node_controller.go:431] Initializing node i-083e6ed22b10ddf06 with cloud provider
E1017 03:05:06.089973 1 node_controller.go:240] error syncing 'i-083e6ed22b10ddf06': failed to get node modifiers from cloud provider: provided node ip for node "i-083e6ed22b10ddf06" is not valid: failed to get node address from cloud provider that matches ip: 10.24.152.220, requeuing
I1017 03:05:14.785853 1 node_controller.go:431] Initializing node i-083e6ed22b10ddf06 with cloud provider
E1017 03:05:15.083704 1 node_controller.go:240] error syncing 'i-083e6ed22b10ddf06': failed to get node modifiers from cloud provider: provided node ip for node "i-083e6ed22b10ddf06" is not valid: failed to get node address from cloud provider that matches ip: 10.24.152.220, requeuing
I'm not sure why it's failing to get the node address. See the output of aws ec2 describe-instances --instance-ids i-083e6ed22b10ddf06 | jq '.Reservations[].Instances[] | {PrivateIpAddress,Ipv6Address,NetworkInterfaces}':
{
  "PrivateIpAddress": "10.24.152.220",
  "Ipv6Address": "2600:1f10:45a5:a918:fd18:12af:1613:6c5d",
  "NetworkInterfaces": [
    {
      "Association": {
        "IpOwnerId": "amazon",
        "PublicDnsName": "ec2-3-85-73-150.compute-1.amazonaws.com",
        "PublicIp": "3.85.73.150"
      },
      "Attachment": {
        "AttachTime": "2023-10-17T03:03:45+00:00",
        "AttachmentId": "eni-attach-024b4933411c5f575",
        "DeleteOnTermination": true,
        "DeviceIndex": 0,
        "Status": "attached",
        "NetworkCardIndex": 0
      },
      "Description": "",
      "Groups": [
        {
          "GroupName": "internal-talos-worker-general",
          "GroupId": "sg-007b939554373cc2b"
        }
      ],
      "Ipv6Addresses": [
        {
          "Ipv6Address": "2600:1f10:45a5:a918:fd18:12af:1613:6c5d",
          "IsPrimaryIpv6": false
        }
      ],
      "MacAddress": "0e:41:8b:af:7f:5f",
      "NetworkInterfaceId": "eni-0aabf40c0e2dcd595",
      "OwnerId": "799078726966",
      "PrivateDnsName": "i-083e6ed22b10ddf06.ec2.internal",
      "PrivateIpAddress": "10.24.152.220",
      "PrivateIpAddresses": [
        {
          "Association": {
            "IpOwnerId": "amazon",
            "PublicDnsName": "ec2-3-85-73-150.compute-1.amazonaws.com",
            "PublicIp": "3.85.73.150"
          },
          "Primary": true,
          "PrivateDnsName": "i-083e6ed22b10ddf06.ec2.internal",
          "PrivateIpAddress": "10.24.152.220"
        }
      ],
      "SourceDestCheck": true,
      "Status": "in-use",
      "SubnetId": "subnet-00c5e1b9c4baddcb3",
      "VpcId": "vpc-060c91b3879fc8b83",
      "InterfaceType": "interface"
    }
  ]
}
From poking around the code and seeing your info above, it's not apparent to me what went wrong yet. Would it be convenient to add additional logging? Would be curious what addresses get returned by the cloud provider given that the IP it's looking for is very apparent.
> Would it be convenient to add additional logging?
Not really for our configuration; would have to set up a whole custom build pipeline where we currently use the upstream image.
> Would be curious what addresses get returned by the cloud provider given that the IP it's looking for is very apparent.
Yeah that's probably a good debug log to add. Might be good to add it in any case?
> Not really for our configuration; would have to set up a whole custom build pipeline where we currently use the upstream image.
A repro would make it a lot easier to debug. Perhaps it could be set up via another mechanism, if it's an issue with the cloud provider.
> Yeah that's probably a good debug log to add. Might be good to add it in any case?
Ya. There's not a lot of logging in the cloud provider, though some of this could make sense to add in kubernetes/kubernetes, and it seems very reasonable to add some debug-level logging for exactly this kind of thing.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale is applied
after lifecycle/stale was applied, lifecycle/rotten is applied
after lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I would love it if someone could just add some more debug logging around this in the cloud provider. Then once there's another release I'd be able to share debug logs.
We face the same issue.
I created a cloud-config file and set NodeIPFamilies, and I can see that it is in use in the aws-cloud-controller-manager logs. I also had to add --feature-gates=CloudDualStackNodeIPs=true to the aws-cloud-controller-manager and kubelet.
When I set --node-ip=<IPv6 address>,<IPv4 address> on the kubelet, I received log lines like this and the node was tainted with node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule.
2024-05-08T08:21:12.851616193Z E0508 08:21:12.851520 1 node_controller.go:240] error syncing 'i-08b4defa905155953.eu-west-1.compute.internal': failed to get node modifiers from cloud provider: provided node ip for node "i-08b4defa905155953.eu-west-1.compute.internal" is not valid: failed to get node address from cloud provider that matches ip: 2xxx:xxxx:xxxx:xxxx::c91a, requeuing
But I saw both the IPv6 and the IPv4 address in the InternalIP.
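The error text suggests a check that every IP passed via --node-ip must appear among the addresses the cloud provider reports for the node. A minimal Go sketch of such a check follows, as an illustration only: it is not the cloud provider's actual code, the function name firstUnmatched is hypothetical, and the addresses are placeholders from documentation ranges rather than values from this thread.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// firstUnmatched parses a comma-separated --node-ip style value and returns
// the first IP that is absent from cloudAddrs, or "" when every IP matches.
// Hypothetical helper for illustration; not part of cloud-provider-aws.
func firstUnmatched(nodeIP string, cloudAddrs []string) (string, error) {
	known := make(map[netip.Addr]bool, len(cloudAddrs))
	for _, a := range cloudAddrs {
		addr, err := netip.ParseAddr(a)
		if err != nil {
			return "", fmt.Errorf("cloud address %q: %w", a, err)
		}
		known[addr] = true
	}
	for _, s := range strings.Split(nodeIP, ",") {
		addr, err := netip.ParseAddr(strings.TrimSpace(s))
		if err != nil {
			return "", fmt.Errorf("node ip %q: %w", s, err)
		}
		if !known[addr] {
			return addr.String(), nil
		}
	}
	return "", nil
}

func main() {
	// Placeholder addresses: the provider reports only the IPv4 address here.
	cloud := []string{"192.0.2.10"}
	missing, err := firstUnmatched("2001:db8::c91a,192.0.2.10", cloud)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("unmatched node ip: %q\n", missing)
}
```

Under this reading, a dual-stack --node-ip value fails as soon as the provider's address list is missing one of the two families, even if the node object itself already shows both addresses.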
Then I set --node-ip=:: for the kubelet and it suddenly started to work, but I saw only the IPv6 address in the InternalIP, which is kind of expected based on the kubelet documentation.
This is our test cluster, so if you tell me which logs or tests you want, I can run them.
I think I found what caused this.
I added a lot of klog.* lines to the NodeAddressesByProviderID function. This was interesting:
for _, family := range c.cfg.Global.NodeIPFamilies {
	klog.Infof("family: %v", family)
It generated this log line:
I0508 10:10:40.861561 881 aws.go:1676] family: ipv4,ipv6
So the configuration is parsed as a single string ipv4,ipv6 instead of being split into an array of values. I dug a little deeper and found out how to set a multi-value configuration at https://pkg.go.dev/gopkg.in/gcfg.v1#example-ReadStringInto-Multivalue
After I changed the cloud-config.conf to this everything started to work.
[Global]
NodeIPFamilies=ipv4
NodeIPFamilies=ipv6
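The difference between the two config forms can be sketched in Go using only the standard library. This is a simplified stand-in for an INI-style multi-value reader, not the real gcfg implementation; the helper names parseMultiValue and recognisedFamilies are hypothetical, and the exact-match family check is an assumption about how the provider consumes the list.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseMultiValue is a simplified stand-in for a multi-value INI reader
// (NOT the real gcfg code): each "key=value" line appends one element to
// the result, and commas inside a value are never split.
func parseMultiValue(conf, key string) []string {
	var values []string
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "[") ||
			strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue // skip blanks, section headers, and comments
		}
		k, v, ok := strings.Cut(line, "=")
		if ok && strings.EqualFold(strings.TrimSpace(k), key) {
			values = append(values, strings.TrimSpace(v))
		}
	}
	return values
}

// recognisedFamilies keeps only entries equal to exactly "ipv4" or "ipv6".
// Assumption: the provider matches family names exactly, so a combined
// "ipv4,ipv6" entry matches neither.
func recognisedFamilies(values []string) []string {
	var out []string
	for _, family := range values {
		switch family {
		case "ipv4", "ipv6":
			out = append(out, family)
		}
	}
	return out
}

func main() {
	commaForm := "[Global]\nNodeIPFamilies=ipv4,ipv6\n"
	repeatedForm := "[Global]\nNodeIPFamilies=ipv4\nNodeIPFamilies=ipv6\n"
	for _, conf := range []string{commaForm, repeatedForm} {
		values := parseMultiValue(conf, "NodeIPFamilies")
		// %v prints slice elements separated by spaces, so a one-element
		// slice holding "ipv4,ipv6" shows up with a comma inside brackets.
		fmt.Printf("%d value(s): %v, recognised: %q\n",
			len(values), values, recognisedFamilies(values))
	}
}
```

One detail worth noting: Go's %v verb separates slice elements with spaces, so a two-element list would log as [ipv4 ipv6]. The earlier log line showing [ipv4,ipv6] is therefore consistent with a single combined value.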
I recommend including this in the documentation. It was a bit frustrating that I had to read the code, as I did not find any documentation about how to construct the cloud-config file (I even started with YAML at first).
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What would you like to be added:
I'd like to start using dualstack in our kubernetes cluster via the CloudDualStackNodeIPs feature gate. Trying to do so I get errors such as:
Trying to debug the issue, I think it's because the code at https://github.com/kubernetes/cloud-provider-aws/blob/d0551093673e8c355db17249b8f069767c014748/pkg/providers/v2/instances.go#L216C46-L216C64 doesn't look at Ipv6Addresses. It only iterates over the IPv4 addresses in PrivateIpAddresses.
Why is this needed:
The EC2 API returns IPv6 and IPv4 addresses in different fields.
/kind feature
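To illustrate the point about separate fields, here is a minimal Go sketch that reads both PrivateIpAddresses and Ipv6Addresses from describe-instances style JSON. The struct types and the collectAddresses function are hypothetical, pared down to the fields shown in this thread; they are not the AWS SDK types or the provider's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// networkInterface and instance are hypothetical minimal types that mirror
// field names from `aws ec2 describe-instances` output; they are not the
// AWS SDK types.
type networkInterface struct {
	PrivateIpAddresses []struct {
		PrivateIpAddress string
	}
	Ipv6Addresses []struct {
		Ipv6Address string
	}
}

type instance struct {
	NetworkInterfaces []networkInterface
}

// collectAddresses returns, per interface, the IPv4 private addresses
// followed by the IPv6 addresses, since the two live in separate fields.
func collectAddresses(raw []byte) ([]string, error) {
	var inst instance
	if err := json.Unmarshal(raw, &inst); err != nil {
		return nil, err
	}
	var addrs []string
	for _, eni := range inst.NetworkInterfaces {
		for _, p := range eni.PrivateIpAddresses {
			addrs = append(addrs, p.PrivateIpAddress)
		}
		for _, p := range eni.Ipv6Addresses {
			addrs = append(addrs, p.Ipv6Address)
		}
	}
	return addrs, nil
}

func main() {
	// Trimmed-down version of the instance JSON posted earlier in the thread.
	raw := []byte(`{
	  "NetworkInterfaces": [{
	    "Ipv6Addresses": [{"Ipv6Address": "2600:1f10:45a5:a918:fd18:12af:1613:6c5d"}],
	    "PrivateIpAddresses": [{"PrivateIpAddress": "10.24.152.220"}]
	  }]
	}`)
	addrs, err := collectAddresses(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(addrs)
}
```

Code that iterates only over PrivateIpAddresses would never see the IPv6 address, regardless of how the node IP families are configured.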