kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0
Include all APIserver addresses for nodeup config #16813

Closed: rifelpet closed this 3 weeks ago

rifelpet commented 3 weeks ago

With this plus https://github.com/kubernetes/kops/pull/16812 I'm able to get an IPv6 dns=none cluster to pass validation.

Previously we only included IPv4 addresses, because those are the only CIDRs included in the cluster spec; the IPv6 CIDRs are provided by AWS. As a result, nodes failed to bootstrap because they couldn't reach the kops-controller endpoint:

Sep 06 01:23:11 i-0250853b64092f19d nodeup[1323]: W0906 01:23:11.157536 1323 main.go:133] got error running nodeup (will retry in 30s): failed to get node config from server: Post "https://172.20.5.248:3988/bootstrap": dial tcp 172.20.5.248:3988: connect: network is unreachable

This happens even though the IPv6 address and the load balancer DNS name both work (the "failed to verify token" response shows that the endpoint is reachable):

$ curl -6 -k  'https://[2600:1f14:1800:ab02:ba1a:4b60:b424:d3f3]:3988/bootstrap'
failed to verify token

$ curl -k https://api-peter-ipv6-k8s--k283sk-ef9b199ef8b93ba4.elb.us-west-2.amazonaws.com:3988/bootstrap
failed to verify token

With this change, the nodeup config userdata changes as follows:

Will modify resources:
  LaunchTemplate/control-plane-us-west-2a.masters.peter-rifel-ipv6.k8s.local
    UserData
                            ...
                              APIServerIPs:
                              - 172.20.5.248
                            + - 2600:1f14:1800:ab02:ba1a:4b60:b424:d3f3
                            + - api-peter-ipv6-k8s--k283sk-ef9b199ef8b93ba4.elb.us-west-2.amazonaws.com
                              CloudProvider: aws
                              ClusterName: peter-ipv6.k8s.local
                            ...

  LaunchTemplate/nodes-us-west-2a.peter-rifel-ipv6.k8s.local
    UserData
                            ...
                              APIServerIPs:
                              - 172.20.5.248
                            + - 2600:1f14:1800:ab02:ba1a:4b60:b424:d3f3
                            + - api-peter-ipv6-k8s--k283sk-ef9b199ef8b93ba4.elb.us-west-2.amazonaws.com
                              CloudProvider: aws
                              ClusterName: peter-ipv6.k8s.local
                            ...
                                servers:
                                - https://172.20.5.248:3988/
                            +   - https://[2600:1f14:1800:ab02:ba1a:4b60:b424:d3f3]:3988/
                            +   - https://api-peter-ipv6-k8s--k283sk-ef9b199ef8b93ba4.elb.us-west-2.amazonaws.com:3988/
                              InstanceGroupName: nodes-us-west-2a
                              InstanceGroupRole: Node
                            ...

Must specify --yes to apply changes
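
The `servers` entries in the diff above put the IPv6 literal in square brackets, which is required when an IPv6 address is combined with a port in a URL. A minimal sketch of that formatting, assuming port 3988 from the log above; the helper name `bootstrapServerURLs` is hypothetical and this is not the kOps implementation:

```go
package main

import (
	"fmt"
	"net"
)

// bootstrapServerURLs is a hypothetical helper (not taken from kOps).
// It turns a list of hosts (IPv4 literals, IPv6 literals, or DNS names)
// into https URLs on the given port.
func bootstrapServerURLs(hosts []string, port string) []string {
	urls := make([]string, 0, len(hosts))
	for _, h := range hosts {
		// net.JoinHostPort wraps the host in brackets when it contains
		// a colon, which is the case for IPv6 literals.
		urls = append(urls, "https://"+net.JoinHostPort(h, port)+"/")
	}
	return urls
}

func main() {
	hosts := []string{
		"172.20.5.248",
		"2600:1f14:1800:ab02:ba1a:4b60:b424:d3f3",
		"api-peter-ipv6-k8s--k283sk-ef9b199ef8b93ba4.elb.us-west-2.amazonaws.com",
	}
	for _, u := range bootstrapServerURLs(hosts, "3988") {
		fmt.Println(u)
	}
}
```

Joining host and port with plain string concatenation would produce an ambiguous URL for the IPv6 entry, so the sketch delegates that to `net.JoinHostPort`.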

Marking this as draft because it's possible we could find a better approach, and because this may break other cluster configurations.

k8s-ci-robot commented 3 weeks ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

rifelpet commented 3 weeks ago

From office hours: I will add back the filtering logic but always include any IPv6 addresses. It will still exclude the DNS names, though.

rifelpet commented 3 weeks ago

/test pull-kops-e2e-cni-cilium-ipv6

rifelpet commented 3 weeks ago

The cluster validated and the e2e tests started, but the Prow job pod was interrupted.

/test pull-kops-e2e-cni-cilium-ipv6

rifelpet commented 3 weeks ago

tests pass too 🎉 /cc @hakman

rifelpet commented 3 weeks ago

It looks like this is only needed for dns=none clusters. Normal DNS clusters are already passing with my IMDS and controller-runtime changes: https://testgrid.k8s.io/kops-ipv6#kops-aws-cni-calico-ipv6-flatcar

k8s-ci-robot commented 3 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes/kops/blob/master/OWNERS)~~ [hakman]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

hakman commented 3 weeks ago

Awesome work!