kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io

Inconsistent provider behaviour with API server ports #5517

Open · randomvariable opened this issue 2 years ago

randomvariable commented 2 years ago

What steps did you take and what happened:

A common refrain I have heard from customers:

"As a cluster operator, I may be working in a multi-cloud environment, and also have a directive to always have https endpoints exposed on 443. In the Kubernetes context, this means clients should only ever connect to the API server on port 443."

In Cluster API, we have a number of places where the port can be changed:

In AWS, CAPA will use cluster.spec.clusterNetwork.APIServerPort to set the LB port and assume kube-apiserver continues to listen on 6443.

In Azure, CAPZ assumes that if cluster.spec.clusterNetwork.APIServerPort isn't 6443, that kube-apiserver is also listening on the different port. This is kind of necessary due to the way the Azure LBs work and the workarounds required for hairpin NAT.

Importantly, in bare-metal / on-prem environments like vSphere/Tinkerbell, we might be using kube-vip to make our control plane highly available. In this circumstance, there is no LB, and cluster.spec.clusterNetwork.APIServerPort is ignored.
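
For reference, this is roughly where the field sits on the Cluster object today (a minimal sketch; host name and port values are illustrative, and the comments reflect the provider behaviours described above):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example
spec:
  clusterNetwork:
    # Interpreted as the LB frontend port by CAPA, as the kube-apiserver
    # bind port by CAPZ, and ignored in some bare-metal/on-prem setups.
    apiServerPort: 443
  controlPlaneEndpoint:
    # Usually populated by the infrastructure provider.
    host: api.example.com
    port: 443
```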

What did you expect to happen:

One consistent place to configure the API server port, with a defined behaviour across providers.

Anything else you would like to add:

Enforcing any particular implied behaviour may break one or more providers, as well as existing clusters on upgrade, so it should not be done within a v1beta1 release IMO.

Also, I'm not suggesting we change the default port, but rather that we make it easier to change the port and give it a defined behaviour.

Environment:

/kind bug

neolit123 commented 2 years ago

kubeadm picked port 6443 as the default for secure serving because that is the default in kube-apiserver:

--secure-port int     Default: 6443

https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

This SO post shares some insight into why 6443 might have been chosen: https://stackoverflow.com/a/44051315

...not to cause conflicts with the default Application Server domain

But as long as 443 is not occupied, using it for HTTPS seems fine.

If Cluster API wishes to switch to that, I would give my tentative -1 (please feel free to ignore), because it would drift from the k8s/kubeadm default. kubeadm, which tries to closely follow the k8s defaults, will likely remain on 6443 until the API server switches.

As a cluster operator, I may be working in a multi-cloud environment, and also have a directive to always have https endpoints exposed on 443. In the Kubernetes context, this means clients should only ever connect to the API server on port 443.

A minor concern here is that a specific deployment requirement (i.e. "we must use 443") should not dictate the port for core CAPI and all providers; instead, perhaps that particular deployment should be configured accordingly.
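
For context, kubeadm already separates the externally advertised endpoint from the port the local kube-apiserver binds to; a minimal sketch (host name is illustrative):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  # Port kube-apiserver actually listens on (kubeadm default: 6443).
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# Endpoint written into kubeconfigs and used by clients; it can point at an
# LB listening on 443 while the apiserver itself keeps listening on 6443.
controlPlaneEndpoint: "api.example.com:443"
```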

randomvariable commented 2 years ago

Not suggesting we switch by default. Sorry, I was speaking as a fictional cluster operator in a company. I'll clarify.

In fact, I think organisations that want 443 may be shooting themselves in the foot at a later date, for exactly the reasons listed in the Stack Overflow comment.

devigned commented 2 years ago

In Azure, CAPZ assumes that if cluster.spec.clusterNetwork.APIServerPort isn't 6443, that kube-apiserver is also listening on the different port. This is kind of necessary due to the way the Azure LBs work and the workarounds required for hairpin NAT.

I don't think this needs to behave this way due to any particular Azure infra constraint. We should be able to solve for this. I'll create an issue to expose the API server LB port while keeping the API server backend port 6443, as done in CAPA.

randomvariable commented 2 years ago

/area control-plane

randomvariable commented 2 years ago

I don't think this needs to behave this way due to any particular Azure infra constraint.

Isn't there a case where, because hairpin NAT isn't available, there's a hack in /etc/hosts that repoints the LB control plane endpoint to 127.0.0.1?
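
For anyone unfamiliar with that workaround, it amounts to an entry like the following on the control plane nodes (the LB DNS name here is hypothetical), so that traffic from the node to the control plane endpoint avoids hairpinning through the LB:

```
# /etc/hosts on a control plane node (illustrative)
127.0.0.1   my-cluster-abcdef.westeurope.cloudapp.azure.com
```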

fabriziopandini commented 2 years ago

/milestone v1.2
/kind api-change

yastij commented 2 years ago

Ideally, users should be able to configure the local API server port through a single field. Today we can't really make the various bindPort fields in the init and join configurations part of the bootstrappers contract, because they are nested in kubeadm-specific APIs and structs.

We probably also need to disambiguate between cluster.spec.clusterNetwork.APIServerPort and cluster.spec.ControlPlaneEndpoint.Port.
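
To illustrate how deeply nested those fields are today, a sketch against the current CABPK/KCP types (values illustrative):

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    initConfiguration:
      localAPIEndpoint:
        bindPort: 6443     # port kube-apiserver binds to on the first node
    joinConfiguration:
      controlPlane:
        localAPIEndpoint:
          bindPort: 6443   # and on every joining control plane node
```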

An idea could be:

fabriziopandini commented 2 years ago

@yastij If I got this right we are addressing two problems:

If I look at the current API:

In other words, would it work to keep things as they are, improve the docs, improve CABPK to make the UX simpler, and make providers behave consistently? The main problem I see is how to avoid a negative impact on existing Clusters; I need to think a little bit more about this ...

fabriziopandini commented 2 years ago

/reopen
pressed the wrong button, sorry

k8s-ci-robot commented 2 years ago

@fabriziopandini: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/5517#issuecomment-1031464332):

> /reopen
> pressed the wrong button, sorry

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

fabriziopandini commented 2 years ago

After thinking about avoiding negative impact, I think that a possible way out is to:

providers on their side must:

CecileRobertMichon commented 2 years ago

providers on their side must [...] start using cluster.spec.ControlPlaneEndpoint.Port for load balancers FrontEnds

What about the infrastructure provider contract, which says the spec object must have the following fields defined: controlPlaneEndpoint - identifies the endpoint used to connect to the target cluster's apiserver?

https://cluster-api.sigs.k8s.io/developer/architecture/controllers/cluster.html#infrastructure-provider
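
i.e. per that contract, an infrastructure cluster object exposes something like the following (the kind and values are hypothetical, for illustration only):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: FooCluster   # hypothetical infrastructure provider kind
spec:
  controlPlaneEndpoint:
    host: my-lb.example.com
    port: 443
```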

fabriziopandini commented 2 years ago

@yastij ^^ That's a good point. Taking a step back, we have a couple of user stories, in some cases not mutually exclusive:

sbueringer commented 2 years ago

In general it sounds good to me to have fields with clear semantics that cover our relevant use cases on the Cluster object, and then infer the corresponding values in CABPK (if they are set; otherwise use defaults).
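
A rough sketch of that idea (illustrative only, not an agreed design): the user sets the ports once on the Cluster, and CABPK only defaults the kubeadm bindPort when it has not been set explicitly:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
spec:
  clusterNetwork:
    apiServerPort: 6443        # port kube-apiserver listens on
  controlPlaneEndpoint:
    host: api.example.com
    port: 443                  # port exposed by the LB frontend
# CABPK could then default
#   initConfiguration.localAPIEndpoint.bindPort and
#   joinConfiguration.controlPlane.localAPIEndpoint.bindPort
# from clusterNetwork.apiServerPort when they are unset.
```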

yastij commented 2 years ago

/assign
/lifecycle active

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fabriziopandini commented 2 years ago

/lifecycle frozen

fabriziopandini commented 2 years ago

/triage accepted

fabriziopandini commented 1 year ago

(doing some cleanup on old issues without updates)
/unassign @yastij
/help

k8s-ci-robot commented 1 year ago

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/5517):

> (doing some cleanup on old issues without updates)
> /unassign @yastij
> /help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-triage-robot commented 7 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

fabriziopandini commented 6 months ago

/priority important-longterm

fabriziopandini commented 6 months ago

The current status, with different interpretations of those fields in different providers, is not ideal.

We need a volunteer to do some research across providers and come up with ideas (eventually a proposal or an improvement to the contract) to improve the situation. The Flexible Managed Kubernetes Endpoints proposal should also be taken into account.

However, since there is almost no activity on this issue, we are decreasing its priority.

/priority backlog
/triage accepted