kubernetes-sigs / apiserver-network-proxy


Support dynamic number of proxy-servers #273

Open · jkh52 opened this issue 3 years ago

jkh52 commented 3 years ago

Feature Request: support clusters with dynamic number of proxy-servers.

Example use case: gracefully add a 2nd control plane node to a 1 control plane node cluster.

Current state:

Proxy Agent

We need proxy-agent syncOnce() to stop short-circuiting as aggressively.

Option 1: add a new server RPC, GetServerCount(), and call it at the top of syncOnce().

This seems logically the cleanest, but a big downside is the additional authentication it requires (compared with Connect()).

Option 2: have syncOnce() keep trying at some lower rate, even when the agent is already connected to the last-seen server count.
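To illustrate Option 2, here is a minimal sketch of an agent loop that keeps probing at a reduced rate once it believes it is fully connected. The names and fields below are hypothetical, not the agent's actual API.

```go
package agent

import (
	"log"
	"time"
)

// agentSyncer is a stand-in for the proxy-agent's client set; every field and
// method here is illustrative rather than the real agent API.
type agentSyncer struct {
	lastServerCount  func() int   // server count learned from the most recent Connect()
	connectedServers func() int   // servers we currently hold a healthy connection to
	syncOnce         func() error // attempts to dial one more proxy server
}

// run keeps calling syncOnce() even when the agent believes it is fully
// connected, just at a much lower rate, so that a newly added proxy server
// (e.g. a second control plane node) is eventually discovered.
func (a *agentSyncer) run(syncInterval, probeInterval time.Duration) {
	for {
		wait := syncInterval
		if a.connectedServers() >= a.lastServerCount() {
			wait = probeInterval // e.g. minutes instead of seconds
		}
		time.Sleep(wait)
		if err := a.syncOnce(); err != nil {
			log.Printf("syncOnce failed: %v", err)
		}
	}
}
```

The trade-off versus Option 1 is that no new RPC or extra authentication is needed, at the cost of slower discovery of newly added servers.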

Proxy Server

We could keep the --server-count flag but add a mutually exclusive --server-count-file to support a dynamic config value (avoid restarting the process).
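As an illustration only (the flag name comes from the proposal above; the polling approach and the helper name are assumptions), a --server-count-file could be consumed by re-reading the file on an interval:

```go
package server

import (
	"os"
	"strconv"
	"strings"
	"time"
)

// watchServerCountFile periodically re-reads the file behind --server-count-file
// and calls update with any valid new value, so the count can change without a
// process restart. Polling (rather than fsnotify) keeps the sketch simple.
func watchServerCountFile(path string, interval time.Duration, update func(int)) {
	last := 0
	for {
		if data, err := os.ReadFile(path); err == nil {
			if n, err := strconv.Atoi(strings.TrimSpace(string(data))); err == nil && n > 0 && n != last {
				last = n
				update(n)
			}
		}
		time.Sleep(interval)
	}
}
```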

Any other suggestions or example patterns for this side?

Expected Behavior

Add one master node:

Subtract one master node:

STATUS UPDATE (May 2023):

Proxy Agent has implemented Option 2, but at https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/358 there is some discussion in favor of Option 1.

Proxy Server does not yet support dynamic count; there is not yet a design consensus. In a recent community meeting there was some discussion of using https://github.com/kubernetes/enhancements/issues/1965, but it was pointed out that kube-apiserver is not necessarily 1:1 with konnectivity-server. However, a similar implementation could be used (summary: introduce konnectivity-server leases with TTL; a given server can then count the un-expired leases to get a current server count).
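To make the lease idea concrete, here is a hedged sketch of the counting side using client-go; the namespace and label selector are placeholders, and this is not the project's actual implementation:

```go
package server

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// countServerLeases lists konnectivity-server leases and counts those whose
// RenewTime + LeaseDurationSeconds is still in the future; the result would
// stand in for a static --server-count value.
func countServerLeases(ctx context.Context, client kubernetes.Interface, namespace, selector string) (int, error) {
	leases, err := client.CoordinationV1().Leases(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return 0, err
	}
	count := 0
	now := time.Now()
	for _, l := range leases.Items {
		if l.Spec.RenewTime == nil || l.Spec.LeaseDurationSeconds == nil {
			continue // never renewed or malformed; treat as expired
		}
		expiry := l.Spec.RenewTime.Add(time.Duration(*l.Spec.LeaseDurationSeconds) * time.Second)
		if expiry.After(now) {
			count++
		}
	}
	return count, nil
}
```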

jkh52 commented 3 years ago

Soliciting feedback.

/assign @cheftako
/assign @caesarxuchao

jtherin commented 3 years ago

> but add a mutually exclusive --server-count-file to support a dynamic config value (avoid restarting the process).

Why not a ConfigMap?

jkh52 commented 3 years ago

> Why not a ConfigMap?

Assuming you mean: the proxy-server uses a ConfigMap (set by a cluster admin or bootstrapping infra) to learn the accurate ServerCount. That is unfortunate / inconsistent, because the value would then be potentially visible to the cluster (even to the proxy-agent), and we already have the Connect() RPC protocol for the agent to learn the server count.

zqzten commented 2 years ago

+1 for this feature

For the server side, I wonder if we can find a way for the servers to communicate with each other and determine the replica count dynamically, rather than depending on explicit external config. This would be super helpful for HPA.

janiskemper commented 2 years ago

+1 - would be very useful for clusters managed by Cluster API

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

zqzten commented 2 years ago

/remove-lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/273#issuecomment-1399504590):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

jkh52 commented 1 year ago

/lifecycle frozen

jkh52 commented 1 year ago

The current status:

Proxy Agent

Has implemented Option 2. As noted at https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/358, there are associated server log errors and some opinions preferring Option 1 instead. I'm open to revisiting; if we do, my main concerns are: A. backward compatibility for {new agents, old server}, and B. permissions (would GetServerCount also require agent token review, or would it be widely visible?).

Proxy Server

Does not yet support dynamic count; there is no clear best approach.

In a recent community meeting there was some discussion of using https://github.com/kubernetes/enhancements/issues/1965, but it was pointed out that kube-apiserver is not necessarily 1:1 with konnectivity-server. However, a similar implementation could be used (introduce konnectivity-server leases with TTL).

jkh52 commented 7 months ago

This was discussed in the cloud-provider + apiserver-network-proxy OSS sync this morning. See the meeting notes for April 17, 2024.

In particular:

carreter commented 5 months ago

Hey all! I'm interning at Google this summer under @avrittrohwer. This issue will be my main project.

I'll be drawing up a design doc over the next couple of days! Are there any major considerations other than the ones mentioned so far here and in #358?

carreter commented 5 months ago

Design doc is ready! Here's the Google Doc.

The general idea is to have each proxy server publish a lease to the k8s apiserver and count the number of valid leases to determine the current server count, which it will then return to the agent via the gRPC Connect() call. A future iteration will have the agent directly read the leases from the apiserver and determine the count that way.
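For readers who want a feel for the publishing half of that design, here is a rough sketch with placeholder names (the lease name prefix, namespace, label, and durations are assumptions, not what the design doc or the eventual implementation uses):

```go
package server

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// maintainLease publishes and periodically renews a Lease for this proxy server,
// so peers (or, in a later iteration, agents) can derive the live server count
// by counting un-expired leases.
func maintainLease(ctx context.Context, client kubernetes.Interface, serverID string) {
	leases := client.CoordinationV1().Leases("kube-system") // placeholder namespace
	name := "konnectivity-server-" + serverID               // placeholder naming scheme
	duration := int32(30)                                    // lease TTL in seconds
	ticker := time.NewTicker(10 * time.Second)               // renew well before expiry
	defer ticker.Stop()
	for {
		now := metav1.NewMicroTime(time.Now())
		desired := &coordinationv1.Lease{
			ObjectMeta: metav1.ObjectMeta{
				Name:   name,
				Labels: map[string]string{"k8s-app": "konnectivity-server"}, // placeholder label
			},
			Spec: coordinationv1.LeaseSpec{
				HolderIdentity:       &serverID,
				LeaseDurationSeconds: &duration,
				RenewTime:            &now,
			},
		}
		// Create the lease on first run; on subsequent ticks just refresh RenewTime.
		if _, err := leases.Create(ctx, desired, metav1.CreateOptions{}); apierrors.IsAlreadyExists(err) {
			if existing, getErr := leases.Get(ctx, name, metav1.GetOptions{}); getErr == nil {
				existing.Spec.RenewTime = &now
				_, _ = leases.Update(ctx, existing, metav1.UpdateOptions{})
			}
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}
```

The counting side would then be essentially the lease-listing sketch shown earlier in this thread.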

Feel free to drop any comments or suggestions you have on the doc!

carreter commented 4 months ago

At the most recent KNP meeting (7/10/24), @cheftako brought up that it will likely be necessary for us to roll out a way for KNP servers to manage their own leases regardless of whether we shorten the apiserver lease duration.

Would anyone be able to help with this?

carreter commented 2 months ago

Fixed by #643!