jkh52 opened this issue 3 years ago
Soliciting feedback.
/assign @cheftako
/assign @caesarxuchao
> but add a mutually exclusive --server-count-file to support a dynamic config value (avoid restarting the process).
Why not a ConfigMap?
Assuming you mean: proxy-server uses a ConfigMap (set by a cluster admin or bootstrapping infra) to learn an accurate ServerCount. This is unfortunate / inconsistent, because the value would then be potentially visible to the whole cluster (even the proxy-agent), and we already have the Connect() RPC protocol for the agent to learn the server count.
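For reference, a minimal sketch of how an agent could read such a count from the Connect() stream's response metadata. The "serverCount" header key and the stream shape here are assumptions rather than confirmed protocol details:

```go
package agentcount

import (
	"fmt"
	"strconv"

	"google.golang.org/grpc/metadata"
)

// connectStream stands in for the generated AgentService Connect stream;
// only the Header() accessor matters for this sketch.
type connectStream interface {
	Header() (metadata.MD, error)
}

// serverCountFromHeader reads the count the proxy server advertises in the
// Connect() response metadata. The "serverCount" key is an assumption.
func serverCountFromHeader(stream connectStream) (int, error) {
	md, err := stream.Header()
	if err != nil {
		return 0, err
	}
	vals := md.Get("serverCount")
	if len(vals) == 0 {
		return 0, fmt.Errorf("serverCount header not set")
	}
	return strconv.Atoi(vals[0])
}
```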
+1 for this feature
For server side, I wonder if we can find a way for servers to communicate with each other to dynamically get the replica count rather than depending on explicit external config. This would be super helpful for HPA.
+1 - would be very useful for clusters managed by Cluster API
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Mark this issue as rotten with /lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle stale
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/lifecycle frozen
The current status:
Proxy Agent
Has implemented Option 2. As noted at https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/358, there are associated server log errors and some opinions preferring Option 1 instead. (I'm open to revisiting; if we do, my main concerns are: A. backward compatibility for {new agents, old server}, and B. permissions: would GetServerCount also require agent token review, or be widely visible?)
Proxy Server
Does not yet support dynamic count; there is no clear best approach.
In a recent community meeting there was some discussion of using https://github.com/kubernetes/enhancements/issues/1965, but it was pointed out that kube-apiserver is not necessarily 1:1 with konnectivity-server. However, a similar implementation could be used (introduce konnectivity-server leases with TTL).
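A minimal sketch of what such a lease could look like, assuming client-go, the kube-system namespace, a 30s TTL, and a per-replica lease named after the pod hostname (all illustrative choices, not a settled design):

```go
package main

import (
	"context"
	"os"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/utils/ptr"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	leases := kubernetes.NewForConfigOrDie(cfg).CoordinationV1().Leases("kube-system")

	name, _ := os.Hostname() // one lease per server replica
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       ptr.To(name),
			LeaseDurationSeconds: ptr.To[int32](30), // the TTL
			RenewTime:            &metav1.MicroTime{Time: time.Now()},
		},
	}
	if _, err := leases.Create(context.TODO(), lease, metav1.CreateOptions{}); err != nil {
		panic(err) // a real server would handle AlreadyExists by updating
	}

	// Renew well inside the TTL so a healthy replica never looks expired.
	for range time.Tick(10 * time.Second) {
		l, err := leases.Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			continue
		}
		l.Spec.RenewTime = &metav1.MicroTime{Time: time.Now()}
		leases.Update(context.TODO(), l, metav1.UpdateOptions{}) // best effort; retried next tick
	}
}
```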
This was discussed in the cloud-provider + apiserver-network-proxy OSS sync this morning. See the meeting notes for April 17, 2024. In particular: in addition to granting konnectivity-server RBAC to read/write leases, it may be prudent to have an additional "minimum server count" flag.

Hey all! I'm interning at Google this summer under @avrittrohwer. This issue will be my main project.
I'll be drawing up a design doc over the next couple of days! Are there any major considerations other than the ones mentioned so far here and in #358?
Design doc is ready! Here's the Google Doc.
The general idea is to have each proxy server publish a lease to the k8s apiserver and count the number of valid leases to determine the current server count, which it will then return to the agent via the gRPC Connect() call. A future iteration will have the agent directly read the leases from the apiserver and determine the count that way.
Feel free to drop any comments or suggestions you have on the doc!
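A companion sketch of the counting side, under the same assumptions as the lease sketch above. A lease is treated as valid while RenewTime + LeaseDurationSeconds is still in the future; the label selector and the "minimum server count" floor from the earlier discussion are illustrative, not the design doc's exact shape:

```go
package leasecount

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// countServers returns how many konnectivity-server leases are still valid.
func countServers(ctx context.Context, client kubernetes.Interface, minServerCount int) (int, error) {
	list, err := client.CoordinationV1().Leases("kube-system").List(ctx, metav1.ListOptions{
		LabelSelector: "k8s-app=konnectivity-server", // hypothetical label
	})
	if err != nil {
		return 0, err
	}
	count := 0
	now := time.Now()
	for _, l := range list.Items {
		if l.Spec.RenewTime == nil || l.Spec.LeaseDurationSeconds == nil {
			continue // malformed lease; ignore
		}
		expiry := l.Spec.RenewTime.Add(time.Duration(*l.Spec.LeaseDurationSeconds) * time.Second)
		if expiry.After(now) {
			count++
		}
	}
	// Guard against undercounting (e.g. leases still propagating) with the
	// "minimum server count" floor discussed above.
	if count < minServerCount {
		count = minServerCount
	}
	return count, nil
}
```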
At the most recent KNP meeting (7/10/24), @cheftako brought up that it will likely be necessary for us to roll out a way for KNP servers to manage their own leases regardless of whether we shorten the apiserver lease duration.
Would anyone be able to help with this?
Fixed by #643!
Feature Request: support clusters with a dynamic number of proxy-servers.
Example use case: gracefully add a second control plane node to a cluster with one control plane node.
Current state:
Proxy Agent
We need proxy-agent syncOnce() to stop short-circuiting as aggressively.
Option 1: add a new server RPC, GetServerCount(), and call it at the top of syncOnce().
This seems logically the cleanest, but a big downside is the additional authentication required (compared with Connect()).
Option 2: have syncOnce() still retry at some lower rate, even when connected to the last-seen server count (a sketch follows below).
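A rough sketch of what Option 2 could look like, with hypothetical stand-ins for the agent's real state (none of these names are the actual ClientSet internals):

```go
package agentsync

import "time"

// Agent holds illustrative stand-ins for the real agent state.
type Agent struct {
	ServerCount func() int // last-seen count, learned via Connect()
	Connected   func() int // currently connected proxy servers
	SyncOnce    func()     // attempts to dial one more server
}

// Run retries quickly while under-connected, but also probes at a low rate
// even when connected == last-seen count, so a newly added server (and the
// fresh count its Connect() response carries) is eventually discovered.
func (a *Agent) Run(stop <-chan struct{}) {
	fast := time.NewTicker(5 * time.Second)  // normal retry rate
	slow := time.NewTicker(60 * time.Second) // low-rate probe when "caught up"
	defer fast.Stop()
	defer slow.Stop()
	for {
		select {
		case <-fast.C:
			if a.Connected() < a.ServerCount() {
				a.SyncOnce()
			}
		case <-slow.C:
			a.SyncOnce() // no short circuit: refreshes the server count
		case <-stop:
			return
		}
	}
}
```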
Proxy Server
We could keep the --server-count flag but add a mutually exclusive --server-count-file to support a dynamic config value (avoiding a restart of the process); a sketch follows below.
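A minimal sketch of how the mutually exclusive flags might behave. The flag names match this proposal; the re-read interval and validation are illustrative assumptions:

```go
package main

import (
	"flag"
	"log"
	"os"
	"strconv"
	"strings"
	"time"
)

func main() {
	serverCount := flag.Int("server-count", 0, "static number of proxy servers")
	serverCountFile := flag.String("server-count-file", "", "file containing a dynamic server count")
	flag.Parse()

	// The two flags are mutually exclusive.
	if *serverCount != 0 && *serverCountFile != "" {
		log.Fatal("--server-count and --server-count-file are mutually exclusive")
	}
	if *serverCountFile == "" {
		log.Printf("using static server count %d", *serverCount)
		return
	}

	// Re-read the file periodically so the count can change without a
	// process restart (e.g. when the file is a projected ConfigMap key).
	for range time.Tick(10 * time.Second) {
		raw, err := os.ReadFile(*serverCountFile)
		if err != nil {
			log.Printf("reading %s: %v", *serverCountFile, err)
			continue
		}
		n, err := strconv.Atoi(strings.TrimSpace(string(raw)))
		if err != nil || n < 1 {
			log.Printf("ignoring invalid server count %q", raw)
			continue
		}
		log.Printf("server count is now %d", n)
	}
}
```

Re-reading on a timer rather than watching the file keeps the sketch dependency-free, and a ConfigMap projected as a file would be picked up the same way.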
Any other suggestions or example patterns for this side?
Expected Behavior
Add one master node:
Subtract one master node:
STATUS UPDATE (May 2023):
Proxy Agent has implemented Option 2, but at https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/358 there is some discussion in favor of Option 1.
Proxy Server does not yet support dynamic count; there is not yet a design consensus. In a recent community meeting there was some discussion of using https://github.com/kubernetes/enhancements/issues/1965, but it was pointed out that kube-apiserver is not necessarily 1:1 with konnectivity-server. However, a similar implementation could be used (summary: introduce konnectivity-server leases with TTL; a given server can then count the unexpired leases to get a current server count).