kcp-dev / kcp

Kubernetes-like control planes for form-factors and use-cases beyond Kubernetes and container workloads.
https://kcp.io
Apache License 2.0

Basic API Priority and Fairness for kcp #1271

Closed MikeSpreitzer closed 5 months ago

MikeSpreitzer commented 2 years ago

Demo Objective

Demo Steps

  1. Admin turns on the APF feature gate for the kcp server
  2. Admin creates 8 workspaces, sets each one's concurrency limit to 100
  3. User observes APF configuration API objects in each workspace
  4. Admin does a Prometheus scrape of the kcp server, observes APF's concurrency limit for each priority level in each workspace, and verifies that each equals the expected value
  5. For each of the first four workspaces, user(s) run concurrent single-threaded looping clients: 65 making calls that go to the workload-low priority level and 25 making calls that go to the workload-high priority level. The calls do not imply a lot of system load.
  6. For the other four, users do the same but with twice as many clients per workspace.
  7. Via Prometheus scraping, and possibly Grafana visualizing, admin shows that (a) the kcp server is not buckling under the load, (b) the number of concurrently used seats does not go above the limit, for each (workspace, priority level) pair, and (c) the seat utilization stays close to 1 for the two intentionally loaded priority levels, in each workspace.
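
The checks in steps 4 and 7 could be scripted against the scraped metrics text. The following is a minimal sketch, not the demo's actual tooling. It assumes the upstream Kubernetes APF metric names `apiserver_flowcontrol_request_concurrency_limit` and `apiserver_flowcontrol_request_concurrency_in_use` (these may differ by version), and it assumes a hypothetical `cluster` label identifying the workspace, which upstream does not provide.

```python
import re
from collections import defaultdict

# Assumed metric names (from upstream Kubernetes APF; may differ by version).
LIMIT_METRIC = "apiserver_flowcontrol_request_concurrency_limit"
IN_USE_METRIC = "apiserver_flowcontrol_request_concurrency_in_use"

# Matches labelled samples such as: name{a="b",c="d"} 12 [timestamp]
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)(?:\s+\d+)?$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse labelled samples from Prometheus text exposition output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, dict(LABEL_RE.findall(labels)), float(value)))
    return samples


def seats_by_pair(samples, metric):
    """Sum a metric per (workspace, priority level) pair.

    The "cluster" label is hypothetical: it stands for whatever label a
    cluster-aware APF would use to identify the workspace.
    """
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[(labels.get("cluster"), labels.get("priority_level"))] += value
    return dict(totals)


def check_limits(text):
    """Return (violations, utilization) for each (workspace, priority level)."""
    samples = parse_samples(text)
    limits = seats_by_pair(samples, LIMIT_METRIC)
    in_use = seats_by_pair(samples, IN_USE_METRIC)
    # Check (b): seats in use must not exceed the limit.
    violations = {
        pair: (used, limits[pair])
        for pair, used in in_use.items()
        if pair in limits and used > limits[pair]
    }
    # Check (c): utilization should stay close to 1 for the loaded levels.
    utilization = {
        pair: in_use.get(pair, 0.0) / limit
        for pair, limit in limits.items()
        if limit > 0
    }
    return violations, utilization
```

A single scrape is only a snapshot; for check (c) the same function would be applied to repeated scrapes over the run.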

================ stretch goal ================

  1. For two of the workspaces, the clients offer only enough load to occupy half their allowed seats in each of those two priority levels.
  2. The verification shows that the clients get nearly what they ask for. There are overheads in each call, so the server-side measurements cannot equal client-side measurements. The challenge here is to control the difference well enough to have a meaningful check of the results.
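
The load for the stretch goal could be produced by a closed-loop generator along these lines. This is a sketch under stated assumptions: `call` is a hypothetical function that performs one request against a workspace, and `allowed_seats` is whatever per-priority-level limit was observed in the scrape. Because each single-threaded looping client has at most one request outstanding, the time clients spend inside `call` gives a client-side estimate of occupied seats that can be compared with the server-side figure.

```python
import threading
import time


def clients_for_fraction(allowed_seats, fraction):
    """Number of looping clients needed to offer a fraction of the seats."""
    return max(1, int(allowed_seats * fraction))


def run_clients(call, n_clients, duration_s):
    """Run n_clients single-threaded looping clients for duration_s seconds.

    call is a hypothetical zero-argument function that issues one request.
    Returns (completed_calls, busy_seconds), where busy_seconds is the
    total time all clients spent inside call().
    """
    lock = threading.Lock()
    totals = {"calls": 0, "busy": 0.0}
    deadline = time.monotonic() + duration_s

    def loop():
        calls, busy = 0, 0.0
        while time.monotonic() < deadline:
            start = time.monotonic()
            call()
            busy += time.monotonic() - start
            calls += 1
        with lock:
            totals["calls"] += calls
            totals["busy"] += busy

    threads = [threading.Thread(target=loop) for _ in range(n_clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return totals["calls"], totals["busy"]


def client_side_concurrency(busy_seconds, duration_s):
    """Average number of requests in flight as seen from the clients."""
    return busy_seconds / duration_s
```

The client-side figure includes network and client overhead, so it is expected to sit somewhat above the server-side seat usage; how much difference to tolerate is exactly the open question raised above.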

Action Items

ncdc commented 2 years ago

@MikeSpreitzer just checking in - any updates to report on this?

MikeSpreitzer commented 2 years ago

No top-level progress yet; I was on vacation and before that focused on the Q2 edge PoC. I am fighting in https://github.com/kubernetes/kubernetes/pull/111222 and https://github.com/kubernetes/kubernetes/pull/111422 to move plumbing in a good direction.

-- Regards, Mike

On Tuesday, August 9, 2022 at 11:30 AM, Andy Goldstein wrote:

@MikeSpreitzer just checking in - any updates to report on this?

pweil- commented 2 years ago

0.8 check in here. It looks like we're 2 weeks out for 0.8 closure and are waiting on upstream PRs. I am going to shift this to 0.9 but please correct me if you still plan on delivering something for 0.8.

ncdc commented 1 year ago

Moving to v0.10

cyang49 commented 1 year ago

I've been working on this and have had ongoing discussions with @MikeSpreitzer, @ncdc, and @stevekuznetsov. The forks are https://github.com/cyang49/kcp/tree/apf-for-kcp and https://github.com/cyang49/kcp-kubernetes/tree/apf-for-kcp

Current status: working on the kcp cluster-specific work estimation functionality needed by the APF logic. I added a storage object count tracking mechanism and watch request tracking that work with kcp clusters, and am now testing the logic.
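
To illustrate what cluster-specific work estimation involves, here is a rough sketch, not the code in the forks (which is Go). The class and function names are hypothetical, and the `objects_per_seat` and `max_seats` defaults are illustrative assumptions. The idea is that object counts are tracked per (cluster, resource) rather than per resource, so a list request in one workspace is costed by that workspace's objects only.

```python
import math
import threading


class ClusterObjectCountTracker:
    """Hypothetical tracker of stored object counts per (cluster, resource)."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def set(self, cluster, resource, count):
        with self._lock:
            self._counts[(cluster, resource)] = count

    def get(self, cluster, resource):
        """Return the tracked count, or None if nothing is known yet."""
        with self._lock:
            return self._counts.get((cluster, resource))


def estimate_list_seats(tracker, cluster, resource,
                        objects_per_seat=100, max_seats=10):
    """Estimate the seats a list request occupies in one cluster.

    The default parameter values are assumptions for illustration only.
    """
    count = tracker.get(cluster, resource)
    if count is None:
        # Unknown count: be conservative and charge the maximum.
        return max_seats
    return min(max_seats, max(1, math.ceil(count / objects_per_seat)))
```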

The next step will be to work towards the demo described by Mike.

Missing functionalities:

ncdc commented 1 year ago

Please feel free to open draft PRs so we can start to add comments, if you'd like

ncdc commented 1 year ago

And thanks for the update!

cyang49 commented 1 year ago

I rebased my forks and found that many modifications were needed due to the kcp cluster client interface changes. The apf-for-kcp branches in the forks now build, but the APF Handle function panics when the cluster name of a request is the wildcard *. Andy suggests skipping APF for those requests for now. I'll continue testing and add generation of default APF objects for each ClusterWorkspace.
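
The wildcard bypass could look roughly like the sketch below. It is a language-neutral illustration written in Python, not the Go code in the forks. `PerClusterFlowControl` and `new_limiter` are hypothetical names, and a plain semaphore stands in for a real APF controller (it caps concurrency but has no queuing or fairness).

```python
import threading

WILDCARD = "*"


class PerClusterFlowControl:
    """Hypothetical dispatcher: one limiter per logical cluster."""

    def __init__(self, new_limiter):
        # new_limiter is a hypothetical factory: cluster name -> limiter.
        # The limiter must be usable as a context manager.
        self._new_limiter = new_limiter
        self._limiters = {}
        self._lock = threading.Lock()

    def handle(self, cluster, execute):
        """Run execute() under the cluster's limiter, or bypass for '*'."""
        if cluster == WILDCARD:
            # Wildcard requests span all clusters, so no single cluster's
            # limiter applies; skip flow control for now.
            return execute()
        with self._lock:
            limiter = self._limiters.get(cluster)
            if limiter is None:
                limiter = self._limiters[cluster] = self._new_limiter(cluster)
        with limiter:
            return execute()
```

For example, `PerClusterFlowControl(lambda cluster: threading.BoundedSemaphore(100))` would cap each cluster at 100 concurrent requests while letting wildcard requests through unthrottled.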

cyang49 commented 1 year ago

I looked into the code that ensures the APF default resource objects, both mandatory and suggested ones. We will need this mechanism to work for each logical cluster in kcp. This is another place where the lack of cluster-awareness may require major code changes.

Several issues need to be overcome:

Potentially we might be able to decouple the storage layer and the ensurer. We may be able to create/delete cluster-scoped ensurers in either storage-object-count tracker controller. @ncdc what do you think?

ncdc commented 1 year ago

@cyang49 let's find some time early this week to sync up and do some brainstorming

ncdc commented 1 year ago

/milestone clear