Closed -- pweil- closed this issue 2 years ago
Notes from discussion:
- User logs into the cluster with `kubectl kcp login` and some OIDC authn (e.g. GitHub)
- If there is an existing kubectl login plugin we should just use it
User decides to migrate the app between Locations for Reasons:
- Option 1: Admin is upgrading the cluster with downtime
- Option 2: Cluster is out of capacity
- Option 3: Cluster hard fails
Do these specific options matter for the purposes of the demo? vs just forcing a move from one cluster to another? afaik today we aren't even targeting allowing a 'User' (vs an admin) to choose a cluster to deploy to let alone migrate between - workload is just deployed to available compute (ala the 'Transparent' adjective of TCM).
> Client has a multi-location security problem (needs a cert to access some external resource), THUS needs https://github.com/kcp-dev/kcp/issues/416
Any context on why this is part of this flow?
> Client has a multi-location security problem (needs a cert to access some external resource), THUS needs https://github.com/kcp-dev/kcp/issues/416
> Any context on why this is part of this flow?
It's a transition to the next demo afaict.
> Client-facing traffic sees no interruption (?)
@jmprusi Thoughts as to what will be required to demo this? I know this can work with a cloud lb, but will that also be required for ci or is there a lighter-weight way to ensure this flow is tested once supported?
> Do these specific options matter for the purposes of the demo? vs just forcing a move from one cluster to another? afaik today we aren't even targeting allowing a 'User' (vs an admin) to choose a cluster to deploy to let alone migrate between - workload is just deployed to available compute (ala the 'Transparent' adjective of TCM).
I believe the reason is to create a narrative for the demo. It will be used as justification for an admin to remove the cluster from the user's workspace to force a migration.
> I believe the reason is to create a narrative for the demo. It will be used as justification for an admin to remove the cluster from the user's workspace to force a migration.
I guess I don't really get the point of forcing a specific narrative before we have actual implementation. Maybe it's supposed to be motivating, but it feels forced to me.
> Any context on why this is part of this flow?
Here is the community call where these were defined if it helps to review. https://www.youtube.com/watch?v=_9ilcimFyec
I don't think we need to decide on a specific migration reason at this point. Just that it means we need to demo workload migration by removing the cluster running the demo workload.
"I need access to CUDA compute, so I move to a different data center" :)
> Client-facing traffic sees no interruption (?)
> @jmprusi Thoughts as to what will be required to demo this? I know this can work with a cloud lb, but will that also be required for ci or is there a lighter-weight way to ensure this flow is tested once supported?
So... let me braindump (sorry) here:
this gets really tricky when we add long-running connections, WebSockets or so... also a more advanced scenario would be to use the cluster gateways information to understand when the traffic has fully switched and then take down the workloads etc..
> User decides to migrate the app between Locations for Reasons
If the reason is "the cluster is deleted", then this is effectively not demonstrating anything different from prototype 2 AFAIK. (That may be fine, for scoping down this prototype)
If the reason is "my pcluster scheduling constraints changed", then we need to design and implement scheduling constraints, which feels like a heavy lift. Same for designing and enforcing capacity as a scheduling constraint.
Maybe the best compromise is having the demo say "I've decided to manually move my app to europe-west to be closer to customers", which demos manual cordoning, eviction, etc., which ingress can react to with a more graceful, slower cutover. This also means we don't have to design/implement triggering that slow move automatically in response to TBD scheduling constraint changes, and leave that for P4+.
It also means a future demo of "my app automatically detects it would get lower end-user latency by moving to europe-west and triggers that itself" is a natural automation of a previous demo milestone, should we ever get to that point.
afaik the very concept of Locations is a topic of discussion. Maybe that should be the target for P3 - defining the mechanics of a cluster-abstraction concept (i.e. Location) at the workspace level that allows admins to hide the details of physical cluster association?
I'm still not clear what the Location abstraction implies wrt associating a given kcp namespace with compute capacity. I've been party to discussion suggesting that a given workspace could 'inherit' Locations from other workspaces and that a workspace would define a default Location for scheduling purposes.
It's less clear to me how a user would indicate their intent to prefer one location over another - is this 'scheduling constraints'? Without scheduling constraints, what mechanism would a user have for switching from one Location to another to satisfy P3? Or when we say 'user' do we really mean an 'administrator' that would have permission to remove a Location such that a namespace associated with it would be forced to be scheduled to another Location?
> this gets really tricky when we add long-running connections, WebSockets or so... also a more advanced scenario would be to use the cluster gateways information to understand when the traffic has fully switched and then take down the workloads etc..
What's the best way to demo this then? In terms of a stateless app it could be as simple as nginx serving hello world... But again, what endpoint will we be targeting for the demo that will ensure seamless handover between the application on one cluster and the application on another?
> I'm still not clear what the Location abstraction implies wrt associating a given kcp namespace with compute capacity.
Same, I think it's under-designed so far. We don't really have a plan for limiting what the syncer can sync to a cluster, or bubbling up "syncer doesn't have capacity" to kcp. We would bubble up "workloads on the pcluster are unschedulable", whether that's pcluster-wide resource exhaustion or pcluster namespace quota limits. But so far we don't have anything that would limit a workspace's footprint on a pcluster, or even really where that enforcement happens (pcluster-scheduling-time? syncing time?)
> It's less clear to me how a user would indicate their intent to prefer one location over another - is this 'scheduling constraints'?
Also under-designed at this time. At a high level users should say "put this workload where there's CUDA resources", or even more simply "put this workload in any N of M locations", but the language for that is still TBD. Clayton's talked about reusing node scheduling hints for pcluster scheduling, but I'm not convinced that's a good idea.
> Without scheduling constraints, what mechanism would a user have for switching from one Location to another to satisfy P3? Or when we say 'user' do we really mean an 'administrator' that would have permission to remove a Location such that a namespace associated with it would be forced to be scheduled to another Location?
In the absence of a constraint language and automatic enforcement mechanism, we can at least demo "manually cordon us-east (by annotating it)", instead of P2's "forcibly unplug us-east", which would demo a more graceful rescheduling that allows the Ingress to move over without downtime.
> What's the best way to demo this then? In terms of a stateless app it could be as simple as nginx serving hello world... But again, what endpoint will we be targeting for the demo that will ensure seamless handover between the application on one cluster and the application on another?
This could be demoed with a job pinging demo.example.com/hello every 100ms that doesn't see any 5XX errors, while in another window we see the deployment shift replicas from Location A to B. WebSockets are harder, so let's just ignore them for now.
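The probe job could be a small script along these lines (a sketch; the endpoint and probe duration are whatever the demo/CI setup uses):

```shell
# is_bad_code CODE -> succeeds (returns 0) for 5XX responses or for
# curl's "000" code, which indicates a failed connection.
is_bad_code() {
  [ "$1" = "000" ] || [ "$1" -ge 500 ]
}

# probe URL N -> hit URL every 100ms for N iterations, print failure count.
probe() {
  url=$1; n=$2; failures=0
  for _ in $(seq 1 "$n"); do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 1 "$url") || code=000
    if is_bad_code "$code"; then
      failures=$((failures + 1))
    fi
    sleep 0.1
  done
  echo "$failures"
}

# Example usage in CI (assumed demo endpoint), failing the step on any 5XX
# observed while the deployment shifts from Location A to B:
# failures=$(probe https://demo.example.com/hello 600)   # ~60s of probing
# [ "$failures" -eq 0 ] || exit 1
```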
> This could be demoed with a job pinging demo.example.com/hello every 100ms that doesn't see any 5XX errors, while in another window we see the deployment shift replicas from Location A to B. WebSockets are harder, so let's just ignore them for now.
How is this going to work across multiple clusters? How do we enable transparent switching between applications in multiple clusters, except via some kind of intermediary (e.g. a proxy)?
To be clear, I'm looking for a way to validate this in CI as a precondition for having this be demoable, but likely a ci-testable option would work equally for demo.
> How is this going to work across multiple clusters? How do we enable transparent switching between applications in multiple clusters, except via some kind of intermediary (e.g. a proxy)?
The steps for that are roughly what @jmprusi describes in https://github.com/kcp-dev/kcp/issues/415#issuecomment-1033967052
The pcluster being cordoned triggers the scheduler to duplicate the workload on some other cluster, including service+ingress, and the previous cluster's ingress proxies to the new one until some cutover.
It's quite a bit slower than just pulling the plug on the old cluster -- and might be slow enough that it means we can't practically cover it in CI -- but that's the price of zero downtime. I think we could even punt on total zero downtime if it's ~1s or something, and we can demonstrate that a more graceful reschedule than what P2 does today.
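As a rough sequence, the graceful path described above might be driven like this. Everything here is a placeholder pending the actual design: the cordon mechanism, the `cluster` resource, the cluster and deployment names, and the kubectl contexts are all assumptions.

```shell
# 1. Cordon the old pcluster so the scheduler stops placing work there
#    and duplicates the workload (deployment, service, ingress) elsewhere.
kubectl annotate cluster us-east scheduling.kcp.dev/unschedulable=true

# 2. Wait until the duplicated deployment is fully available in the new
#    location; meanwhile the old cluster's ingress proxies to the new one.
kubectl --context europe-west rollout status deployment/demo --timeout=120s

# 3. After the cutover completes, remove the workload from the old cluster.
kubectl --context us-east delete deployment demo
```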
> The pcluster being cordoned triggers the scheduler to duplicate the workload on some other cluster, including service+ingress, and the previous cluster's ingress proxies to the new one until some cutover.
> It's quite a bit slower than just pulling the plug on the old cluster -- and might be slow enough that it means we can't practically cover it in CI -- but that's the price of zero downtime. I think we could even punt on total zero downtime if it's ~1s or something, and we can demonstrate that a more graceful reschedule than what P2 does today.
I'm more than a little surprised that local proxying would be an end-goal here, or that it would be a reasonable way of ensuring zero-downtime.
@smarterclayton Maybe you can chime in as to your expectations?
> I'm more than a little surprised that local proxying would be an end-goal here, or that it would be a reasonable way of ensuring zero-downtime.
I don't think local proxying is the end-goal at all, just a step along the path that's achievable in the immediate timeline.
Right, let's keep in mind that we are exploring concepts that allow us to show a compelling vision of the value of something like KCP and enable others to poke at it for their use cases. We have to balance that need with what we think the long term engineering solutions may be.
Just a comment about:
- User lands in a default workspace.
Does this mean that the previous step `kubectl kcp login <url>` would transparently perform the equivalent of a `kubectl kcp create workspace <default workspace name> --use`, so that the user is directly inside a personal workspace?
For now, no workspace is created nor listed by default for a user. Created and linked this follow-up issue https://github.com/kcp-dev/kcp/issues/488 to discuss this in more detail.
cc @s-urbaniak for the authn aspect of this story, and also to sync our view on "User lands in a default workspace". What does "land in" mean after a login command?
- `Unschedulable: true` -- new workloads are not assigned to the cluster
- `Evict: true` -- existing workloads are rescheduled to another cluster, with some observed downtime

@ncdc @robszumski
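If those end up as fields on the cluster spec, the two demo steps might be driven with patches along these lines. This is only a sketch: the `cluster` resource, the `us-east` name, and the field paths are all assumptions, not a settled API.

```shell
# Cordon: stop scheduling new workloads onto the cluster
# (hypothetical spec.unschedulable field).
kubectl patch cluster us-east --type merge -p '{"spec":{"unschedulable":true}}'

# Evict: reschedule existing workloads to another cluster
# (hypothetical spec.evict field).
kubectl patch cluster us-east --type merge -p '{"spec":{"evict":true}}'
```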
This issue title seems kind of related to 2.1/2.2 in the transparent multi-cluster use case doc: https://docs.google.com/document/d/1LeYMt4I1No1W-tj6LCPuXE7kTSD-uggC8IRI6DVpHOM/edit?hl=en&forcehl=1#heading=h.kmn31tiyv4vs
Should we also be able to do a simpler demo where something like a LogicalClusterPolicy is updated to run a stateless app in multiple clusters? Seems like being able to run an app redundantly is the first step you need before you can move it without downtime.
@chirino this has been scoped down to the updated set of demo steps now seen in the issue description. The net new features here are the cordoning and draining of a physical cluster.
Cluster placement/scheduling policies will come later, via separate issue(s).
This is done except for including it in the demo script.
Demo Objective
User has a multi-cluster placeable application that can move transparently
Demo Steps
- `Unschedulable: true` -- new workloads are not assigned to the cluster
- `EvictAfter: $now` -- existing workloads are rescheduled to another cluster, with some observed downtime

Action Items
Nice to have