yuzisun opened this issue 2 years ago
Not sure if this exactly solves your issue, but it is possible to configure a gradual rollout.
I've seen a couple of issues come up around this topic (for example, #12551 #12859), so probably something we should look into.
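For reference, the gradual rollout mentioned above is configured with the `rollout-duration` annotation on the Service; a minimal sketch (service name, image, and duration are illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service                    # illustrative name
  annotations:
    # Shift traffic from the old revision to the new one
    # gradually over 380 seconds instead of all at once.
    serving.knative.dev/rollout-duration: "380s"
spec:
  template:
    spec:
      containers:
        - image: example.com/app:v2   # illustrative image
```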
Unfortunately, the current implementation of progressive rollout does not help when minReplica is set. For example, if you set minReplica to 10 with 1 GPU per replica, you need 20 GPUs for the rollout, because progressive rollout does not kick in until the minReplica requirement is fulfilled.
Checking the code here, the revision is marked active when pc.ready >= minReady, so traffic only starts to shift to the new revision after that point. What we want is the equivalent of a Kubernetes Deployment's rolling update: traffic can start shifting as soon as x% of minReplica is ready, and the old revision can be scaled down at the same time, so we do not need 2x resources for every rollout.
@yuzisun do you have examples of how you're configuring the traffic blocks - or is it latestRevision: true?
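For context, the two traffic configurations being asked about look roughly like this (revision name and percentages are illustrative):

```yaml
spec:
  traffic:
    # Follow the latest ready revision for a share of traffic...
    - latestRevision: true
      percent: 20
    # ...while pinning the rest to a named revision.
    - revisionName: my-service-00001   # illustrative revision name
      percent: 80
```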
With progressive rollout, do we still need to wait for all the minReplicas to be ready to start migrating the traffic from old revision to new revision?
Revisions won't be Ready until they reach their min-scale. I do wonder if this can be adjusted by adding the initial-scale annotation = 1, which might mark the revision Ready earlier, causing traffic to shift.
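A sketch of what that experiment might look like on the revision template, using the standard autoscaling annotations:

```yaml
spec:
  template:
    metadata:
      annotations:
        # Try to start the new revision at 1 pod...
        autoscaling.knative.dev/initial-scale: "1"
        # ...even though it must eventually reach 10.
        autoscaling.knative.dev/min-scale: "10"
```

(As reported later in the thread, this does not actually mark the revision Ready earlier, since the larger of the two values wins as the initial target scale.)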
The other thing to note is we 'drain' old pods so they can handle any outstanding requests - we also wait in case there are delays in the network programming.
Can we do something like a rolling update? When 20% of traffic has moved to the new revision, we can start scaling down the old revision accordingly, so that we do not need 2x resources during the rollout.
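For comparison, a plain Kubernetes Deployment caps the rollout overhead with maxSurge/maxUnavailable; a sketch of the behavior being asked for (names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-model                 # illustrative name
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 20%              # at most 12 pods at peak, not 20
      maxUnavailable: 0          # never drop below 10 serving pods
  selector:
    matchLabels:
      app: my-model
  template:
    metadata:
      labels:
        app: my-model
    spec:
      containers:
        - name: model
          image: example.com/app:v2   # illustrative image
```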
Our autoscaling system currently evaluates scale for revisions independently of each other. Coordination across revisions is currently not supported. Gradual rollout only works because we're manipulating traffic % at a higher level of abstraction.
What we want is the equivalent of a Kubernetes Deployment's rolling update: traffic can start shifting as soon as x% of minReplica is ready, and the old revision can be scaled down at the same time, so we do not need 2x resources for every rollout.
A legit question to ask: if you're not benefiting from scale to zero, and your clusters don't have the capacity to scale up new pods, maybe you should be using Deployments, since it seems like you want a fixed set of replicas?
/triage needs-user-input
@dprotaso scale-to-zero is not the main reason we use Knative; it is the revision-based rollout, which is not supported via Deployment. With a raw Deployment you can't stage traffic and do canary deployments easily; currently KServe's canary rollout implementation is based on Knative revisions. Also, it is not a fixed set of replicas: you can still leverage the KPA with maxReplica > minReplica, and minReplica is just there to ensure stable performance under normal traffic load.
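For readers unfamiliar with the KServe side, the canary rollout mentioned here is driven by fields like these on the InferenceService; a sketch, with the name, replica counts, and storage URI all illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                 # illustrative name
spec:
  predictor:
    minReplicas: 10              # stable capacity for normal traffic load
    maxReplicas: 20              # KPA can still scale above the minimum
    canaryTrafficPercent: 20     # stage 20% of traffic on the new revision
    sklearn:
      storageUri: gs://example-bucket/model   # illustrative model location
```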
@dprotaso I have tested out the initial-scale annotation; it does not mark the revision Ready earlier, because the larger of initial scale and the lower bound is chosen as the initial target scale.
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
/reopen
@yuzisun: Reopened this issue.
@yuzisun : Are you folks planning to contribute for this feature ?
We are still working on a proposal for how to implement this.
/assign
Staging traffic and doing canary deployments can also be achieved through a progressive delivery tool such as Kruise Rollout: https://openkruise.io/rollouts/user-manuals/strategy-canary-update
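Based on the linked docs, a Kruise Rollout canary strategy looks roughly like this (workload name and step weights are illustrative):

```yaml
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo              # illustrative name
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-model             # illustrative workload
  strategy:
    canary:
      steps:
        - weight: 20             # shift 20% of traffic, then pause
          pause: {}
        - weight: 50
          pause: {}
```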
/triage accepted
We discussed PR #14487 at the Oct 18th Serving WG meeting.
My suggestion there was to get the functionality working in the progressive rollout repo prior to us deciding on how to make sweeping changes in Serving. This comes with the understanding that this might require copying and pasting some existing code.
The path forward I'm envisioning is the following:
I'll close out the extensions PR
Hi, do we have updates on this? Thanks!
Hey @wayzeng, they're looking for feedback here: https://github.com/knative-extensions/serving-progressive-rollout
Thank you so much @dprotaso!! Will give it a try.
Describe the feature
Currently, every time Knative rolls out a new revision with minReplica set, it requires 2x resources for a short period, releasing the resources used by the old revision only after it is scaled down. This is a problem for platforms with resource quotas enabled, where users have to budget 2x resources to run the service, and it becomes a bigger problem when the service is running on precious GPU hardware.
With progressive rollout, do we still need to wait for all the minReplicas to be ready before traffic starts migrating from the old revision to the new revision? Can we do something like a rolling update, where once 20% of traffic has moved to the new revision we start scaling down the old revision accordingly, so that we do not need 2x resources during the rollout?