Closed: greghaynes closed this issue 3 years ago.
If all requests to revision1 were arriving via a revision-specific URL and not via the KnService's URL, will the autoscaler be able to tell? In that case revision2 won't get any traffic at all, so keeping its initial scale small would be more appropriate.
I like the idea in general. I wonder, however, what our preferred model is in this case. Wouldn't we advocate a gradual cutover via traffic splitting (which is one of the reasons we have traffic splitting in the first place)?
Scaling both deployments to match traffic first is wasteful in terms of the resources needed.
Yes, I am intentionally trying to avoid saying "copy the scale of the previous revision" for this reason - it's actually a calculation that would need to take routes into account, and TBH I don't have a great answer.
I think a good way to look at it is - I'd expect us to have a story for "I want to run my app in knative without doing advanced routing and be able to update it without disruption". I'd expect runLatest to exhibit this behavior given that it is our "just DTRT" config option for routing. Maybe we can special-case when a configuration is only pointed to by runLatest routes? For more advanced cases I think it's fine if we say that users have to be more careful to shed load between revisions gracefully, but I think the out-of-the-box experience for the simple case should work for updates.
It was part of my explorations document. The blunt solution (scale up and then down) is a bit expensive if all the traffic was arriving at the revision-specific URL, but it is the least likely to cause any disruptions due to traffic migration.
If the deployment is run-latest, that should be the way to go, at least initially (and possibly guarded via a feature flag). If the deployment is green/blue (or whatever traffic split there is), we can compute the expected traffic volume by aggregating over the existing traffic volume, taking the appropriate percentage, and scaling to that amount of traffic.
In any case, this feels like a very non-trivial change, so we'll probably do it in the post-1.0 world.
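A minimal sketch of the aggregation described above, assuming hypothetical names (routeTraffic, initialScale, and the per-pod target are illustrative, not actual Knative code): sum the observed request rate across the routes hitting the old revision, take the percentage that will shift to the new revision, and convert it into pods.

```go
package main

import (
	"fmt"
	"math"
)

// routeTraffic describes one route's observed load on the old revision and the
// share of that route that will move to the new revision. Hypothetical type.
type routeTraffic struct {
	observedRPS  float64 // requests/sec currently served via this route
	shiftPercent float64 // portion (0-100) of this route that will point at the new revision
}

// initialScale aggregates traffic over all routes, takes the percentage that
// shifts, and converts it into pods using the per-pod request target.
func initialScale(routes []routeTraffic, targetRPSPerPod float64) int {
	var expectedRPS float64
	for _, r := range routes {
		expectedRPS += r.observedRPS * r.shiftPercent / 100
	}
	pods := int(math.Ceil(expectedRPS / targetRPSPerPod))
	if pods < 1 {
		pods = 1
	}
	return pods
}

func main() {
	routes := []routeTraffic{
		{observedRPS: 300, shiftPercent: 100}, // run-latest: all of it moves
		{observedRPS: 120, shiftPercent: 50},  // a blue/green split elsewhere
	}
	fmt.Println(initialScale(routes, 70)) // 360 rps expected / 70 per pod -> 6
}
```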
/cc @mattmoor
/cc @mattmoor, actually make prow do it :)
There are several motivations for wanting to establish this sort of association across Revisions in the same linear history; another that immediately comes to mind is applying VPA recommendations.
@greghaynes your last comment mostly sounds like you're after rollout orchestration here? As in: Slowly cutting over from one revision to the other without making the user do it?
I think our out-of-the-box rollout orchestration (route all traffic to the latest revision) should work for the helloworld example without major disruption of requests on an update. IMO for more advanced usage it's fine to tell the user they need to be aware of these kinds of issues and account for them, but we should have a (very) good story for "I have a stateless app I want to run on knative and always route traffic to the latest revision".
Yes, that makes sense to me. I was mostly voicing what wasn't apparent to me from the title: you actually want to be able to cut over between two revisions automatically without interruptions for the simple use cases. The title ("initial scale should be able to...") implied that that's somehow bound to the scale carrying over, even though you tried to avoid that 😅
Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.
Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/lifecycle stale
Stale issues rot after 30 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Rotten issues close after an additional 30 days of inactivity.
If this issue is safe to close now please do so by adding the comment /close.
Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/lifecycle rotten
/remove-lifecycle rotten
So today we label the Revisions that are routable with the Route's label. We key off of this label to establish whether or not to apply minScale.
What if we used this label as a second virtual key for aggregating metrics? We would start to get a sense for the traffic flowing to a Route across the lifetime of many Revisions.
Now suppose that the Route also plumbed through the fraction of that traffic that was aimed at a particular Revision? Modulo latency differences we would know immediately the volume of traffic we're about to aim at the new Revision, and can make a scaling decision accordingly. [1]
This of course has an ordering problem: it would be better not to shift the traffic until the new Revision is adequately prepared to handle it.
One thought (that I haven't fully thought through) is: what if we treated {virtual key total}*{fraction of traffic} as a sort of virtual minScale annotation? If we prime the Revision this way (e.g. annotate before actually shifting traffic), and this causes the Revision to become non-Ready (as unsatisfied minScale does), then the Route's traffic-shifting logic will hold until the Revision becomes Ready... (I don't like the lack of positive hand-off with the autoscaler, but 🤷‍♂️)
[1] - A corollary here is that this information would also provide us with exactly what we'd need to fractionally scale Revisions in a traffic split with KEDA, e.g. a 70/30 split could add a 0.7 multiplier on the queue-depth for that Revision, and 0.3 on the queue-depth of the other.
cc @vagababov @markusthoemmes to tell me how crazy I am 😂
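A rough illustration of the "virtual minScale" idea above, with hypothetical names (virtualMinScale and the per-pod target are not actual Knative functions): the Route-level request total aggregated under the virtual key, times the fraction of traffic the Revision is about to receive. The same fractions would be the KEDA queue-depth multipliers from footnote [1].

```go
package main

import (
	"fmt"
	"math"
)

// virtualMinScale primes a new Revision before traffic shifts: the Route-level
// request total (aggregated under the virtual key) times the fraction of that
// traffic the Revision is about to receive, divided by its per-pod target.
// Hypothetical helper, not an actual Knative function.
func virtualMinScale(routeTotalRPS, trafficFraction, targetRPSPerPod float64) int {
	return int(math.Ceil(routeTotalRPS * trafficFraction / targetRPSPerPod))
}

func main() {
	// A 70/30 split of a Route serving 1000 rps, with a 100 rps per-pod target.
	fmt.Println(virtualMinScale(1000, 0.7, 100)) // 7 pods for the 70% Revision
	fmt.Println(virtualMinScale(1000, 0.3, 100)) // 3 pods for the 30% Revision
}
```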
Basically, as I mentioned before, I want us to do the following:
pc := |current rev|   // pod count of the current revision
tc := current_target  // current revision's scaling target
nc := new_target      // new revision's scaling target
|new_rev| := tc*pc/nc
|new_panic| := true   // -> to make sure we don't immediately scale down while traffic is being shifted
What makes this super complex is the N:M connection of routes and revisions. Any revision can be subject to N routes at any time, and it's non-trivial (in our current system) to find out how much traffic will be pointing to the new revision.
This decision is only straightforward if we have a revision stamped out by a Service that has only a latestRevision: true route. In that case, we could just infer the new scale from the revision right before the new one. In all other cases, all bets are kinda off and we'd need to construct a graph, collect data from all revisions that are subject to a route, and then accumulate that into what the new revision would receive after the change.
Possible? Probably. But... I dunno.
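A direct transcription of the formula above into Go, purely for illustration (newRevisionScale is a hypothetical helper; the boolean mirrors |new_panic| := true):

```go
package main

import (
	"fmt"
	"math"
)

// newRevisionScale transcribes the formula above: pc is the current revision's
// pod count, tc its per-pod target, nc the new revision's per-pod target.
// The boolean mirrors |new_panic| := true: start panicked so the autoscaler
// does not scale down while traffic is still shifting. Illustrative only.
func newRevisionScale(pc int, tc, nc float64) (int, bool) {
	return int(math.Ceil(float64(pc) * tc / nc)), true
}

func main() {
	scale, panicMode := newRevisionScale(6, 100, 80)
	fmt.Println(scale, panicMode) // 8 true: a lower per-pod target needs more pods
}
```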
It feels like the right mechanism for us here is to add orchestration to our traffic splits so they can roll over smoothly rather than hammering the new revision immediately.
Another option: it would be possible for us to implement a GradualRollout CRD that mimics the rollout behavior of latestRevision: true. Effectively:
apiVersion: serving.knative.dev/v1alpha1
kind: GradualRollout
spec:
foo: # Duck for extracting LatestReadyRevision from status
apiVersion: serving.knative.dev/v1
kind: Configuration
name: asdf
bar: # Duck for manipulating spec.traffic
apiVersion: serving.knative.dev/v1
kind: Route
name: asdf
params: # totally random knobs, don't attack me :)
increment: 3
frequency: 30s
...
Either foo or bar should default to the other.
So I was thinking about GradualRollout last night, starting to throw together a PoC, and thinking about three things which aren't easy as specified above:
Honestly, the solution I liked best was to adopt the Route's traffic: block syntax directly:
traffic:
- configurationName: foo
percent: 30
- configurationName: bar
percent: 30
- revisionName: bar
percent: 40
The basic idea is that revisionName is respected as-is, and configurationName rollouts are done gradually as specified in TBD knobs. How tag: translates would be TBD.
The more I thought about this approach and played with prototyping it, the more I became convinced that this should just be a part of Route, where omitting things gets you what we have today. So suppose we add to Route:
spec:
rollout:
increment: 1
frequencySeconds: 5
traffic:
- configurationName: foo
percent: 100
This would progress something like this. Original:
spec:
rollout:
increment: 1
frequencySeconds: 5
traffic:
- configurationName: foo
percent: 100
status:
traffic:
- revisionName: foo-old
percent: 100
We create a new revision:
spec:
rollout:
increment: 1
frequencySeconds: 5
traffic:
- configurationName: foo
percent: 100
status:
traffic:
- revisionName: foo-old
percent: 100
- revisionName: foo-new
percent: 0
5 seconds after above:
spec:
rollout:
increment: 1
frequencySeconds: 5
traffic:
- configurationName: foo
percent: 100
status:
traffic:
- revisionName: foo-old
percent: 99
- revisionName: foo-new
percent: 1
The default rollout: block would be an increment of 100 and a frequencySeconds of 0.
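A sketch of the reconciliation step this implies, with hypothetical names (trafficTarget and step are not the real Route controller types): every frequencySeconds, shift increment percent from the old revision to the new one until the spec is satisfied.

```go
package main

import "fmt"

// trafficTarget is a simplified stand-in for an entry in a Route's
// status.traffic block; not the real Knative type.
type trafficTarget struct {
	revisionName string
	percent      int
}

// step shifts `increment` percent from the old revision to the new one,
// clamping so percentages stay within 0-100. The real controller would run
// this every frequencySeconds and handle more than two targets.
func step(old, next trafficTarget, increment int) (trafficTarget, trafficTarget) {
	if increment > old.percent {
		increment = old.percent
	}
	old.percent -= increment
	next.percent += increment
	return old, next
}

func main() {
	old := trafficTarget{"foo-old", 100}
	next := trafficTarget{"foo-new", 0}
	for old.percent > 0 {
		old, next = step(old, next, 1) // one tick per frequencySeconds
	}
	fmt.Println(old, next) // {foo-old 0} {foo-new 100} after 100 ticks
}
```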
cc @tcnghia @dgerd
This has the same question as you noted above. What happens if I roll out one more revision while this one is in progress? Do we start cascading or jumping?
The more I thought about this approach and played with prototyping it, the more I became convinced that this should just be a part of Route, where omitting things gets you what we have today.
This doesn't seem to solve your original question of "How do you stop a rollout or rollback?"
Having spent very limited time thinking about this, here is a stab at something different for the sake of discussion. It uses a from (where are we decreasing traffic from) and to (where are we sending traffic to now) syntax:
spec:
route: my-route
rollout:
incrementPercent: 1
frequencySeconds: 30
metrics:
.... #TBD
from:
- revisionName: foo
to:
- configurationName: bar
Limitations: this doesn't cover arbitrary traffic: blocks, and the rollout is to whatever from currently represents (i.e. if revisionName foo has 30% of traffic then configurationName is rolled out to 30%; you cannot roll out less or more. You could possibly add more in the API to accept less, but it complicates error handling).
Pros:
This doesn't seem to solve your original question of "How do you stop a rollout or rollback?"
Sure it does, same as I would roll back today. I post a traffic block with the revisionNames I want serving traffic while I fix things. 😉
Is anyone interested in this for 0.14 work? :)
I’m wondering if this is actually two enhancements: one KPA enhancement, and one Route enhancement.
The way I imagine this would work is KPA computes
target = total instances * max-assignment(spec, status)
and Route computes
status.assignment = requests / current-capacity
This would require KPA to report the estimated current capacity in some way.
The result is that you point the Knative spec where you want to go, but the system doesn't steer outside its performance envelope.
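Reading the two formulas together in illustrative Go (routeAssignment and kpaTarget are hypothetical names): the Route reports what fraction of reported capacity it is actually using, and the KPA sizes the Revision for the larger of the spec'd and observed assignments.

```go
package main

import (
	"fmt"
	"math"
)

// routeAssignment is the Route-side computation sketched above:
// status.assignment = requests / current-capacity, where current-capacity is
// the estimate the KPA would have to report. Hypothetical names.
func routeAssignment(requestsRPS, currentCapacityRPS float64) float64 {
	return requestsRPS / currentCapacityRPS
}

// kpaTarget is the KPA-side computation:
// target = total instances * max(spec assignment, status assignment).
func kpaTarget(totalInstances int, specAssignment, statusAssignment float64) int {
	return int(math.Ceil(float64(totalInstances) * math.Max(specAssignment, statusAssignment)))
}

func main() {
	status := routeAssignment(540, 600)    // 0.9: the Route is using 90% of reported capacity
	fmt.Println(kpaTarget(6, 1.0, status)) // spec asks for 100%, so keep 6 instances
}
```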
I'm concerned about requiring additional policy objects to do a simple thing safely. It seems like the system should default to "safe", and steering outside that should be the thing that requires extra equipment to effect.
As a general observation, safety is harder to achieve with a "push" system than a "pull" system, where replicas ask for requests rather than receiving them.
Estimating capacity is doable and rough estimates can be used to improve behaviour of a push-based system. A good reference is this presentation by @stevegury.
But you're still going to face Kingman's formula, aka the VUT formula, which demonstrates that slowdowns are exponential as you approach 100% utilisation and that the point at which the slowdown curve inflects sharply upwards gets earlier if you have higher demand variability or higher variability of the underlying process.
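For reference, Kingman's (VUT) approximation for the mean queueing delay in a G/G/1 system, which is the formula referenced above:

$$
\mathbb{E}[W_q] \approx \left(\frac{c_a^2 + c_s^2}{2}\right)\left(\frac{\rho}{1-\rho}\right)\tau
$$

where $\rho$ is utilisation, $c_a$ and $c_s$ are the coefficients of variation of inter-arrival and service times (the variability), and $\tau$ is the mean service time. As $\rho \to 1$ the $\rho/(1-\rho)$ term blows up, and larger $c_a$ or $c_s$ moves the knee of the curve earlier, which is the behaviour described above.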
In Skenario I modelled the delay caused by lots of requests landing simultaneously using Sakasegawa's approximation; to avoid my charts blowing up I arbitrarily clamped utilisation to 96%. Even then what you'd observe was that the early requests have a long wait for service (cold start time + slow processing while heavily contended), after which it would level out.
@evankanderson , that has been my suggestion all along. We already know the capacity via all routes by virtue of knowing how much traffic is going to the existing revision over all possible routes.
Now, if we inject that information into new autoscaler object on creation, we can provide initial scale and start in panic mode to ensure no scaledown happens.
The problem is to figure out which traffic is going to be migrated, i.e. from which of the revisions and, in theory, what percentage of that revision (i.e. the revision is targeted via 2 routes but only one of them is going to be replaced by the new revision, so we need to scale only to a fraction of the old revision).
That is simple in the run-latest model, but harder in others. Basically we need to diff the routes and infer the traffic that way...
So it seems like it's something that should be computed in the route controller first, before passing that to the revision/KPA controllers.
Do we need to figure out which traffic goes where, or simply use the max(spec, status) information from the Route(s) contributing traffic to determine how much load could be assigned to a given Revision?
Note that this is not "optimal" from a resource-usage perspective, because you may end up committing resources to both the "source" and "destination" Revision when moving traffic between them, but I'd argue that overcommit while moving traffic to preserve SLO is better than undercommit to minimize resource usage.
Example, assuming capacity for 8 pods in a cluster:
Initial state:
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 100 | 100 | 6 | 6 |

Request full rollout of B:
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 0 | 100 | 6 | 6 |
| B | 100 | 0 | 0 | 6 |

Initial pods come up:
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 0 | 100 | 6 | 6 |
| B | 100 | 0 | 2 | 6 |

Route reacts and drives traffic over:
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 0 | 66 | 6 | 6 |
| B | 100 | 33 | 2 | 6 |

KPA reacts and scales down A:
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 0 | 66 | 6 | 4 |
| B | 100 | 33 | 2 | 6 |

Kube tears down pods from A, allowing B to schedule:
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 0 | 66 | 4 | 4 |
| B | 100 | 33 | 4 | 6 |

Route rebalances based on pods available:
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 0 | 33 | 4 | 4 |
| B | 100 | 66 | 4 | 6 |

KPA scales A down again:
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 0 | 33 | 4 | 2 |
| B | 100 | 66 | 4 | 6 |

After the next Kube + Route rebalance, plus KPA reaction (I'm combining three steps here because the tables are getting old):
| Revision | spec percent | status percent | num pods | desired pods |
|---|---|---|---|---|
| A | 0 | 0 | 2 | 0 |
| B | 100 | 100 | 6 | 6 |
And then Kube cleans up the last two pods from A.
Note that this will have the side effect in a large cluster of temporarily consuming a large amount of capacity during a "hot runLatest" deployment. It should be possible to use Pod limits on the namespace level to mitigate this effect.
Mostly along those lines, but:
1. we should start the new revision in panic mode, so it isn't immediately scaled down while traffic is being shifted, and
2. the scale should be RevATotalCapacity/RevBTargetCapacity, since CC/Target/TU can all change from A to B.
Which brings up the question: how does the route controller know how much of the traffic going to a pod of revision A can be shifted to an instance of revision B?
With respect to 1, if we set the KPA's target requested out of Kubernetes to
requests * max(spec.assignment, status.assignment) / observed-requests-per-container
then I don't think we need to start in panic mode; we can start directly at what we think we need (i.e. 6 pods in this situation). Note that this requires deriving observed-requests-per-container for revision B from data for revision A until we have enough information about B's behavior.
Good point that ContainerConcurrency and targets can change from one Revision to another. I'd assumed a code rollout without changing those parameters in my example, but we might need to determine a scaling factor between A and B if any of the autoscaling parameters change between the two revisions.
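A sketch of the expression above, assuming hypothetical names (desiredPods and observedRPSPerContainer are not real KPA code): borrow the observed per-container request rate from revision A until B has produced its own metrics, then size B directly rather than relying on panic mode.

```go
package main

import (
	"fmt"
	"math"
)

// desiredPods evaluates the expression above:
// requests * max(spec.assignment, status.assignment) / observed-requests-per-container.
// observedRPSPerContainer is borrowed from revision A until revision B has
// produced enough metrics of its own. Hypothetical names, not real KPA code.
func desiredPods(requestsRPS, specAssignment, statusAssignment, observedRPSPerContainer float64) int {
	return int(math.Ceil(requestsRPS * math.Max(specAssignment, statusAssignment) / observedRPSPerContainer))
}

func main() {
	// 600 rps total, spec sends 100% to B, status still shows 0%, and A's pods
	// have been absorbing ~100 rps each: size B at 6 pods immediately.
	fmt.Println(desiredPods(600, 1.0, 0.0, 100))
}
```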
If we don't start in panic mode, then within 2s (the autoscaler reconcile period) it'll suggest a scale of less than 6 and immediately apply that, so I think we need it. Obviously if we trick observed-requests-per-container into being taken from revision A, we can avoid it, but I think starting in panic mode is more natural. I guess that's debatable, though.
Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.
Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/lifecycle stale
@vagababov should we keep this one alive?
/cc @julz
/remove-lifecycle stale
/lifecycle frozen
Yes, we should keep this.
Actually, /assign. Let me see what I can do here.
Moved to 0.18 as I believe @vagababov is actively working on this.
@vagababov is this done per your recent work?
95%. Docs and more tests remain.
@vagababov For the remaining work, are you still tackling it?
Otherwise are you able to write up some new issues for other folks to pick up?
/close
Done.
@vagababov: Closing this issue.
In what area(s)?
/area autoscale
/kind proposal
Describe the feature
Assume I have a configuration with revision1 which is serving some QPS via a runLatest route, requiring replicaCount > 1 of its deployment. I then create revision2 for this configuration. All of the traffic for this route will be redirected to revision2 once one instance is ready, and the autoscaler will not begin to scale up replicaCount until this point. This will cause a disruption of service, especially when the replicaCount of revision1 >> 1, while the additional replicas are scaled up.
Our autoscaler should be smarter about initially scaling up a revision to match the incoming load if we have that information.