josephburnett commented 6 years ago

Background

The current Revision model does not specify a minimum or maximum number of pods to constrain autoscaling. This can be dangerous because a failing downstream dependency can cause upstream load to increase, creating runaway scaling. An Operator-configured upper-bound on scaling is desirable to mitigate this risk.

Additionally, startup latency sensitive workloads may prefer the cost of continuously running 1 pod to the cost of multiple seconds of cold start + application start time. A minimum pod setting would allow disabling scale-to-zero on a per-Revision basis.

And finally, some systems may have out-of-band information about scaling events such as sports games, etc... which could be use to constraint autoscaling to a more optimal band. E.g. pre-scaling before running a large advertisement.

Proposal

I would like to add support for two KPA annotations (here) to constrain the min and max number of pods.

autoscaling.knative.dev/minReplicaCount
autoscaling.knative.dev/maxReplicaCount

The annotations would be in the user-facing domain (not autoscaling.internal.knative.dev) because they are intended to be set by the Operator to place bounds on resource consumption.

They would also apply to other autoscaling implementations specified by autoscaling.knative.dev/class (if the implementation supports min/max constraints.)

Original Proposal (rejected)

I would like to introduce min and max instances to the Knative serving model.

apiVersion: serving.knative.dev/v1alpha1
kind: Configuration
metadata:
  name: sample-configuration
  namespace: default
spec:
  revisionTemplate:
    metadata:
      labels:
        knative.dev/type: app
    spec:
      container:
        image: github.com/knative/docs/serving/samples/autoscale-go
      minInstances: 1
      maxInstances: 10

The default value of both minInstances and maxInstances would be 0. The default minInstances would allow scale-to-zero. The default maxInstances would allow unbounded scaling.

Open Questions

Should the min and max values be mutable by the Operator personal? Or should any changes to min and max require a new Revision?
If we allow Operator mutation of min and max, should we introduce another CRD to support RBAC? E.g. should we have a RevisionScaling CRD with serving state and min/max parameters? (See also https://github.com/knative/serving/issues/1655)

glyn commented 6 years ago

Would it be legitimate to set minInstances to a value greater than zero for configurations which run their own threads (possibly in addition to acting as servers) and which therefore need to be kept running to fulfil their intended purpose? Or is a better way of achieving that?

Also should the default value of maxInstances to indicate unbounded scaling be -1 since a possibly more obvious interpretation of 0 is that no instances are allowed?

markusthoemmes commented 6 years ago

Hmm. Given quick enough scale from 0-N it shouldn't really matter whether you're running an application-as-a-server or a function. Scale to 0 happens only in the absence of traffic for a certain time-period and in theory that should be viable for both use-cases imho.

There may be circumstances where an application needs lots of setup logic for example, where the user might want to set the min value not to 0 because setup is too expensive, but encouraging to embrace scale-to-0 as the default makes sense from my point of view.

To the maxInstances point: Does golang have the notion of Infinity? If so, could we use that as the default?

glyn commented 6 years ago

@markusthoemmes An example of what I had in mind is an application which emits events but which might rarely receive traffic. If it scaled to zero, it would stop "working". This might not be a legitimate use case for knative/serving, but I thought I'd ask.

I agree that scale to zero should be the default.

Does golang have the notion of Infinity?

Not for integers, but we could use math.MaxInt32, for example, as a default maximal value.

markusthoemmes commented 6 years ago

@glyn a that makes sense, you mean an application that doesn't receive traffic measured via knative's means? I'd say that's an edge-case and as you say doesn't play too nice with the knative model. We can still make it work by setting minInstances to > 0.

MaxInt does kinda make sense because a replicaset's replicas value is int32 anyway.

josephburnett commented 6 years ago

I prefer 0 to mean "no preference" for minInstances and maxInstances because that is Go's default value. 0 means the system should decide, similar to https://github.com/knative/serving/issues/1105.

But I think there are more important questsion:

Should we do this?
Where should we put it?
How should we control access to it?

mattmoor commented 6 years ago

This feels to me somewhat like the antithesis of "serverless", since this is a bound on the number of servers.

I think that I am somewhat resistant to the idea of making this a part of the core API, and would rather see it as a feature of autoscaler extensibility.

e.g. if I can plug in a "Bounded Autoscaler" and supply the bounds via annotations on the Configuration -> Revision -> "KPA", then the autoscaler itself can be sensitive to whatever extended configuration it wants.

In this world, we have the freedom to constrain the same problem in terms of other dimensions that are perhaps more amenable to serverless e.g. overall resource quotas (or some baseline capacity).

markusthoemmes commented 6 years ago

This feels to me somewhat like the antithesis of "serverless", since this is a bound on the number of servers.

@mattmoor I'm not sure I agree that this is the antithesis. At least the maxInstances bit of it is frequently asked for, at least in a functions, SingleConcurrency setting to be able to limit the maximum amount of things happening downstream.

EDIT: To clarify a bit: Yes, in a happy world, serverless does not need this because it scales with 0 latency to whichever amount of resources you need. The reality is different. So I actually agree, that its not in accordance to the serverless promise, but it is in accordance to the reality imho. At least today.

Further, also in this setting, it is often asked how it's possible to keep containers warm. There exist nasty workaround that utilize the vendor's cron events to try to keep containers warm at a relatively little cost.

Regarding it becoming part of the core API: It makes sense to put this is a non-default autoscaler first and then observe how many people actually use it? As you say that gives us a bit more freedom on experimenting and getting the "where do we put this" right, before it might eventually converge into the core API.

julz commented 6 years ago

Two cents: I think this fundamentally comes down to whether Knative is a set of primitives for building a PaaS (whether that be a Serverless/FaaS-style one, or a Cloud Foundry/Heroku-style one), or a set of primitives for building Serverless. There're a lot of apps that could happily use a lot of the nice stuff in Knative like revisions, builds, and even events but that don't start quickly (i.e. they're apps not functions) and need to keep a certain number of instances running.

We can take an opinionated stance here and say those apps aren't suitable for Knative and should be rewritten, or we can give those apps an easy on-ramp by making it easy to set a min and max. We can also - and I think that's what you're suggesting @mattmoor - decide we don't care enough about the use case to make it easy, but we're happy to make it possible, using a custom autoscaler.

My personal instinct is min and max are useful enough and low cost enough, (and there's enough precedent for them in enough systems) to want to add them. But it seems to me that if we make it sufficiently easy to do it via extensibility, then it may all shake out to the same thing anyway (and I'm definitely on board with experimenting first either way). I'm not totally clear though - probably just because I don't fully understand the extensibility approach yet - how that would look, and how easy it'd be for a user to organise for a custom autoscaler to see those annotations. Maybe it'd be helpful if you could give a concrete example for how this would look?

josephburnett commented 6 years ago

Knative is "serverless" to Developers, but not to Operators. @mattmoor, I agree with your point that min/max in Revision seems like the wrong place. In my mind, Revision is for Developers. What entity do we have for Operators? I think we need one. We can pretend to have an Operator entity by setting a bunch of annotations on Revision or KPA. But I think it would be better to be explicit.

josephburnett commented 6 years ago

It sounds like we all agree that a min/max configuration is useful. We are just working out where to put it, no?

tanzeeb commented 6 years ago

Instead of minInstances, could the autoscaler use the knative.dev/type label to infer the developer's intent and automatically pick the best value, eg. 1 for app, 0 for function?

mattmoor commented 6 years ago

These bounds should now be possible through a custom KPA implementation and annotations once this lands and assumptions about how metrics are pushed are removed from the Revision controller. My inclination would be to close this in favor of those issues. WDYT @josephburnett ?

josephburnett commented 6 years ago

Let's keep this open and just change the implementation to use annotations.

I still think that we should provide a mechanism to limit the KPA controller to a min and max number of pods.

josephburnett commented 6 years ago

@mattmoor, I've updated the issue description with the new annotation-based proposal. PTAL.

imikushin commented 6 years ago

/assign @imikushin

knative-prow-robot commented 6 years ago

@imikushin: GitHub didn't allow me to assign the following users: imikushin.

Note that only knative members and repo collaborators can be assigned. For more information please see the contributor guide

In response to [this](https://github.com/knative/serving/issues/1656#issuecomment-418872766): >/assign @imikushin Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

imikushin commented 6 years ago

Working on implementation anyway

knative / serving

Min and Max Pods #1656

Background

Proposal

Original Proposal (rejected)

Open Questions