knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Make Queue Proxy Optional #7210

Closed. julz closed this issue 3 years ago.

julz commented 4 years ago

Describe the feature

Make the Queue Proxy optional per-revision

Why?

Many functions/apps already use a single threaded runtime and are autoscaled on metrics that don't require the proxy (e.g. cpu). Making the Queue Proxy optional for these cases saves some memory and takes a hop out of the request path.

Longer why:

The queue proxy does 2 things:

  1. It enforces a concurrency level on the pod
  2. It reports metrics we use to autoscale

In return, it uses a small amount of memory (small per pod, but multiplied by the number of functions in the system it's far from trivial) and adds a hop to the request path.

The first feature - enforcing concurrency - is often built into the runtime of the actual function: e.g. most Ruby servers are inherently single-threaded and will already queue requests up, so we only need to know the concurrency, not enforce it.

The second - emitting metrics - is not needed if the autoscaler is using metrics from other sources (for example, if using the HPA autoscaler and autoscaling on CPU, we don't need the queue proxy to emit metrics - it can get them straight from metrics-server).
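For concreteness, here is a minimal sketch of the kind of revision this describes - a Service that scales on CPU via the HPA class and sets no concurrency limit for the queue proxy to enforce. The annotation keys are standard Knative autoscaling annotations; the name, image, and target values are illustrative:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cpu-scaled-app                        # illustrative name
spec:
  template:
    metadata:
      annotations:
        # Use the Kubernetes HPA class instead of the default KPA, so scaling
        # decisions come from metrics-server rather than queue-proxy metrics.
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: cpu
        autoscaling.knative.dev/target: "80"    # illustrative CPU utilisation target
        autoscaling.knative.dev/minScale: "1"   # keep a pod warm, as discussed below
    spec:
      containerConcurrency: 0   # 0 = unlimited, so there is nothing for the proxy to enforce
      containers:
        - image: example.com/single-threaded-app   # illustrative image
```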

markusthoemmes commented 4 years ago

I hear you, but the queue-proxy does quite a bit more than that.

It certainly warrants a discussion of whether we want to go down that route or not. Most notably, it adds a permutation to our system, where currently we can rely on the queue-proxy always being there.

julz commented 4 years ago

Yeah, I should say "primarily does 2 things" above, shouldn't I? :D.

All the above features are nice, but they're neither essential nor free when they're happening on the request path (also many - all? - of them can be delegated to the mesh if you have it enabled, and/or to a buildpack if you're using one).

For apps that want to optimise for processing a large number of requests with low latency (which seems like a pretty common use case for an event-driven system), being able to remove a hop from the request path - at least optionally - is pretty important. If you've set those apps up with a minScale and are using the HPA, it's pretty sad to still be paying to collect metrics you don't care about and for features you either don't need or already get via your mesh/buildpack.

Also, fwiw, from experience with e.g. Cloud Foundry, even just adding Envoy sidecars was a pretty painful change for operators, who suddenly had a decidedly non-trivial amount of extra memory and CPU usage in high-density environments that economically want to run as close to capacity as possible.

If it doesn't sound completely implausible I'll create a proposal doc to discuss more?

julz commented 4 years ago

Re the advantage of being able to assume the queue-proxy will always be there, fwiw it looks like we already expose queueSidecarImage in the configmap, so arguably we can't actually rely on it doing any of the above tasks, or in fact any task at all, since a user can already override it?
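For reference, that override lives in the config-deployment ConfigMap in the knative-serving namespace; a minimal sketch (the key name is as of the Serving releases around this discussion, and the image value is illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  # The operator-overridable queue-proxy image referenced above; pointing this
  # at a different image changes what actually runs as every revision's sidecar.
  queueSidecarImage: registry.example.com/custom-queue-proxy:latest
```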

markusthoemmes commented 4 years ago

Right, in reality we rely on many of the aspects though. We should review if the runtime contract sufficiently specifies these aspects to be able to pull the queue-proxy out of there.

Performance-wise, I'm not sure. It depends on the use-case, I guess. It'd be neat to get a benchmark that shows the queue-proxy becoming the bottleneck and/or causing enough overhead to warrant removing it.

In any case: Please bring it up at the autoscaling WG.

vagababov commented 4 years ago

So reading this, the question I have is: what's left of Knative that you actually need? 😄

julz commented 4 years ago

@vagababov revisions, interoperability with eventing and being able to use the nice cli tooling :)

duglin commented 4 years ago

I'm starting to see uses for the QP as a wrapper for the user's container. Meaning: there's a particular runtime feature we'd like to expose, but we don't want all users to have to implement it in their container, and putting that logic into the QP works well. The graceful scale-down logic is a variant of that, and I'm working on a write-up that will show two other examples... soon.

Granted, you could claim that in cases where we don't need that kind of stuff we don't need to inject the sidecar, but there is something to be said for consistency.

markusthoemmes commented 4 years ago

Is this still something anybody's pursuing actively? @julz you mentioned a couple of use-cases recently that'd even reinforce the usage of a queue-proxy.

julz commented 4 years ago

I go two ways on this. There are pretty strong reasons imo (especially in large environments) to want to avoid running an extra sidecar with every container. But clearly there are some use cases (freezing, tls termination) where a sidecar is useful. We're pretty blocked until StartupProbe can support subsecond intervals in upstream k8s tho.

I look at this as roughly analogous to Istio: you can run in ingress-only mode with no sidecars, but then you don't get sidecar-only features like mTLS. Especially once startupProbe is fixed, I think there's a pretty reasonable (and analogous) case for dropping the QP unless the user wants it for a feature (including setting TBC >= 0).

I've pretty much dropped this pending startupProbe being fixed and available tho, so seems legit to icebox for now.
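For context on the upstream limitation: Kubernetes probe periods are whole seconds, so the fastest a startupProbe can poll is once per second, whereas the queue-proxy probes the user container far more aggressively during startup. A minimal sketch of what a revision's container would have to rely on instead (path and thresholds are illustrative):

```yaml
# Container-level probe; the API rejects periodSeconds < 1, so readiness can
# lag up to ~1s behind the moment the container actually starts serving.
startupProbe:
  httpGet:
    path: /healthz        # illustrative path
    port: 8080
  periodSeconds: 1        # 1s is the minimum Kubernetes allows today
  failureThreshold: 30    # allow up to ~30s of startup before the pod is killed
```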

vagababov commented 4 years ago

> We're pretty blocked until StartupProbe can support subsecond intervals in upstream k8s tho.

Unless we provide a library that does basically the same (though that requires fanout to N languages).

Metrics for when the Activator is not in the path are more of a concern from my point of view (but probably also solvable by a library).

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

cheimu commented 3 years ago

Highly agree! In my real-world case, at 1000 rps the queue_proxy consumes about 500m CPU, which means that in production, to guarantee the stability of very busy and important services, we have to allocate at least 1 CPU to the queue_proxy - and that's only at 1000 rps. There are lots of services with much higher rps, so for a sidecar that 1. enforces a concurrency level on the pod and 2. reports the metrics we use to autoscale, the cost is pretty high.

Istio-ingressgateway can handle all of these situations.
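As an aside on the cost point above, if the sidecar can't be removed its footprint can at least be bounded per revision. A sketch assuming the queue.sidecar.serving.knative.dev/resourcePercentage annotation is available in the Serving version in use (names and values are illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: busy-service                  # illustrative name
spec:
  template:
    metadata:
      annotations:
        # Size the queue-proxy's requests/limits as a percentage of the user
        # container's resources rather than relying on the global defaults.
        queue.sidecar.serving.knative.dev/resourcePercentage: "10"
    spec:
      containers:
        - image: example.com/high-rps-app   # illustrative image
          resources:
            requests:
              cpu: "2"    # queue-proxy would then request roughly 200m
```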