knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Is there any plan for more metrics support for KPA/HPA? #11051

Closed denny-lclin closed 2 years ago

denny-lclin commented 3 years ago

Ask your question here:

The Custom metrics docs only mention that custom metrics 'allows users to configure concurrency based scaling when using the Horizontal Pod Autoscaler'. Does Knative have plans for broader custom/external metrics support? For example, Prometheus metrics can be used as 'External metrics' in the HPA.

markusthoemmes commented 3 years ago

The custom-metrics API is actually no longer supported by Knative. That docs page should be deleted.

As such, there are no plans to have more metrics currently.

Which ones would you have wanted to get? Any use-cases you can share?

khaeghar commented 3 years ago

Hi, I do have a case I'd like to share.

In a project I'm currently working on, we would like to have a microservice that can scale both on rps and on events (similar to how KEDA works).

For the HTTP part, we use Knative Serving, but for the event part we would need something like KEDA, because Knative Eventing transforms the events into HTTP requests, and we need to maintain the Kafka channel until the microservice receives the event, without an HTTP POST request as an intermediary.

It would be interesting to generate a metric that shows the number of messages that still have to be consumed. That could be done with a custom metric, or with some kind of transformation from messages-to-read to rps (i.e., if there are still 100 messages to read, add more rps to the rps metric in order to scale accordingly).

In summary, we would need a way to scale a microservice both with rps and with event messages in Kafka (without having to transform them into HTTP POST requests :D).
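To make the metric idea above concrete, here is a rough sketch (not an existing Knative or KEDA feature) of exposing the consumer-group lag, i.e. the "messages still to be consumed", as a Prometheus gauge that an external autoscaler could act on. The broker address, topic, group, and metric name are hypothetical, and it assumes the Shopify/sarama and Prometheus client libraries:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/Shopify/sarama"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metric name; this is just a sketch, not a Knative metric.
var pendingMessages = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "kafka_messages_pending",
	Help: "Messages produced to the topic but not yet consumed by the group",
})

// consumerLag sums, per partition, the newest offset minus the group's
// committed offset for a single topic.
func consumerLag(client sarama.Client, topic, group string) (int64, error) {
	om, err := sarama.NewOffsetManagerFromClient(group, client)
	if err != nil {
		return 0, err
	}
	defer om.Close()

	partitions, err := client.Partitions(topic)
	if err != nil {
		return 0, err
	}
	var lag int64
	for _, p := range partitions {
		newest, err := client.GetOffset(topic, p, sarama.OffsetNewest)
		if err != nil {
			return 0, err
		}
		pom, err := om.ManagePartition(topic, p)
		if err != nil {
			return 0, err
		}
		committed, _ := pom.NextOffset()
		pom.Close()
		if committed >= 0 && newest > committed {
			lag += newest - committed
		}
	}
	return lag, nil
}

func main() {
	// "kafka:9092", "orders" and "orders-service" are placeholders.
	client, err := sarama.NewClient([]string{"kafka:9092"}, sarama.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	prometheus.MustRegister(pendingMessages)

	// Refresh the gauge periodically and serve it on /metrics for scraping.
	go func() {
		for range time.Tick(15 * time.Second) {
			if lag, err := consumerLag(client, "orders", "orders-service"); err == nil {
				pendingMessages.Set(float64(lag))
			}
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```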

julz commented 3 years ago

hey @khaeghar - could you talk a bit more about why you need to maintain the Kafka channel until the microservice receives the event, rather than letting things work via eventing's conversion of the events to HTTP POSTs? (Trying to get a sense of the use case / feature enabled by something like that, and why the regular flow would not enable that use case; in other words, could you share specifically what the HTTP POST approach doesn't permit?)

khaeghar commented 3 years ago

Hi @julz ,

The first problem we would encounter is that some microservices have Kafka controllers and are subscribed to a queue. If we used the HTTP POST approach, we would have to migrate all the microservices that use these controllers to an HTTP controller, causing a big change in the code of all our applications.

The second problem (more of a concern) is performance. We are not sure HTTP POST would match Kafka in terms of concurrency and performance, at least in an event-driven architecture.

We're using Knative Serving for HTTP requests and it works like a charm. And we could use KEDA for event-only Kafka applications, which works great. But the goal would be to have a mix of them: HTTP-only applications, event-only applications, and hybrid ones that can scale on both HTTP and events.

julz commented 3 years ago

FWIW I think I'd probably personally say that if using Knative for scaling HTTP apps and KEDA for pull-based apps works for you, you should carry on doing it.

The nice thing about Knative is that it lets you write HTTP for everything, and then invoke it that way or via Kafka etc (through Knative Eventing) and have it all work in the same way (since eventing effectively maps pull to push). Ack that there could potentially be a performance overhead to that conversion, but I'd personally say to measure that before worrying too much, and weigh it against the potential developer efficiency of only needing to write HTTP endpoints for everything.

If your app isn't based on HTTP, a lot of Knative's assumptions (and, tbh, benefits) break down, and that's OK - because it's all kubernetes, so it all works nicely together anyway. There's nothing wrong with using both Knative and KEDA together, and if the benefits of having a uniform HTTP-based abstraction for your developers don't outweigh performance or other concerns such that you need your app to directly pull from a queue, KEDA is a great choice for that part of your architecture.

khaeghar commented 3 years ago

Hi @julz ,

Exactly, we think that both solutions are excellent for a full http based application (Knative) and for a pull-based application (KEDA).

The problem arises when we have an application that uses both HTTP and events. When we run Knative and KEDA together, they fight each other and each scales pods up and down based on its own metrics, separately, instead of taking both into account. That's the problem, and that's what we want to fix.

Let's say, for example, that we have a hybrid application, and that application receives only HTTP requests for 30 minutes. Knative scales the application well, but KEDA sees that no events are coming into it and shuts the pods down, because from its point of view there's no workload.

That's the problem we have when using both KEDA and Knative for a hybrid application with both HTTP and events, and that's why we would need the custom metric that shows the number of messages that still have to be consumed, the one I described on the 24th of June.

julz commented 3 years ago

I think what I'm not following is why you can't just deploy separate containers for handling the HTTP requests (using knative) and handling queue messages (using KEDA)? These could both use the same image, if you liked. Can you talk more about why - if it does - the same container needs to do both (since the workload is being autoscaled, it seems like it can't be that you need to share memory between instances?)

khaeghar commented 3 years ago

@markusthoemmes We were wondering if there's a technical problem with adding more metrics support, specifically for these event metrics. The point would be to have everything in the same place, not having to manage two systems at the same time (KEDA and Knative), and being able to use Knative to its fullest. It's great to have this HTTP support, but we think that events like Kafka, or Prometheus metrics as @denny-lclin asks, could be a very powerful customization tool for scaling pods.

@julz Deploying separate containers again means having 2 instances where one could handle the workload for that pod. I.e., we have both HTTP and Kafka events, and when both arrive at the service we have to bring 2 instances up, one for the HTTP request and another for the Kafka event; what could be handled by just one instance needs 2, duplicating the resources for low workloads. So the same container has to do both, which lets us save on resources and on server uptime.

julz commented 3 years ago

@khaeghar fwiw I'm still struggling a bit to see the problem here. Each of those two containers will be doing one specific thing, so if you scale them separately you will not need to load Kafka libraries in the HTTP container, or HTTP libraries in the Kafka client container. That means each container will be less complex, need less memory, have less cpu contention, and since they'll each be doing less work, you can set their cpu/mem requests and limits lower. Both KEDA and Knative will then scale those containers up and down from zero when idle and under load (and the load on each container will be less, so they'll scale up less, meaning that once you get above 1 instance the numbers should be ~the same either way, just with better matched containers).

As a worked example to show the mental model (and please let me know if I'm missing something in your use case): consider the case where you have N HTTP requests and M Kafka requests to process. In order to process these requests you will need ceil(H N + K M) containers (where H and K are the constant factors of work that an HTTP/Kafka request, respectively, requires). On the other hand, let's say you have the same N and M requests, but you scale independently with Knative and KEDA. You will scale to ceil(H N) HTTP containers via Knative, and ceil(K M) Kafka containers via KEDA. This is obviously the same number except for rounding errors. However, with big containers that do both, each container takes more memory and cpu (due to the constant overhead of running a server / listening to Kafka etc.), does more context switching, has more failure cases, and is more complex; and with big containers the HTTP gateway has to manage and route to H N + K M containers instead of H N, which is less efficient, and similarly Kafka needs to keep track of H N + K M clients instead of K M clients, which again is overhead for ~no gain.

The only case I can think of where the above logic doesn't hold is what I describe as rounding errors above: when you don't have enough HTTP requests or Kafka events to fill up a single instance when scaled from zero, so you have to round up to 1 instance. I'd suggest those use cases are by definition pretty low-traffic use cases where you're not using a lot of resources in the first place, so there's not a lot on the table to be saved, and that in many/most cases the answer is just to lower the mem/cpu requests/limits for your containers (safe in the knowledge that if requests increase, both Knative and KEDA will autoscale those containers). Yes, in this case you might end up with 2 containers instead of 1 in some edge cases, but each container will be much smaller, and in general you pay for mem/cpu use, not number of containers anyway?
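To put some purely hypothetical numbers on that (N, M, H and K made up for illustration): with N = 40 HTTP requests, M = 30 Kafka messages, and H = K = 0.1 containers' worth of work per request, the combined approach needs ceil(0.1*40 + 0.1*30) = 7 containers, and the split approach needs ceil(0.1*40) + ceil(0.1*30) = 4 + 3 = 7 containers, i.e. the same total. The rounding-error case only shows up at very low traffic, e.g. N = 3 and M = 2 gives ceil(0.5) = 1 combined container vs ceil(0.3) + ceil(0.2) = 2 split containers.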

khaeghar commented 3 years ago

@julz Two containers with the same image have the same size and libraries. We don't want to split the actual application in two, since that adds management complexity (basically we would have another app to manage, and we would be splitting the app in two just because of how we scale it, when scaling should be transparent to the app). If we had two separate apps it could work, but that's not our case.

Since the image size and memory are the same, having two of them, even with limited resources, is still a waste of those resources. Besides the app, there's an architecture underneath that handles many things (like logs, traces, monitoring, etc.) and that is not exactly light, either in size or in CPU needs.

And as you say, that is exactly the case when you don't have enough HTTP requests or Kafka events to fill up a new instance. And yes, in that case we would end up with 2 containers instead of one, but the big container's size is something like 1.5 while the two of them together add up to 2, since, like I said, we have more than "just the app" and we don't want to split it just because of how we scale.

But still, I don't know whether adding more metrics is difficult in some way or outright impossible. I mean, maybe we could use more metrics (not just Kafka events) or more info from outside sources to scale (like a metric that tells us we might need more instances in the near future, a kind of "being ready for the upcoming workload").

Sorry if I'm not explaining myself clearly enough; I can clarify anything needed to make this case clear. And thanks for the time invested in this 😄

julz commented 3 years ago

(FWIW, if you add a simple flag/env variable to your image, you can avoid starting the HTTP server when it's being deployed to read Kafka events, and avoid starting the Kafka client when it's being deployed to serve HTTP. Then each container will not need to load the HTTP/Kafka libraries into memory, will not need to listen on HTTP / answer Kafka requests, and you will be able to tune the memory/cpu of both much more accurately, as well as the scaling parameters, since the workloads will be much simpler and more homogeneous. And even if you don't do this and run the http server/kafka client unnecessarily, as soon as you have enough load to exceed one instance, the usage will be the same -- it's just an extra, quite easy, optimisation that keeping these independent opens up.)
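A minimal sketch of that flag idea, assuming a Go service; the MODE variable, handleRequest and runKafkaConsumer below are hypothetical stand-ins for your own wiring, the point is just that the same image starts either the HTTP server (scaled by Knative) or the Kafka consumer (scaled by KEDA), never both:

```go
package main

import (
	"log"
	"net/http"
	"os"
)

func main() {
	// MODE is a hypothetical env var, set differently in the Knative Service
	// and in the KEDA-scaled Deployment, even though both use the same image.
	switch os.Getenv("MODE") {
	case "http":
		http.HandleFunc("/", handleRequest)
		log.Fatal(http.ListenAndServe(":8080", nil))
	case "kafka":
		runKafkaConsumer()
	default:
		log.Fatal("MODE must be 'http' or 'kafka'")
	}
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
	// Shared business logic lives here (or in a package both modes import).
	w.WriteHeader(http.StatusOK)
}

func runKafkaConsumer() {
	// Placeholder: subscribe to the topic and call the same business logic
	// per message; only this path needs the Kafka client libraries loaded.
	log.Println("starting Kafka consumer (placeholder)")
	select {} // block forever in this sketch
}
```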

To put this another way, it appears that you're trying to maximally utilise a single large container. With Knative/KEDA, since you have autoscaling, why not instead start with smaller containers, avoiding the problem of wasted resources when the container is scaled up by not making it too big in the first place, and have them scale out more as needed? My instinct is that this way will save you a lot of money/resources vs trying to utilise a too-big-in-the-first-place container, and be a lot easier to manage (apart from anything else, I think you'd find it much easier to tune the autoscaling of an HTTP thing and a Kafka thing independently than to set up one set of scaling parameters that works nicely for both quite different workloads).

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.