kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Provide an OpenTelemetry scaler #2353

Open tomkerkhove opened 2 years ago

tomkerkhove commented 2 years ago

Proposal

OpenTelemetry allows applications/vendors to push metrics to a collector or to integrate their own exporters in the app.

KEDA should provide an OpenTelemetry scaler that consumes an exporter so we can pull metrics and scale accordingly.

Scaler Source

OpenTelemetry Metrics

Scaling Mechanics

Scale based on returned metrics.

Authentication Source

TBD

Anything else?

OpenTelemetry Metrics are still in beta but going GA by end of the year.

Go SDK: https://github.com/open-telemetry/opentelemetry-go
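If this were implemented, a trigger definition might look roughly like the sketch below. Note that the `opentelemetry` trigger type and its metadata fields are hypothetical, since no such scaler exists yet; the endpoint and port are assumptions.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app
  triggers:
    - type: opentelemetry            # hypothetical scaler type
      metadata:
        endpoint: http://otel-collector:8889/metrics  # assumed pull endpoint
        metricName: queue_depth
        targetValue: "10"
```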

fira42073 commented 1 year ago

That makes sense. I couldn't figure out how to use it, while I was trying to resolve the issue, but now I understand it a little more.

I agree with you, it doesn't really make sense to get the data from otel collector.

On Thu, Jun 29, 2023, 11:43 AM Jorge Turrado Ferrero wrote:


I'm not sure we can use OpenTelemetry as a scaler, because OpenTelemetry doesn't store the data; it's just a "producer and communication protocol". I mean, OpenTelemetry defines how to generate and send the data, but it doesn't have any store where we can query the values. To achieve this, we (KEDA) would have to be a data store using OTLP (OpenTelemetry Protocol) to receive the telemetry information. You can't just ask the collector for the information, because the collector isn't a backend store, it's a "routing pipe".

I wouldn't like to receive all the telemetry in KEDA to scale based on it because it'd be crazy, and we would need to manage it securely, having access to ALL telemetry data on our side. I think we can close this issue because it doesn't make sense (IMO). End users should use the proper backend storage scaler (Loki, Prometheus, Elastic, etc.) to scale based on them.


tomkerkhove commented 1 year ago

I wouldn't like to receive all the telemetry in KEDA to scale based on it because it'd be crazy, and we would need to manage it securely, having access to ALL telemetry data on our side. I think we can close this issue because it doesn't make sense (IMO).

This is not the goal of this proposal; the proposal is to have an opentelemetry-collector scaler that queries the collector. I was under the impression that using the OTLP exporter, which we query every now and then, is just enough. I don't personally see the problem with that.

This is exactly why Exporters are available: "An exporter, which can be push or pull based, is how you send data to one or more backends/destinations. Exporters may support one or more data sources."

In this case, KEDA is a system that pulls the metrics when it has to, using the built-in HTTP/gRPC exporter, and evaluates the metrics, which are in OTel format.

This behaviour would be identical to what the metrics-api scaler does: just scale based on the point-in-time metric value. If you want over-time queries, then another scaler such as Prometheus is more suitable, but at least then there is a solid reason to use Prometheus.
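For reference, the collector can already expose the metrics it receives on a scrapeable HTTP endpoint via its `prometheus` exporter. A minimal pipeline sketch (the ports are assumptions) would be:

```yaml
# otel-collector config sketch: receive OTLP and expose the
# received metrics for pull on an HTTP endpoint.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  prometheus:              # serves a scrapeable /metrics endpoint
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```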

I do not want to create an exporter as this is out of scope for us and agree there are enough backend technologies already that do this.

End users should use the proper backend storage scaler (Loki, Prometheus, Elastic, etc.) to scale based on them.

This is exactly what I do not want to do with KEDA, because that means integrating and maintaining x systems while there is a vendor-neutral spec that can simplify this for us. Even if we had those scalers already, it is still beneficial to have the spec-based scaler to decouple the effective metric provider from the actual running technology.

JorTurFer commented 1 year ago

This is not the goal of this proposal; the proposal is to have an opentelemetry-collector scaler that queries the collector

This is the problem: does the collector support being queried? I don't think that's supported, because the collector isn't a data store; it's just a pipe. We use the collector in our apps, and when I checked the docs I didn't see anything similar that we could use. Maybe something has changed during the last months, but I don't think so.

Apart from this, even if we could query them somehow, the collector usually runs with > 1 replica (for HA), and KEDA would need to query all of them (because each replica is independent; they are pipes, not stores) and aggregate the results.

tomkerkhove commented 1 year ago

This is not the goal of this proposal; the proposal is to have an opentelemetry-collector scaler that queries the collector

This is the problem: does the collector support being queried? I don't think that's supported, because the collector isn't a data store; it's just a pipe.

Yes, you can, by using the exporters mentioned above.

Apart from this, even if we could query them somehow, the collector usually runs with > 1 replica (for HA), and KEDA would need to query all of them (because each replica is independent; they are pipes, not stores) and aggregate the results.

Are you sure or is that an assumption?

tomkerkhove commented 1 year ago

Q with OTEL collector folks: https://github.com/open-telemetry/opentelemetry-collector/discussions/8006

JorTurFer commented 1 year ago

Are you sure or is that an assumption?

I'm 100% sure that each replica is totally independent of the others. We have faced this problem using the Prometheus receiver, and AFAIK they haven't found a good solution for it yet. They don't share anything between them. You can run one instance without HA, you can run multiple instances with HA, or you can run the collector as a sidecar on each pod; there are multiple options, but all of them result in independent collector instances.

Assuming that we could query the metrics somehow in pull mode (I don't think so, but maybe), we would have to manage the topology and aggregate the instances.

That's why we need the backend that aggregates the info, duplicates the info, etc
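To make that aggregation burden concrete, here is a minimal Go sketch of what a scaler would have to do after querying each independent replica. The function names are hypothetical, and how to combine the values depends on the metric type: counters and queue depths add up, while gauges such as utilisation are usually averaged.

```go
package main

import "fmt"

// sumReplicaValues combines per-replica samples for additive metrics
// (counters, queue depths): the cluster-wide value is the sum.
func sumReplicaValues(values []float64) float64 {
	total := 0.0
	for _, v := range values {
		total += v
	}
	return total
}

// avgReplicaValues combines per-replica samples for gauges such as
// utilisation, where the mean is usually the meaningful aggregate.
func avgReplicaValues(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	return sumReplicaValues(values) / float64(len(values))
}

func main() {
	// e.g. in-flight requests seen by each of three collector replicas
	perReplica := []float64{3, 5, 2}
	fmt.Println(sumReplicaValues(perReplica)) // 10
}
```

A scaler would also have to discover the replica endpoints and handle replicas appearing and disappearing, which is exactly the topology management described above.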

Q with OTEL collector folks: https://github.com/open-telemetry/opentelemetry-collector/discussions/8006

It could be the best option, yep :smile:

tomkerkhove commented 12 months ago

Are you sure or is that an assumption?

I'm 100% sure that each replica is totally independent of the others. We have faced this problem using the Prometheus receiver, and AFAIK they haven't found a good solution for it yet. They don't share anything between them. You can run one instance without HA, you can run multiple instances with HA, or you can run the collector as a sidecar on each pod; there are multiple options, but all of them result in independent collector instances.

Assuming that we could query the metrics somehow in pull mode (I don't think so, but maybe), we would have to manage the topology and aggregate the instances.

That's why we need the backend that aggregates the info, duplicates the info, etc

An alternative approach is to use push instead, if that is the case. However, that might require a separate component, probably as an add-on.
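As a sketch of that push model, the collector could forward metrics with its standard `otlphttp` exporter to such an add-on. The add-on's service name and port below are assumptions, since no such component exists:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlphttp:
    # Hypothetical KEDA add-on acting as an OTLP/HTTP endpoint.
    endpoint: http://keda-otel-addon.keda.svc:4318
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
```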

JorTurFer commented 11 months ago

An alternative approach is to use push instead, if that is the case. However, that might require a separate component, probably as an add-on.

I think this can be done with a Prometheus server: the OTel Collector pushing with the Prometheus remote write exporter to a Prometheus server, and KEDA querying Prometheus. I see the power of OpenTelemetry, and I use it indeed, but I still think that OpenTelemetry is not meant to be used as a scaler the way we want.
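Sketched out, that pipeline would use the collector's `prometheusremotewrite` exporter plus KEDA's existing prometheus scaler. The endpoints and the query below are illustrative assumptions, and the Prometheus server must have its remote-write receiver enabled:

```yaml
# Collector side: push metrics to Prometheus via remote write.
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus.monitoring.svc:9090/api/v1/write
---
# KEDA side: query the stored data with the existing prometheus scaler
# (trigger fragment of a ScaledObject).
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      query: sum(rate(http_requests_total[2m]))
      threshold: "100"
```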

In this case I disagree with adding another component for it; there are already open-source options that users can use for scaling with OpenTelemetry data, such as adding Prometheus or Grafana Mimir. I think we shouldn't store any kind of user data at all, and a component that stores telemetry for scaling is storing user information.

Personally, I'd wait until they answer https://github.com/open-telemetry/opentelemetry-collector/discussions/8006 and, based on their answer, continue or abandon this issue.

tomkerkhove commented 10 months ago

I'm OK with waiting! However, I don't really agree on this:

In this case I disagree with adding another component for it; there are already open-source options that users can use for scaling with OpenTelemetry data, such as adding Prometheus or Grafana Mimir. I think we shouldn't store any kind of user data at all, and a component that stores telemetry for scaling is storing user information.

One does not simply "add Prometheus" as this increases the infrastructure you are running. If you don't have Prometheus already, you shouldn't have to add it to autoscale your apps IMO.

Edit - Actually, never mind; it's just exposing the Prometheus endpoint, so we can scrape the collector directly instead of needing a Prometheus installation :)

JorTurFer commented 9 months ago

Edit - Actually, never mind; it's just exposing the Prometheus endpoint, so we can scrape the collector directly instead of needing a Prometheus installation :)

This doesn't work: the collector exposes metrics in the Prometheus text exposition format, but the query API is totally different and can't work just using the metrics endpoint. We would still need a Prometheus server in that case.
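To illustrate the distinction: the collector's `prometheus` exporter serves point-in-time samples in the text exposition format, which a hypothetical scaler could parse directly, but that endpoint does not implement PromQL or Prometheus's `/api/v1/query` API. A simplified Go sketch of reading one value from exposition text (it ignores timestamps and label values containing spaces):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMetricValue extracts the point-in-time value of a metric from
// Prometheus text exposition format. This is a sketch of the pull
// model discussed above, not KEDA's actual implementation.
func parseMetricValue(exposition, name string) (float64, bool) {
	for _, line := range strings.Split(exposition, "\n") {
		line = strings.TrimSpace(line)
		// Skip comments (# HELP / # TYPE) and blank lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// A sample line looks like: metric_name{label="x"} 42
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		metric := fields[0]
		if i := strings.Index(metric, "{"); i >= 0 {
			metric = metric[:i] // strip the label set
		}
		if metric != name {
			continue
		}
		if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
			return v, true
		}
	}
	return 0, false
}

func main() {
	sample := "# TYPE queue_depth gauge\nqueue_depth{queue=\"orders\"} 42\n"
	if v, ok := parseMetricValue(sample, "queue_depth"); ok {
		fmt.Println(v) // prints 42
	}
}
```

This only yields the current sample; anything over-time (rates, windows, aggregations across series) is exactly what the query API of a real backend provides and what the bare metrics endpoint cannot.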