Proposal: Use Prometheus for scraping & storage

jimmidyson commented 8 years ago

Right now, Heapster performs all the tasks of scraping, caching, exporting to external storage, & REST API for retrieval.

The same metrics that Heapster retrieves via the /stats endpoint on the kubelet are also exposed for Prometheus to scrape via /metrics. How do you feel about Heapster becoming a query & abstraction layer on top of Prometheus to provide Kubernetes semantics to Prometheus timeseries?
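For reference, the kubelet's /metrics endpoint serves the Prometheus text exposition format. Below is a minimal sketch of reading that format. The sample values are made up for illustration, `parse_exposition` is a hypothetical helper (not part of Heapster or Prometheus), and the regexes deliberately skip edge cases of the full format.

```python
import re

# Made-up sample in the Prometheus text exposition format.
SAMPLE = """\
# HELP container_cpu_usage_seconds_total Cumulative CPU time consumed.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{id="/",name="kubelet"} 12.5
container_memory_usage_bytes{id="/"} 1048576
"""

# metric_name, optional {labels}, then the sample value.
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs; escape sequences are matched but not unescaped.
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_exposition(SAMPLE)
```

Each tuple carries the metric name, its labels, and the value, which is roughly the shape a Kubernetes-aware layer would have to map back onto pods and containers.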

Prometheus is also used in a number of other components: etcd & skydns being the two main core Kubernetes ones I can think of straight off.

I fairly recently added Kubernetes discovery to Prometheus so we have all container metrics ingested, as well as application level metrics (future work for Heapster) thanks to the wide range of language plugins & external metric exporters in the Prometheus ecosystem.

Prometheus has export capabilities the same as Heapster (albeit without a GCM sink atm but that should be easy enough to add in).

Alerting, sharding & federation are already built into Prometheus.

I could see Heapster becoming a REST API to convert Kubernetes semantic queries to Prometheus queries, with Prometheus providing awesome collection & (configurable) short-term storage of metrics.
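As a rough sketch of what "Kubernetes semantic query to Prometheus query" could mean, the helper below turns a namespace and pod name into a PromQL expression. Everything here is an assumption for illustration: `pod_cpu_rate_query` is a hypothetical function, and the `namespace` and `pod` label names depend on the cAdvisor version and on how relabeling is configured in a given cluster.

```python
def escape_label_value(value):
    """Escape a string for use inside a double-quoted PromQL label matcher."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def pod_cpu_rate_query(namespace, pod, window="5m"):
    """Build a PromQL query for the CPU usage rate of one pod.

    Assumes container metrics carry `namespace` and `pod` labels;
    adjust the label names to match the actual scrape configuration.
    """
    selector = 'namespace="{}",pod="{}"'.format(
        escape_label_value(namespace), escape_label_value(pod)
    )
    return "sum(rate(container_cpu_usage_seconds_total{" + selector + "}[" + window + "]))"


query = pod_cpu_rate_query("default", "web-1")
```

A REST layer along these lines would take the pod identity from the request path, build the expression, and forward it to Prometheus' HTTP query API.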

Comments?

vishh commented 8 years ago

> The same metrics that Heapster retrieves via the /stats endpoint on the kubelet are also exposed for Prometheus to scrape via /metrics. How do you feel about Heapster becoming a query & abstraction layer on top of Prometheus to provide Kubernetes semantics to Prometheus timeseries?

As of now, I think prometheus can be one of the backends for heapster.

> Prometheus is also used in a number of other components: etcd & skydns being the two main core Kubernetes ones I can think of straight off.

I assume you are referring to metrics being in prometheus format. AFAIK, that was done because prometheus client libs were the most expressive ones as of now.

> I fairly recently added Kubernetes discovery to Prometheus so we have all container metrics ingested, as well as application level metrics (future work for Heapster) thanks to the wide range of language plugins & external metric exporters in the Prometheus ecosystem.

This is great. This will definitely be helpful.

> Alerting, sharding & federation are already built into Prometheus.

Do we have any scalability numbers on prometheus?

Prometheus will be a good solution for monitoring. But I'm not convinced that we should require all Kubernetes clusters to run prometheus. We need input from users here. We can send out a survey or discuss this in the weekly hangout. AFAIK, users love and care about their own monitoring systems and requiring them to run prometheus might not be ideal.

One more concern I have is that in the future I want to have heapster optionally store metrics in a crowd-shared database, and use that data for resource prediction purposes. I haven't gotten the time to flesh out this idea completely.

vishh commented 8 years ago

We want Kubernetes to have minimal dependencies for bootstrapping purposes. We will need reliable, low-latency access to node resource usage metrics, and that is the reason why Heapster collects and serves these metrics directly. Internally, we have never depended on a timeseries DB for critical cluster functionalities like scheduling. In this regard, we can have Prometheus serve as the source of non-critical data that is not time-sensitive.

thucatebay commented 8 years ago

Since Prometheus has alerting capability, it'd make for a good out-of-the-box experience. However, as a cluster grows in terms of nodes and pods, it becomes a non-trivial task to operate and scale a metric store such as Prometheus, InfluxDB, Graphite, etc. At eBay we have our own monitoring and alerting system, which we're planning to use for Kubernetes. Heapster fits well in this model since all we have to do is to write a sink. How about making Prometheus the default metric store instead of InfluxDB?

jimmidyson commented 8 years ago

@thucatebay Thanks for the feedback! I was thinking of Prometheus in this scenario as a day store, similar to how Heapster operates now, with rules for aggregating metrics as they come in to keep storage & memory requirements low. This would keep the management of it simple but bring with it extra benefits of:

- application level metrics (future work for Heapster)
- persistence (afaik if the Heapster pod dies you lose all stats, which could affect autoscaling)
- sharding & federation (future work for Heapster)

The only difference for you would be to write an external storage plugin for Prometheus as opposed to a Heapster sink.

It would also mean that places that don't have their own monitoring & alerting system, as you do, would be able to expand the environment by adding the Prometheus alert manager if they wanted, but that would certainly not be a requirement.
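To make "aggregating metrics as they come in" a little more concrete, here is a minimal sketch of rolling raw samples up into fixed-width windows so that only one value per window needs to be kept. This is not how Prometheus implements its rules; `rollup` and the 60-second window are hypothetical choices for illustration.

```python
from collections import defaultdict


def rollup(samples, window_seconds=60):
    """Average (timestamp_seconds, value) samples per fixed-width window.

    Returns a dict mapping each window's start timestamp to the mean
    of the values that fell into that window.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0)]
per_minute = rollup(raw)
```

Four raw samples collapse into two stored values here; over a day of frequent scrapes the same idea is what keeps storage and memory bounded.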

vishh commented 8 years ago

I agree that Prometheus is a good candidate for a better out-of-the-box monitoring experience. AFAIK there is at least one scenario where we will not run Prometheus: Google Container Engine.

I'd like to split core-cluster functionalities from monitoring. Disabling monitoring using heapster is totally fine. But I don't think we can disable collection and processing of core-cluster metrics that are required to bootstrap the cluster functionalities like scheduling. Even in the case of auto-scaling, I'd imagine us wanting to use curated metrics.

spiffxp commented 8 years ago

@vishh I'm a bit confused, are you saying that currently the heapster addon's presence is required for kube-scheduler to be working properly?

vishh commented 8 years ago

@spiffxp: Moving forward heapster (standalone) will be run by default on all kubernetes clusters. It will be serving the metrics APIs which will be consumed by the scheduler, auto-scalers, etc. The term addon is a misnomer because other addons like dns are also required for default kubernetes functionalities.

spiffxp commented 8 years ago

@vishh yeah but how about today? is this required for proper functioning of v 1.0.x or 1.1.x?

vishh commented 8 years ago

It is not required for v1.0.x. It is required for beta features in v1.1.x.

jayunit100 commented 8 years ago

There seems to be pretty close coupling to Prometheus on the metrics front already. Sorta seems like overkill to maintain a separate timeseries framework? But I see both sides of the coin here.

jimmycuadra commented 7 years ago

What's the current state of this? I've read the vision document, but it's still not clear if there is or will be support for Prometheus as a sink for heapster. It's confusing that Prometheus has emerged as the go-to monitoring system for Kubernetes, especially given that it's also a member of the CNCF, and yet when you deploy the cluster monitoring addon for Kubernetes, it uses an InfluxDB sink for Heapster, plus Grafana for visualizations. This means that cluster operators who want metrics with a larger scope than Heapster is intended for must maintain two separate time series databases.

DirectXMan12 commented 7 years ago

We're currently transitioning away from Heapster as the de facto solution, as per the new monitoring vision in the community repo. One of the results of that will be an end-to-end setup with Prometheus that does not involve Heapster.

DirectXMan12 commented 7 years ago

(in light of that, I'm closing this issue)

davidkarlsen commented 7 years ago

@DirectXMan12 "We're currently transitioning away from Heapster as the de facto solution, as per the new monitoring vision in the community repo." Do you have any references/docs for that? (I guess it's not the vision doc mentioned above, since that one refers to Heapster.)

DirectXMan12 commented 7 years ago

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/monitoring_architecture.md should be what you're looking for.

monotek commented 6 years ago

The link is dead :-( Any mirror available?

I'm currently trying to find out what the standard / best-practice monitoring solution for Kubernetes is.

I thought it was cAdvisor + Prometheus. Then I read about Heapster, which seems to be dead according to @davidkarlsen's post.

I'm a bit confused now. Where to start?

spiffxp commented 6 years ago

try https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/monitoring_architecture.md

https://github.com/kubernetes/community/pull/1010 shuffled around the contents of the design-proposals dir
