google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Consider moving application metrics into own project #1420

Open fabxc opened 8 years ago

fabxc commented 8 years ago

The README states that cAdvisor is about exposing metrics on running containers. It's scoped fairly well, and there's no mention of application-level metrics.

Yet there is actually support for application-level metrics. This seems like an entirely orthogonal problem to me, and I have doubts that it should live within the same repository/binary. Any system dealing with containers easily allows composing parts that solve orthogonal problems.
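For concreteness, "application-level metrics" here means things the application itself exposes (request counts, queue depths, and so on), as opposed to the cgroup resource metrics cAdvisor reads for the container. A minimal sketch of such an endpoint using the Prometheus Go client (metric name and port are purely illustrative):

```go
// Sketch of what an application-level metrics endpoint typically looks like,
// using the Prometheus Go client.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// ordersProcessed is an illustrative application-level metric: it describes
// what the application does, not how much CPU or memory its container uses.
var ordersProcessed = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "orders_processed_total",
	Help: "Total number of orders processed by this instance.",
})

func main() {
	prometheus.MustRegister(ordersProcessed)

	// The application exposes its own /metrics endpoint; whichever collector
	// is in play (cAdvisor, Prometheus, or a separate component) scrapes it.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```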

What are everyone's thoughts on splitting the collection of application-level metrics into its own component/repository? Especially since cAdvisor has a fairly large code base as it is, and the issue tracker indicates that it's currently understaffed even for the container-metrics aspect. I believe keeping the scope within bounds also helps with the ownership problem.

That's obviously an invasive proposal, so I'm curious about any opinions on this.

@vishh @timstclair @matthiasr @grobie @jimmidyson @DirectXMan12

grobie commented 8 years ago

:+1:

DirectXMan12 commented 8 years ago

I'm torn (no pun intended), although I'm leaning towards agreeing with @fabxc and going for the split.

On the one hand, cAdvisor's support for application metrics is fairly clunky, and doesn't really work for Kubernetes' use case (for instance, it needs a Kubernetes HostPort to work, which is not really feasible to do at scale). Splitting out the application metrics would give us a chance to start fresh and build something that worked well for Kubernetes.

On the other hand, collecting application metrics may need awareness of the container system (e.g. you may want to allow exposing metrics on localhost inside the container's net namespace, or have the container runtime or a daemon collect all the metrics together and present them in one call), so being inside cAdvisor makes sense, because it already has that knowledge.
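As a rough sketch of the first idea (assuming the github.com/vishvananda/netns package; the PID and port are placeholders, and this is not how cAdvisor does it today), a collector could dial 127.0.0.1 from inside the container's network namespace:

```go
// Sketch only: scrape a container's localhost-only /metrics endpoint by
// dialing from inside its network namespace (Linux only).
package collector

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"runtime"

	"github.com/vishvananda/netns"
)

// scrapeInNetNS fetches http://127.0.0.1:<port>/metrics as seen from the
// network namespace of the process with the given PID (e.g. the container's
// init process). PID and port are placeholders supplied by the caller.
func scrapeInNetNS(containerPID, port int) (string, error) {
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				// Network namespaces are per-thread, so pin this goroutine to
				// its OS thread and restore the original namespace when done.
				runtime.LockOSThread()
				defer runtime.UnlockOSThread()

				orig, err := netns.Get()
				if err != nil {
					return nil, err
				}
				defer orig.Close()
				defer netns.Set(orig) // restore before the thread is reused

				target, err := netns.GetFromPid(containerPID)
				if err != nil {
					return nil, err
				}
				defer target.Close()
				if err := netns.Set(target); err != nil {
					return nil, err
				}

				// The socket is created while this thread sits in the container's
				// namespace, so 127.0.0.1 is the container's loopback, not the host's.
				var d net.Dialer
				return d.DialContext(ctx, network, addr)
			},
		},
	}

	resp, err := client.Get(fmt.Sprintf("http://127.0.0.1:%d/metrics", port))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}
```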

matthiasr commented 8 years ago

I'm also for keeping application metrics separate, coming from a) the Prometheus tradition of small, focussed components and b) not using the App/Kubelet/Heapster/Google Cloud Monitoring pipeline and being wary of carrying too much of that around in environments where it doesn't translate.

you may want to have the container runtime or a daemon collect all the metrics together and present them in one call

Why would I want that? Why would I want it on a once-per-node level, not a once-per-pod or a once-per-cluster (which is basically Prometheus with the default Kubernetes config)?

DirectXMan12 commented 8 years ago

not using the App/Kubelet/Heapster/Google Cloud Monitoring pipeline and being wary of carrying too much of that around in environments where it doesn't translate.

Remember, when we talk application metrics here, we're not just talking about metrics used solely for monitoring purposes. The horizontal pod autoscaler in Kubernetes (and possibly other components in the future) can use custom metrics to scale on. While it might be desirable to only expose a small subset of metrics for such purposes, we still need to be able to send that subset through in some way that makes it consumable by Kubernetes components (currently, that means exposing it through Heapster, which means Heapster needs to be able to collect it).

Why would I want that? Why would I want it on a once-per-node level, not a once-per-pod or a once-per-cluster (which is basically Prometheus with the default Kubernetes config)?

Whatever the collector is, it only has to make one HTTP call per node, greatly reducing the number of calls that the central collection agent needs to make. Additionally, having the collection agent on the node opens up more collection possibilities (e.g. I don't want to expose my metrics endpoint publicly to the cluster, just to the collection agent).
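A minimal sketch of that fan-in pattern (the target list is hard-coded purely for illustration; a real agent would discover containers and relabel their metrics rather than naively concatenating exposition text):

```go
// Sketch (not cAdvisor code) of the "one HTTP call per node" idea: a
// node-local agent scrapes each local application endpoint and re-exposes
// everything on a single /metrics endpoint for the central collector.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// targets would normally come from container discovery; hard-coded here
// purely for illustration.
var targets = []string{
	"http://127.0.0.1:9101/metrics",
	"http://127.0.0.1:9102/metrics",
}

func aggregate(w http.ResponseWriter, r *http.Request) {
	client := &http.Client{Timeout: 2 * time.Second}
	for _, t := range targets {
		resp, err := client.Get(t)
		if err != nil {
			// A real agent would surface scrape failures as metrics
			// ("up 0") instead of emitting a comment line.
			fmt.Fprintf(w, "# scrape of %s failed: %v\n", t, err)
			continue
		}
		// Naive concatenation: a real agent would add per-container labels
		// so identical metric names from different targets don't collide.
		io.Copy(w, resp.Body)
		resp.Body.Close()
	}
}

func main() {
	http.HandleFunc("/metrics", aggregate)
	http.ListenAndServe(":9100", nil)
}
```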

I think whatever we decide here, we have to keep in mind that fundamentally, as @fabxc has mentioned in the past, there are two concerns at work here: metrics for cluster operation (e.g. autoscaling in its various forms), and metrics for monitoring. We need a path for both concerns. It doesn't have to be the same path, but in the end, we need to make sure that we can still collect a few metrics for use by autoscaling, while not requiring all the metrics to go through that path.

timstclair commented 8 years ago

There are several discussions and proposals (e.g. https://github.com/kubernetes/heapster/blob/master/docs/proposals/push-metrics.md) for a separate application metrics pipeline in Kubernetes. I think the more relevant discussion here is how much value this feature adds for non-Kubernetes (a.k.a. standalone) users.

vishh commented 8 years ago

There are two scenarios as @timstclair mentioned:

  1. cAdvisor as a standalone monitoring tool
  2. cAdvisor as a metrics library that Kubelet uses

cAdvisor is better positioned for collecting node level container metrics. For example, it can discover containers dynamically & also acquire metadata from corresponding runtimes. This makes it an attractive solution for handling application level metrics as well. The real answer to this issue will come from users. If users find cAdvisor collecting application metrics to be useful, then why not support it as part of cAdvisor?

As for the issues that @DirectXMan12 mentioned, I find them to be bugs and not fundamental issues. For instance, cAdvisor can identify the IP of a container and use that instead of requiring host ports.
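For example (just a sketch, not cAdvisor's actual code path), the container IP can be obtained from the runtime via the Docker Engine API and used to build the scrape URL; the container name, port, and path here are placeholders:

```go
// Sketch: derive a scrape URL from a container's own IP instead of a host
// port, using the Docker Engine API (github.com/docker/docker/client).
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

func scrapeURLFor(ctx context.Context, cli *client.Client, containerID string) (string, error) {
	info, err := cli.ContainerInspect(ctx, containerID)
	if err != nil {
		return "", err
	}
	// IP on the default bridge network; containers on user-defined networks
	// keep their addresses under NetworkSettings.Networks instead.
	ip := info.NetworkSettings.IPAddress
	if ip == "" {
		return "", fmt.Errorf("container %s has no bridge IP", containerID)
	}
	// Port and path are assumptions about how the application exposes itself.
	return fmt.Sprintf("http://%s:8080/metrics", ip), nil
}

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	url, err := scrapeURLFor(context.Background(), cli, "my-container") // placeholder name/ID
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
}
```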

As for what Kubernetes should do by default, it is beyond cAdvisor. Metrics in kubernetes is a larger, more complex problem. So let's restrict this discussion to just cAdvisor as a standalone entity.

DirectXMan12 commented 8 years ago

cAdvisor is better positioned for collecting node level container metrics. For example, it can discover containers dynamically & also acquire metadata from corresponding runtimes. This makes it an attractive solution for handling application level metrics as well.

Agreed :+1:

As for the issues that @DirectXMan12 mentioned, I find them to be bugs and not fundamental issues.

I agree -- they're not fundamental issues with cAdvisor, but they do need to be solved. My argument there was mainly that a separate repo could take the burden of review/development off of the main cAdvisor maintainers, making experimenting with a new solution/fixes quicker, and freeing up the main cAdvisor team to work on the "core" cAdvisor metrics. However, if we can work out a way to fix the issues while having application metrics remain inside cAdvisor, I'd be happy with that.

As for what Kubernetes should do by default, it is beyond cAdvisor. Metrics in kubernetes is a larger, more complex problem. So let's restrict this discussion to just cAdvisor as a standalone entity.

There are several discussions and proposals (e.g. https://github.com/kubernetes/heapster/blob/master/docs/proposals/push-metrics.md) for a separate application metrics pipeline in Kubernetes. I think the more relevant discussion here is how much value this feature adds for non-Kubernetes (a.k.a. standalone) users.

I'm certainly aware of push metrics, but my (original) intent there was to supplement Kubelet's (and therefore cAdvisor's) container-level application metrics gathering, not necessarily supplant it. For the reasons that @vishh outlines above, I still think on-node container-level application metrics gathering could be advantageous.

Any decision made here is going to have an effect on the future of container-level custom metrics in the metrics pipeline in Kubernetes, so I think we should bear that in mind while making decisions about application metrics in cAdvisor.

fabxc commented 8 years ago

Thanks for all the quick feedback everyone! Greatly appreciated.

Remember, when we talk application metrics here, we're not just talking about metrics used solely for monitoring purposes. The horizontal pod autoscaler in Kubernetes (and possibly other components in the future) can use custom metrics to scale on.

I understood that Heapster was serving that purpose.

Whatever the collector is, it only has to make one HTTP call per node, greatly reducing the number of calls that the central collection agent needs to make. Additionally, having the collection agent on the node opens up more collection possibilities (e.g. I don't want to expose my metrics endpoint publicly to the cluster, just to the collection agent).

That is one collection scenario. Regardless of whether I agree with it (SPOF, number of requests vs. response size, one fixed sampling rate for everything), cAdvisor is a generally useful tool for exposing container metrics. Should we tie an opinionated approach to an orthogonal problem to it? That said, I realize that some people want one collection agent per node.

We need a path for both concerns. It doesn't have to be the same path, but in the end, we need to make sure that we can still collect a few metrics for use by autoscaling, while not requiring all the metrics to go through that path.

Agreed.

This makes it an attractive solution for handling application level metrics as well.

Is this based on the fact that cAdvisor is a local agent? Because a separate component would have the same benefits.

The real answer to this issue will come from users. If users find cAdvisor collecting application metrics to be useful, then why not support it as part of cAdvisor?

Basically for the reasons above: exposition and collection are orthogonal problems. Why not? A much larger code base, which comes with ownership problems (cf. the many issues that are mostly uncommented, unlabelled, and unassigned), plus unnecessary coupling of releases and the iteration slowdown that comes with it. It's two concerns in one process, with all the problems that brings; the same reasons apply to why we don't run several processes per container. The operational overhead of splitting is mostly negligible, and it's reduced for everyone who only wants one of the two.

As for what Kubernetes should do by default, it is beyond cAdvisor. Metrics in kubernetes is a larger, more complex problem. So let's restrict this discussion to just cAdvisor as a standalone entity.

Agreed. And for Kubernetes, providing clearly scoped tools that each serve one purpose well is most beneficial.

DirectXMan12 commented 8 years ago

I understood that Heapster was serving that purpose.

Heapster has to get the metrics from somewhere. For the same reasons that it's nice to have a single collection agent per node, it's nice to not require Heapster to go and poll each pod, or even to have to know how to find which pods to talk to (the more things Heapster has to poll, the more time issues that we have).