google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Support filtering monitored containers by container label #2380

Open: stevebail opened this issue 4 years ago

stevebail commented 4 years ago

I am working with the kubelet's cAdvisor that comes with my Kubernetes cluster. I know that cAdvisor exposes container stats as Prometheus metrics, but I am not very familiar with how to retrieve them manually using curl. What commands tell me 1) whether cAdvisor is running on each node, 2) which version it is, and 3) which port it exposes? Are the cAdvisor metrics always served on the /metrics endpoint? Your help is greatly appreciated.

dashpole commented 4 years ago

:10255/metrics/cadvisor is where you can find them in recent releases.
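
A minimal Python sketch of that manual check. It assumes the kubelet's read-only port 10255 is enabled on the node, which may not be the case on every cluster. The endpoint returns the Prometheus text exposition format, the same text curl would print. fetch_cadvisor_metrics and parse_exposition are hypothetical helpers. The parser only handles the common "name{labels} value [timestamp]" sample lines and leaves escaped characters in label values as-is.

```python
import re
import urllib.request

# One sample line of the Prometheus text format, e.g.
#   container_cpu_usage_seconds_total{container="app",namespace="foo"} 12.5 1600000000000
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'  # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional label set
    r'\s+(?P<value>\S+)'                     # sample value (may be NaN or +Inf)
    r'(?:\s+(?P<ts>-?\d+))?\s*$'             # optional timestamp in milliseconds
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_cadvisor_metrics(host, port=10255, timeout=10):
    """Download the kubelet's /metrics/cadvisor page (plain HTTP, no auth)."""
    url = f"http://{host}:{port}/metrics/cadvisor"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_exposition(text):
    """Yield (name, labels, value) per sample line, skipping # HELP / # TYPE lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # a line this simple parser does not understand
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        yield m.group("name"), labels, float(m.group("value"))
```

For example, iterating over parse_exposition(fetch_cadvisor_metrics("node01")) yields one (name, labels, value) tuple per series. Here node01 stands in for a reachable node address.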

stevebail commented 4 years ago

@dashpole @juliusv

Thanks David!

I now know of 3 different ways to scrape cAdvisor metrics (see below). Which option is recommended for the current release and for the future?

Option 1) Scrape the API server for each node in the cluster: :--api-server-port--/api/v1/nodes//proxy/metrics/cadvisor (example: :8443/api/v1/nodes/node01/proxy/metrics/cadvisor)

Option 2) Scrape the kubelet port on each node: :--kubelet-port--/metrics/cadvisor (example: :10255/metrics/cadvisor)

Option 3) Scrape each cAdvisor pod deployed as a DaemonSet, configured with :--cadvisor-port--/metrics (example: :8080/metrics)
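
The three options differ only in host, port and path. A small Python sketch that builds each URL follows. The API server address (https://apiserver), the node and pod addresses, and the default ports are placeholders taken from or modeled on the examples above. scrape_urls is a hypothetical helper. Authentication is left out, even though the API server and the kubelet's secure port require it.

```python
from urllib.parse import quote


def scrape_urls(node_name, node_ip, cadvisor_host,
                api_server="https://apiserver", api_server_port=8443,
                kubelet_port=10255, cadvisor_port=8080):
    """Map option number -> scrape URL for the three approaches listed above.

    All hosts and ports are placeholders; authentication and TLS settings,
    which the API server and the kubelet's secure port require, are omitted.
    """
    return {
        # Option 1: the kubelet's cAdvisor endpoint, proxied by the API server
        1: f"{api_server}:{api_server_port}/api/v1/nodes/{quote(node_name)}/proxy/metrics/cadvisor",
        # Option 2: the same kubelet endpoint, scraped directly on the node
        2: f"http://{node_ip}:{kubelet_port}/metrics/cadvisor",
        # Option 3: a standalone cAdvisor (e.g. a DaemonSet pod) on its own port
        3: f"http://{cadvisor_host}:{cadvisor_port}/metrics",
    }
```

Options 1 and 2 end up at the same kubelet handler. Option 1 only adds the API server as a proxy hop.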

dashpole commented 4 years ago

Option 1 and option 2 are the same endpoint. One is just proxied by the API Server. Prefer (2) when possible because it is more direct. If you want to customize the set of metrics exposed by cAdvisor, you can run it yourself as a daemonset. If you just want the metrics in the kubelet's metrics/cadvisor endpoint, I would just use that to save on resource consumption.

stevebail commented 4 years ago

@dashpole I have one specific question about cAdvisor. I noticed that node_exporter supports "collectors" that can be enabled or disabled at the source. Does cAdvisor support collectors, so that users can select which groups of cAdvisor metrics to enable or disable? Do I need to install the cAdvisor DaemonSet for this? Or will Prometheus get all cAdvisor metrics on every scrape? If so, is it possible to filter some of them in the Prometheus server?

dashpole commented 4 years ago

You can use the --disable_metrics flag to specify the set of metrics you don't want.

stevebail commented 4 years ago

1) What is the command to query the cAdvisor runtime flags? 2) I see the following options for --disable_metrics: 'disk', 'network', 'tcp', 'udp', 'sched', 'process'. Do any of those options control the cAdvisor Prometheus metrics? I guess we get all cAdvisor Prometheus metrics by default?

dashpole commented 4 years ago
  1. It should be ./cadvisor --help
  2. All of those flags control cAdvisor Prometheus metrics. By default you only get the metrics that are not expensive; most of the disabled ones produce a large number of metric streams for each container.
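
When you run cAdvisor yourself (for example as a DaemonSet), the groups are passed as one comma-separated value. A Python sketch that builds that argument is below. It validates the names against the group list quoted above. KNOWN_GROUPS and cadvisor_args are hypothetical, and newer releases accept more names. We also assume that the value you pass replaces cAdvisor's default list rather than extending it, so check ./cadvisor --help for your version. Under that assumption, any expensive group you still want disabled must be listed again.

```python
# Group names from the --help output quoted above. Newer cAdvisor releases
# accept more names, so this set is an assumption tied to that version.
KNOWN_GROUPS = {"disk", "network", "tcp", "udp", "sched", "process"}


def cadvisor_args(disabled, extra=()):
    """Build cAdvisor command-line arguments that disable the given metric groups.

    Assumption: the value passed replaces cAdvisor's default list, so every
    group that should stay disabled has to be listed explicitly. An empty
    list omits the flag, leaving cAdvisor's defaults in place.
    """
    groups = sorted(set(disabled))
    unknown = [g for g in groups if g not in KNOWN_GROUPS]
    if unknown:
        raise ValueError("unknown metric group(s): " + ", ".join(unknown))
    args = []
    if groups:
        # The flag takes a single comma-separated value.
        args.append("--disable_metrics=" + ",".join(groups))
    args.extend(extra)
    return args
```

The returned list can be appended to the container's args in a DaemonSet spec or to a docker run google/cadvisor command line.
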
stevebail commented 4 years ago

I am at root directory on the node and cadvisor file cannot be found:

node01 $ /.cadvisor --help
-bash: /.cadvisor: No such file or directory

dashpole commented 4 years ago

yeah, you will need to run that on the cAdvisor binary, which most likely isn't in the root directory of your node. Try:

docker run google/cadvisor --help (from anywhere you have Docker).

stevebail commented 4 years ago

The k8s cluster is already running and I see kubelet is reachable on port 10250:

node01 $ netstat -plant | grep kubelet
tcp        0      0 127.0.0.1:38235         0.0.0.0:*               LISTEN      1369/kubelet
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      1369/kubelet
tcp        0      0 172.17.0.42:53624       172.17.0.24:6443        ESTABLISHED 1369/kubelet
tcp        0      0 172.17.0.42:53676       172.17.0.24:6443        ESTABLISHED 1369/kubelet
tcp6       0      0 :::10250                :::*                    LISTEN      1369/kubelet
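
The same check can be done programmatically. A Python sketch follows, assuming the standard seven-column netstat -plant layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). listening_ports is a hypothetical helper. On this node it would report 10248, 10250 and 38235 but not 10255. That suggests the read-only port used in the earlier :10255/metrics/cadvisor example is not enabled here.

```python
def listening_ports(netstat_output, program="kubelet"):
    """Return the sorted TCP ports in LISTEN state owned by `program`.

    Expects `netstat -plant` rows with seven columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
    """
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header lines, UDP rows, blank lines
        local, state, pid_program = fields[3], fields[5], fields[6]
        if state != "LISTEN" or pid_program.split("/", 1)[-1] != program:
            continue
        # The port follows the last colon; this covers "127.0.0.1:10248" and ":::10250".
        ports.add(int(local.rsplit(":", 1)[1]))
    return sorted(ports)
```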

Do you think I still need to install the cAdvisor binary on the node?

dashpole commented 4 years ago

See this comment above. You only need to run cAdvisor separately if you need to customize the set of metrics. Also, I would not recommend installing it manually; I would use a DaemonSet with the Docker image instead.

stevebail commented 4 years ago

Got it. Sorry for taking so much of your time. Last question: if I don't want to customize metrics but only want to see the current cAdvisor runtime flags in my Kubernetes cluster, where cAdvisor runs inside the kubelet on each node, how can I do this?

dashpole commented 4 years ago

In my cluster, the command-line flags for the kubelet are stored in /etc/default/kubelet, but that may vary depending on your setup...
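
A Python sketch for pulling kubelet flags out of such a file is below. It assumes an environment-file layout such as KUBELET_EXTRA_ARGS="--flag=value ...", which is the layout kubeadm's Debian packages use for this file. Other setups, or a kubelet configuration file, may hold the settings instead. parse_kubelet_flags is hypothetical, and it does not handle flags written as "--flag value" with a space.

```python
def parse_kubelet_flags(env_file_text):
    """Collect --flag=value pairs from an env-style kubelet defaults file.

    Assumes lines such as KUBELET_EXTRA_ARGS="--node-ip=10.0.0.5 --v=2".
    Flags without "=value" are recorded as "true"; "--flag value" forms
    (separated by a space) are not supported by this sketch.
    """
    flags = {}
    for line in env_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        _variable, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of surrounding quotes, as a shell would.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        for token in value.split():
            if token.startswith("--"):
                name, sep, flag_value = token[2:].partition("=")
                flags[name] = flag_value if sep else "true"
    return flags
```

For example, parse_kubelet_flags(open("/etc/default/kubelet").read()) returns a dict keyed by flag name without the leading dashes.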

stevebail commented 4 years ago

@dashpole I have a use case where I just want to collect metrics for my own containers and I don't have cluster admin rights. For instance, I am a cluster user and only need cAdvisor metrics for containers in my namespace. The cAdvisor scraping options I know of so far require either access to the node IP addresses or sufficient RBAC privileges to scrape the cAdvisor endpoint via the API server. What are my options if my RBAC permissions are limited to my namespace and I just want cAdvisor metrics for the containers in it?

dashpole commented 4 years ago

We don't really support that use-case today. If you run cAdvisor as a daemonset, you would need the pod to be privileged for host filesystem access anyways, so there isn't really a good way to use it without elevated privileges.

stevebail commented 4 years ago

@dashpole Hi David, I am thinking of requesting an enhancement to cAdvisor: the ability to collect container stats only for containers in the same namespace as the cAdvisor DaemonSet. I think it makes sense to support this use case, since a user may only be interested in the stats of their own containers. OK to proceed? How should we proceed?

dashpole commented 4 years ago

How would cAdvisor know the namespace of the container?

stevebail commented 4 years ago

I don't think it is very difficult, but I am no expert. You tell me :) I think automatically discovering the monitored namespace is not so easy (?). What about starting with a manual approach, where the namespace is provided through configuration (say, as an argument) of the cAdvisor container...

dashpole commented 4 years ago

Namespace is a Kubernetes construct, and cAdvisor doesn't "understand" Kubernetes constructs. Say we want to only collect metrics for containers in namespace foo. This is how it currently works:

  1. cAdvisor discovers the cgroup with id 5498743594325698432u85342k
  2. cAdvisor queries the container runtime for 5498743594325698432u85342k, and gets the container name, image, etc. The namespace is not included, since the container runtime doesn't know about kubernetes namespaces.
  3. ?

stevebail commented 4 years ago

In the docker/container runtime I see the following data: "io.kubernetes.pod.namespace": "foo"
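
That label can be read from the output of docker inspect, which prints a JSON array with each container's labels under Config.Labels. A Python sketch follows. pod_namespace is a hypothetical helper, and it only works where the Kubernetes integration has set this label on the container.

```python
import json

# Label seen in the runtime data above. It is a plain container label,
# not part of any API, so it may be absent with other runtimes or versions.
NAMESPACE_LABEL = "io.kubernetes.pod.namespace"


def pod_namespace(inspect_output):
    """Return the namespace label from `docker inspect` JSON output, or None."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    if not containers:
        return None
    labels = (containers[0].get("Config") or {}).get("Labels") or {}
    return labels.get(NAMESPACE_LABEL)
```

For example, pass it the output of docker inspect for a container ID discovered on the node.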

dashpole commented 4 years ago

That is a container label. We probably don't want to rely on that, as it isn't an actual API. We could potentially have label filtering (e.g. only collect metrics for containers where label foo=bar).
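
The label filtering idea could look roughly like this in Python. Nothing here exists in cAdvisor: the "key=value,key2=value2" selector syntax, parse_label_selector and should_collect are all hypothetical. The sketch also leaves open where such a check would hook into cAdvisor's container discovery. Selecting on io.kubernetes.pod.namespace=foo would give the namespace-scoped behaviour requested earlier. The caveat is that this relies on a label rather than an API.

```python
def parse_label_selector(selector):
    """Parse a hypothetical 'key=value,key2=value2' setting into a dict."""
    required = {}
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {part!r}")
        required[key] = value
    return required


def should_collect(container_labels, required):
    """Collect a container only if it has every required label with the exact value."""
    return all(container_labels.get(key) == value for key, value in required.items())
```

An empty selector requires nothing, so every container is collected, which matches today's behaviour.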

stevebail commented 4 years ago

That would be great David!

mariadb-zdraganov commented 4 years ago

Is there actual support for passing match[] parameters to the /metrics endpoint?

stevebail commented 4 years ago

I don't think it currently does; this would be part of the proposed enhancement. I assume you also mean the /metrics/cadvisor endpoint (i.e. cAdvisor in the kubelet). This is for @dashpole to clarify.

dashpole commented 4 years ago

cAdvisor in the kubelet has its own labeling for cAdvisor metrics: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/server/server.go#L959

I think you should be able to use the store container labels and label whitelist flags (--store_container_labels and --whitelisted_container_labels) here: https://github.com/google/cadvisor/blob/master/cmd/cadvisor.go#L71
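
As we read the flag descriptions, these flags decide which container labels become labels on the exported series. They do not stop cAdvisor from collecting any container, so on their own they don't implement the requested filter. The Python sketch below approximates the naming. It assumes each whitelisted label key gets a container_label_ prefix and every character outside [a-zA-Z0-9_] becomes an underscore. That matches the container_label_prometheus_io_scrape name used later in this thread. prometheus_label_for and whitelisted_labels are hypothetical helpers, not cAdvisor functions.

```python
import re

# Characters that are not allowed in Prometheus label names.
_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def prometheus_label_for(container_label, prefix="container_label_"):
    """Approximate the Prometheus label name derived from a container label key."""
    return prefix + _INVALID_LABEL_CHARS.sub("_", container_label)


def whitelisted_labels(container_labels, whitelist):
    """Keep only whitelisted container labels, renamed for Prometheus."""
    return {
        prometheus_label_for(key): value
        for key, value in container_labels.items()
        if key in whitelist
    }
```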

stevebail commented 4 years ago

@dashpole Hi David. Is it possible to get an update on the filtering feature request, i.e. the ability to collect cAdvisor metrics based on a container label whitelist?

celian-garcia commented 3 years ago

@stevebail Did you try whitelisting in the Prometheus scrape job? I'm doing it successfully:

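scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
    - targets:
      - cadvisor:8080
    metric_relabel_configs:
    # Keep only series from containers labelled prometheus.io/scrape="true".
    # The regex must match the label value exactly (anchored, case-sensitive).
    - source_labels: [ container_label_prometheus_io_scrape ]
      regex: "true"
      action: keep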

Given that my whitelisted containers have the following label:

labels:
  prometheus.io/scrape: "true"
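
A Python sketch of what the keep rule does, to make two details visible. First, Prometheus anchors the regex at both ends and matches it case-sensitively against the label value, so the regex has to match the value exactly as cAdvisor exports it. Second, the rule runs after the scrape, so cAdvisor still collects and serves every container's series. relabel_keep is a hypothetical helper. Python's re stands in for Go's RE2, which is fine for simple patterns like this one, and the example series are made up.

```python
import re


def relabel_keep(series, source_labels, regex, separator=";"):
    """Mimic a Prometheus metric_relabel_configs rule with action: keep.

    The source label values are joined with `separator` (a missing label counts
    as ""), the regex must match the whole joined string, and every series that
    does not match is dropped.
    """
    pattern = re.compile(f"^(?:{regex})$")  # Prometheus anchors relabel regexes
    kept = []
    for name, labels in series:
        joined = separator.join(labels.get(label, "") for label in source_labels)
        if pattern.match(joined):
            kept.append((name, labels))
    return kept
```

With regex "true", a series whose label value is "True" is dropped, as is any series that lacks the label entirely.
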
stevebail commented 3 years ago

@celian-garcia Thank you for the suggestion. I am looking for a way to keep a container metric for certain containers and filter out the same metric from unwanted containers.

celian-garcia commented 3 years ago

Yeah, I made the suggestion mainly for people like me who want to filter containers by label and have control over the Prometheus configuration. If that is not your case, the solution won't fit your needs.

I still think that the feature is worth it in cAdvisor.

zdraganov commented 8 months ago

(Quoting @celian-garcia's Prometheus scrape job and container label from above.)

The issue with this configuration is that the filtering happens in Prometheus after the full scrape, not when cAdvisor answers the request, which can lead to significantly higher memory usage.