google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

cadvisor kubernetes question #2451

Open gaure opened 4 years ago

gaure commented 4 years ago

Hi

I deployed cAdvisor as a Kubernetes DaemonSet. Because we wanted to see the metrics in Prometheus, we created a Kubernetes Service object to expose the host-local port on a port accessible from outside the cluster. With this setup, some of the k8s nodes serve cAdvisor metrics that belong to other nodes. If I open https://node1:30080/metrics, I can see pod/container metrics that belong to node 2. The problem shows up in Grafana as well, where the node 1 dashboard lists pods/containers that are running on node 2, and so on for the other nodes. Some nodes do return metrics that match, but not all of them behave that way.

Is the above an issue with cAdvisor, Kubernetes, or Prometheus? Or am I deploying cAdvisor incorrectly?

If I am deploying cAdvisor incorrectly, what would be the correct way to deploy it as a K8s DaemonSet so that Prometheus can scrape the pod/container metrics of each individual K8s cluster node?

Thanks in advance for any assistance you can provide me.

dashpole commented 4 years ago

I'd guess it is an issue with the service. What kind of service are you using?

gaure commented 4 years ago

Hey dashpole, I followed the instructions here "https://github.com/google/cadvisor/tree/master/deploy/kubernetes" and added the service below.

----------- SERVICE ----------

apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  namespace: cadvisor
spec:
  selector:
    app: cadvisor
  ports:
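The ports section above is truncated. Judging from the ports discussed later in the thread (cAdvisor's 8080 mapped to host port 30080), the complete Service was presumably something like the following sketch; the type and port values are inferred, not confirmed:

apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  namespace: cadvisor
spec:
  type: NodePort          # inferred from the 30080 port discussed below
  selector:
    app: cadvisor
  ports:
    - name: http
      port: 8080          # Service port inside the cluster
      targetPort: 8080    # cAdvisor's default listen port
      nodePort: 30080     # static port opened on every node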

Thanks

gaure commented 4 years ago

Also, this configuration is kind of confusing, because if I understand correctly cAdvisor is embedded in the kubelet service. If that instance is also using port 8080, which one is the Service exposing, the DaemonSet one or the kubelet one?
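For background, the kubelet's embedded cAdvisor is a separate instance from the DaemonSet one, and it is normally scraped through the kubelet's own /metrics/cadvisor endpoint rather than through port 8080, so the two do not collide on a port. A minimal Prometheus sketch of scraping that endpoint (standard kubelet defaults, not taken from this thread):

scrape_configs:
  - job_name: kubelet-cadvisor
    scheme: https
    metrics_path: /metrics/cadvisor   # kubelet's built-in cAdvisor endpoint
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      insecure_skip_verify: true      # sketch only; use the cluster CA in practice
    kubernetes_sd_configs:
      - role: node                    # one target per node, on the kubelet port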

dashpole commented 4 years ago

I believe that service should be taking cAdvisor's 8080 port, and using port 30080 on the host. I wonder if the nodeport service is doing some kind of load balancing...

gaure commented 4 years ago

Yes, I believe that too, although as per the Kubernetes Service documentation (see the extract below), the "NodePort" Service type is supposed to be local to the host, not cluster-wide like the "LoadBalancer" type.

So when I configured Prometheus I used NodeIP:30080, and it is working, but the metrics are all mixed up. Not only in Prometheus: if I open a machine's cAdvisor URL, e.g. http://node01:30080, I see metrics of a Pod that is running on node 12. I know this because the Kubernetes Deployment objects use the hostname as a selector, and the Deployments are named after the node names.

My best guess is that the kubelet's internal cAdvisor is interfering with the cAdvisor pod. Maybe someone else has already seen this behavior.

"...NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You’ll be able to contact the NodePort Service, from outside the cluster, by requesting :..."

gaure commented 4 years ago

Hey David, what is the machine-id used for in cAdvisor? I saw other posts commenting about the container complaining that it is missing. I can modify the k8s deployment object so the /etc/machine-id file is shared with the container, which would fix the missing-file error at cAdvisor startup, but is the machine-id important for cAdvisor to separate the metrics per individual host? Or would fixing it not matter? Thanks

gaure commented 4 years ago

Ignore my last comment; the deployment object is mounting the rootfs, so it shouldn't complain about the machine-id file being missing. I got that error when I ran the container using Docker, because there I am not mounting the rootfs.

dashpole commented 4 years ago

Yeah, fixing machine-id won't fix your problem.

I think we might need to switch from NodePort to HostPort. If I understand the api reference correctly:

type determines how the Service is exposed. Defaults to ClusterIP. Valid options are ExternalName, ClusterIP, NodePort, and LoadBalancer. "ExternalName" maps to the specified externalName. "ClusterIP" allocates a cluster-internal IP address for load-balancing to endpoints. Endpoints are determined by the selector or if that is not specified, by manual construction of an Endpoints object. If clusterIP is "None", no virtual IP is allocated and the endpoints are published as a set of endpoints rather than a stable IP. "NodePort" builds on ClusterIP and allocates a port on every node which routes to the clusterIP. "LoadBalancer" builds on NodePort and creates an external load-balancer (if supported in the current cloud) which routes to the clusterIP. More info: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types

NodePort is actually just a ClusterIP that can be routed to from each node.
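A sketch of what switching to hostPort could look like in the DaemonSet spec, so that port 8080 on each node is served only by the cAdvisor pod running on that node (everything except the 8080 port is an assumption, and the usual cAdvisor volume mounts are trimmed for brevity):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: cadvisor
spec:
  selector:
    matchLabels:
      app: cadvisor
  template:
    metadata:
      labels:
        app: cadvisor
    spec:
      containers:
        - name: cadvisor
          image: gcr.io/cadvisor/cadvisor:v0.42.0   # image tag is an assumption
          ports:
            - containerPort: 8080
              hostPort: 8080   # binds directly on the pod's own node, so
                               # node1:8080 always answers with node1's metrics

With hostPort, Prometheus can scrape NodeIP:8080 on every node directly, and no Service is needed for scraping.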

varunreddyvrj commented 2 years ago

Hi @gaure, I'm facing the same issue. Have you got any updates on it?

grissom1 commented 2 years ago

I think NodePort mode exposes that port on each node, so your node1:30080 request is answered by whichever node the backing cAdvisor pod happens to run on, in your case node 12 I guess. One possible solution is to make sure cAdvisor is deployed on each node and to create as many Services as there are pods, one per pod: 30080 for node 1, 30081 for node 2, and so on and so forth.
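An alternative to creating one Service per node, not raised in the thread, is to keep a single NodePort Service but set externalTrafficPolicy: Local, which makes each node's port route only to endpoints on that same node; a minimal sketch with the same assumed ports:

apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  namespace: cadvisor
spec:
  type: NodePort
  externalTrafficPolicy: Local   # node1:30080 only routes to pods on node1
  selector:
    app: cadvisor
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 30080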

riqueps commented 2 years ago

Hi, I faced the same problem. My workaround was to create a static cAdvisor pod on each node via a manifest file.
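Static pods are created by placing a manifest in the kubelet's static pod directory on each node (commonly /etc/kubernetes/manifests). A minimal sketch of such a manifest, with the usual cAdvisor volume mounts trimmed to the rootfs only (image tag and paths are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: cadvisor
  namespace: kube-system
spec:
  hostNetwork: true   # the pod listens on the node's own IP, keeping metrics node-local
  containers:
    - name: cadvisor
      image: gcr.io/cadvisor/cadvisor:v0.42.0   # image tag is an assumption
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
  volumes:
    - name: rootfs
      hostPath:
        path: /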

NathanNam commented 7 months ago

I faced the same issue.

VERSION=v0.42.0
kustomize build "https://github.com/google/cadvisor/deploy/kubernetes/base?ref=${VERSION}" | kubectl apply -f -

Does the instruction above still work?