google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Cadvisor service on following nodes is either not reachable OR of a lower version than v2.1. #3525

Open vmwarelab opened 2 months ago

vmwarelab commented 2 months ago

Hi Everyone ..

When setting up the Aria Operations k8s management pack, we configure it to use the cAdvisor collection service deployed on the k8s cluster we are trying to monitor (manifest below for reference).

We keep getting this error when validating the connection:

Cadvisor service on the following nodes is either not reachable OR of a lower version than v2.1.

The vendor documentation says to:

  1. Verify that the cAdvisor service is up and running on the affected nodes and responds to API calls. (Checked, it's running.)
  2. Verify that the API version of the cAdvisor service is later than 2.1. If not, deploy the latest version of the cAdvisor service. (If you look at the manifest you'll see we are using /google/cadvisor:latest, so I'm confused that it calls for a version later than 2.1 when the latest release I see in the repo is v0.33.0.)
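The two checks above can be done with a quick probe against each node's cAdvisor endpoint. A sketch only: the hostPort 31194 and the node IP are placeholders, substitute whatever your DaemonSet actually exposes. Note that the "v2.1" in the error refers to cAdvisor's REST API version (the `/api/v2.1/...` endpoints), not the image release tag such as v0.33.0.

```shell
# Sketch: replace <node-ip> and the port with your values.
# An HTTP 200 from the v2.1 API path means the node is reachable and
# serves an API version the management pack's check accepts.
curl -s -o /dev/null -w "%{http_code}\n" "http://<node-ip>:31194/api/v2.1/machine"
```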

What am I missing please?

cAdvisor k8s manifest

    apiVersion: apps/v1 # apps/v1beta2 in Kube 1.8, extensions/v1beta1 in Kube < 1.8
    kind: DaemonSet
    metadata:
      name: vrops-cadvisor
      namespace: kube-system
      labels:
        app: vrops-cadvisor
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      selector:
        matchLabels:
          app: vrops-cadvisor
      template:
        metadata:
          labels:
            app: vrops-cadvisor
            version: latest
        spec:
          tolerations:

ValenFontanazzi commented 1 week ago

Hey Maher, I also had the same error in my Aria Ops. The problem is that the spec.template.spec.tolerations in the manifest doesn't allow the cAdvisor metric collector to be scheduled on the control-plane node. I solved it by adding these lines under tolerations; they're from the sample DaemonSet manifest in the official documentation:

      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

Hope this works for you too
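For reference, a sketch of how the DaemonSet's tolerations section might look after the change. This is an assumption about the final layout, not the exact manifest from the thread; the two keys are the ones from the upstream sample (the `master` key covers clusters still using the legacy control-plane taint):

```yaml
# Sketch: tolerations section of the vrops-cadvisor DaemonSet after
# adding the control-plane entries from the upstream sample manifest.
spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master  # legacy taint on older clusters
          operator: Exists
          effect: NoSchedule
```

With `operator: Exists` and no `value`, each toleration matches any taint with that key, so the pods can be scheduled onto control-plane nodes regardless of the taint's value.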

vmwarelab commented 1 week ago

Thank you Valen. So should the YAML look like this, or should I only configure tolerations with the two entries you provided?

[screenshot of the updated manifest]

ValenFontanazzi commented 1 week ago

Something like that, but you should remove the toleration that was there before; I think it will collide with the new ones.

[screenshot of the corrected manifest]

Just like this