MikeSpreitzer opened 2 years ago
Here is some more evidence from my cluster.
Following is how I produced those. This is a single-node cluster, and 10.38.225.217 is one of the node's IP addresses. The Prometheus server (I scaled down to 1) has 10.244.0.13 as its cluster IP.
kubectl -s https://10.38.225.217:10250/ --insecure-skip-tls-verify=true get --raw /metrics/cadvisor > kl.metrics.txt
curl 'http://10.244.0.13:9090/api/v1/query?query=container_cpu_usage_seconds_total' | jq . > cpu.json.txt
Looking quickly at kl.metrics.txt, I notice that all `container_cpu_usage_seconds_total` samples have empty `container` and `image` label values. It might be a problem with the downstream distribution?

I created a v1.25.0 cluster with kind against current main and failed to reproduce these issues:
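The empty-label claim is easy to check directly against the scraped file. A minimal sketch: the heredoc below is a hypothetical stand-in for the kl.metrics.txt produced by the `kubectl ... --raw /metrics/cadvisor` command above; point the greps at the real file instead.

```shell
# Hypothetical sample lines in the Prometheus exposition format the kubelet
# returns; substitute the kl.metrics.txt produced by the scrape above.
cat > kl.metrics.txt <<'EOF'
container_cpu_usage_seconds_total{cpu="total",image="",namespace="kube-system",pod="coredns-abc"} 12.34
container_cpu_usage_seconds_total{cpu="total",image="",namespace="monitoring",pod="prometheus-k8s-0"} 56.78
EOF

# Count total samples and samples whose image label is the empty string.
total=$(grep -c '^container_cpu_usage_seconds_total' kl.metrics.txt)
empty=$(grep '^container_cpu_usage_seconds_total' kl.metrics.txt | grep -c 'image=""')
echo "$empty of $total samples have an empty image label"
# → 2 of 2 samples have an empty image label
```

If the two counts match on a real scrape, every sample is affected, which is the symptom reported here.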
@simonpasquier : can you please clarify what you mean by "the downstream distribution"? I used the reported version of kubeadm, and that installed the corresponding version of kubernetes.
Sorry, my initial report was not quite right about the image labels. The scrape of the kubelet produced `container_cpu_usage_seconds_total` metrics in which (a) the `container` label does not appear and (b) the `image` label always has the empty string as its value. In the query against the Prometheus server for `container_cpu_usage_seconds_total`, every returned data point lacks both `image` and `cluster` labels.
@MikeSpreitzer IIUC you installed kubeadm from the Ubuntu package which is what I meant by downstream distribution. As @PhilipGough verified, it doesn't happen with kind so it might be a problem with the kubeadm version shipped by Ubuntu?
> The scrape of the kubelet produced `container_cpu_usage_seconds_total` metrics in which (a) the `container` label does not appear and (b) the `image` label always has the empty string as its value. In the query against the Prometheus server for `container_cpu_usage_seconds_total`, every returned data point lacks both `image` and `cluster` labels.
From a Prometheus standpoint, a label with an empty value is the same as the label not being present.
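That equivalence can be seen in selector behavior. A sketch (not taken from the issue) of the two matchers in play here:

```promql
container_cpu_usage_seconds_total{image=""}    # matches series whose image label is empty OR absent
container_cpu_usage_seconds_total{image!=""}   # excludes both of those cases
```

This is why a dashboard or rule that filters on `image!=""` returns nothing when the kubelet exports every sample with an empty `image` label.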
@simonpasquier : thanks for the explanation. When you got success, was it because the `image` and `cluster` labels were present (and non-empty) for the `container_cpu_usage_seconds_total` metric, or because the recording rule for `node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate` does not mention those labels?
I compared my installed kubeadm with what https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ says for the "Without a package manager" case, and they look the same.
mspreitz@mjs-ubu2004-dev5-kube1:~$ mkdir tempdownload
mspreitz@mjs-ubu2004-dev5-kube1:~$ cd tempdownload/
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ export DOWNLOAD_DIR=$PWD
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ RELEASE="$(curl -sSL https://dl.k8s.io/release/stable.txt)"
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ CRICTL_VERSION="v1.25.0"
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ ARCH="amd64"
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ sudo curl -L --remote-name-all https://storage.googleapis.com/kubernetes-release/release/${RELEASE}/bin/linux/${ARCH}/{kubeadm,kubelet,kubectl}
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 41.7M 100 41.7M 0 0 30.5M 0 0:00:01 0:00:01 --:--:-- 30.5M
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 108M 100 108M 0 0 41.8M 0 0:00:02 0:00:02 --:--:-- 41.8M
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 42.9M 100 42.9M 0 0 40.3M 0 0:00:01 0:00:01 --:--:-- 40.3M
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ for cmd in kubeadm kubelet kubectl; do shasum -a 256 $cmd $(which $cmd); done
10b30b87af2cdc865983d742891eba467d038f94f3926bf5d0174f1abf6628f8 kubeadm
10b30b87af2cdc865983d742891eba467d038f94f3926bf5d0174f1abf6628f8 /usr/bin/kubeadm
7f9183fce12606818612ce80b6c09757452c4fb50aefea5fc5843951c5020e24 kubelet
7f9183fce12606818612ce80b6c09757452c4fb50aefea5fc5843951c5020e24 /usr/bin/kubelet
e23cc7092218c95c22d8ee36fb9499194a36ac5b5349ca476886b7edc0203885 kubectl
e23cc7092218c95c22d8ee36fb9499194a36ac5b5349ca476886b7edc0203885 /usr/bin/kubectl
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ RELEASE_VERSION="v0.4.0"
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ curl -sSL "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubelet/lib/systemd/system/kubelet.service" | sed "s:/usr/bin:${DOWNLOAD_DIR}:g" > kubelet.service
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2022-08-24 19:44:24 UTC; 2 weeks 0 days ago
Docs: https://kubernetes.io/docs/home/
Main PID: 8437 (kubelet)
Tasks: 30 (limit: 77122)
Memory: 114.7M
CGroup: /system.slice/kubelet.service
└─8437 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.234586 8437 remote_runt>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.234678 8437 container_l>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.236831 8437 remote_runt>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.236901 8437 container_l>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.238942 8437 remote_runt>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.239011 8437 container_l>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.241120 8437 remote_runt>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.241213 8437 container_l>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.247993 8437 remote_runt>
Sep 07 20:54:23 mjs-ubu2004-dev5-kube1.sl.cloud9.ibm.com kubelet[8437]: E0907 20:54:23.248077 8437 container_l>
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ sudo diff kubelet.service /lib/systemd/system/kubelet.service
8c8
< ExecStart=/home/mspreitz/tempdownload/kubelet
---
> ExecStart=/usr/bin/kubelet
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ curl -sSL "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubeadm/10-kubeadm.conf" | sed "s:/usr/bin:${DOWNLOAD_DIR}:g" > 10-kubeadm.conf
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$ sudo diff 10-kubeadm.conf /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
11c11
< ExecStart=/home/mspreitz/tempdownload/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
---
> ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
mspreitz@mjs-ubu2004-dev5-kube1:~/tempdownload$
same issue
same issue, no container performance metrics on the kubelet
I figured out why: in the new release cAdvisor was removed from the kubelet, so you need to install it separately.
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadvisor
  namespace: monitoring
  labels:
    app: cadvisor
    release: prometheus
spec:
  selector:
    matchLabels:
      app: cadvisor
  endpoints:
  - metricRelabelings:
    - sourceLabels:
      - container_label_io_kubernetes_pod_name
      targetLabel: pod
    - sourceLabels:
      - container_label_io_kubernetes_container_name
      targetLabel: container
    - sourceLabels:
      - container_label_io_kubernetes_pod_namespace
      targetLabel: namespace
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_name
    - action: labeldrop
      regex: container_label_io_kubernetes_container_name
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_namespace
    port: cadvisor
    relabelings:
    - sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: node
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
      replacement: /metrics/cadvisor
    - sourceLabels:
      - job
      targetLabel: job
      replacement: kubelet
```
@holooloo : thanks. I am not sure I understand what you are saying about the remedy. The object you exhibited does not look to me like something that modifies the kubelet or installs cadvisor elsewhere.
As mentioned here: https://github.com/rancher/rancher/issues/38934#issuecomment-1294585708.
I used that yaml, and the problem was solved.
My environment:
- Kubernetes 1.25 installed using kubeadm on Ubuntu 22.04
- cri-dockerd
- kube-prometheus branch release-0.12
My yaml is a little bit different from that:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: cadvisor
  name: cadvisor
rules:
- apiGroups:
  - policy
  resourceNames:
  - cadvisor
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: cadvisor
  name: cadvisor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cadvisor
subjects:
- kind: ServiceAccount
  name: cadvisor
  namespace: monitoring
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: docker/default
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: cadvisor
      name: cadvisor
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app: cadvisor
        name: cadvisor
    spec:
      automountServiceAccountToken: false
      containers:
      - args:
        - --housekeeping_interval=10s
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
        - --docker_only
        - --store_container_labels=false
        - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
        image: gcr.io/cadvisor/cadvisor:v0.45.0
        name: cadvisor
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 800m
            memory: 2000Mi
          requests:
            cpu: 400m
            memory: 400Mi
        volumeMounts:
        - mountPath: /rootfs
          name: rootfs
          readOnly: true
        - mountPath: /var/run
          name: var-run
          readOnly: true
        - mountPath: /sys
          name: sys
          readOnly: true
        - mountPath: /var/lib/docker
          name: docker
          readOnly: true
        - mountPath: /dev/disk
          name: disk
          readOnly: true
      priorityClassName: system-node-critical
      serviceAccountName: cadvisor
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      volumes:
      - hostPath:
          path: /
        name: rootfs
      - hostPath:
          path: /var/run
        name: var-run
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /var/lib/docker
        name: docker
      - hostPath:
          path: /dev/disk
        name: disk
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  allowedHostPaths:
  - pathPrefix: /
  - pathPrefix: /var/run
  - pathPrefix: /sys
  - pathPrefix: /var/lib/docker
  - pathPrefix: /dev/disk
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'
---
apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  labels:
    app: cadvisor
  namespace: monitoring
spec:
  selector:
    app: cadvisor
  ports:
  - name: cadvisor
    port: 8080
    protocol: TCP
    targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  endpoints:
  - metricRelabelings:
    - sourceLabels:
      - container_label_io_kubernetes_pod_name
      targetLabel: pod
    - sourceLabels:
      - container_label_io_kubernetes_container_name
      targetLabel: container
    - sourceLabels:
      - container_label_io_kubernetes_pod_namespace
      targetLabel: namespace
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_name
    - action: labeldrop
      regex: container_label_io_kubernetes_container_name
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_namespace
    port: cadvisor
    relabelings:
    - sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: node
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
      replacement: /metrics/cadvisor
    - sourceLabels:
      - job
      targetLabel: job
      replacement: kubelet
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      app: cadvisor
```
I use Kubernetes 1.26.5, and the `policy/v1beta1` API version does not exist after Kubernetes v1.25. The kind PodSecurityPolicy also no longer exists; it was replaced by Pod Security Admission. So this part of the manifest fails:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
```
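Under Pod Security Admission, the PSP's role is played by namespace labels rather than a cluster-scoped policy object. A minimal sketch, assuming the `privileged` profile is acceptable for this namespace given the cadvisor DaemonSet's hostPath mounts:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    # Pod Security Admission: allow privileged workloads (hostPath volumes) here
    pod-security.kubernetes.io/enforce: privileged
```

With this in place, the PodSecurityPolicy object and the ClusterRole rule that references it can simply be dropped from the manifest above.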
I switched from docker to containerd and all the cAdvisor metrics are working!
What happened? I created a shiny new Kubernetes cluster yesterday using kubeadm and Kubernetes 1.25.0. Then I installed kube-prometheus commit a4e3fc4cda07ffad693c811a112d8b3a6ae51326 (which was main at the time). When I looked in Grafana, I found many panels showing "no data". For example, in the dashboard "Kubernetes / Compute Resources / Pod", the first panel ("CPU Usage") issues its first query against the derived metric `node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate`. A Prometheus query against that metric yields no time series. Looking in the active configuration of the Prometheus server, I see the following recording rule.
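For context, this is my paraphrase rather than the exact rule text from this cluster: the kubernetes-mixin recording rule behind this metric has historically selected only cadvisor samples with a non-empty `image` label, roughly:

```promql
# Approximate shape of the kubernetes-mixin rule; the exact expression varies by release.
sum by (cluster, namespace, pod, container) (
  irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
)
```

If every scraped sample carries `image=""`, the `image!=""` matcher drops them all, which would explain dashboards backed by this rule showing "no data".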
A Prometheus query of simply `container_cpu_usage_seconds_total` yields 113 time series. None of them has a `cluster` or `image` label.

Did you expect to see something different? I expected to see compute resource data for my pods.
How to reproduce it (as minimally and precisely as possible): I think I outlined that above.
Environment
Prometheus Operator version:
quay.io/prometheus-operator/prometheus-operator:v0.58.0
Kubernetes version information:
Kubernetes cluster kind:
Manifests:
Not sure which, if any, are relevant.
Not sure which, if any, are relevant.
Not sure which, if any, are relevant.
Anything else we need to know?:
OS is Ubuntu 20.04.4 LTS, freshly `apt full-upgrade`d.