Same for me: I updated from 15.4.5 to 18.0.1 and got "No data" for CPU on pods.
Where did you deploy your cluster? I mean, is this an on-prem installation, or do you use a cloud provider?
I'm facing the same issue on bare metal (Debian 10, kernel 4.19); I suspect it might somehow be related to CPU accounting.
In my case it's GKE, 1.20.8-gke.900.
I can update to Helm release 16.6.3 and CPU data is present just fine, but if I update to 16.6.4 (or any later version) I get no data for CPU, while memory still works just fine.
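For anyone bisecting chart versions the same way, pinning a specific version is just a matter of passing --version (the release name and values file here are placeholders taken from the report below):

helm upgrade prometheus-stack prometheus-community/kube-prometheus-stack --version 16.6.3 --values values.yaml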
I solved that issue on my side. In my case, it was related to kube-state-metrics and its ServiceMonitor.
First, check if that query works correctly:
kube_pod_info{namespace="monitoring", pod="kube-prometheus-stack-kube-state-metrics-77ffcf4f67-f8qj7"}
If you get results, you are probably facing a different issue.
In my case, I had no results because of the port names used in the ServiceMonitor. The Service that exposes kube-state-metrics used the named port http, but the ServiceMonitor referenced the named port metrics. As a result, the ServiceMonitor couldn't reach the kube-state-metrics Service and couldn't scrape its metrics. The port name exposed by the Service must match the port name used in the ServiceMonitor.
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: monitoring
    prometheus.io/scrape: "true"
  creationTimestamp: "2021-06-02T13:47:47Z"
  labels:
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kube-state-metrics
    helm.sh/chart: kube-state-metrics-3.4.2
    helm.toolkit.fluxcd.io/name: kube-prometheus-stack
    helm.toolkit.fluxcd.io/namespace: monitoring
spec:
  clusterIP: 10.233.60.122
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/name: kube-state-metrics
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2021-06-02T13:47:48Z"
  generation: 3
  labels:
    app: kube-prometheus-stack-kube-state-metrics
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 18.0.2
    chart: kube-prometheus-stack-18.0.2
    helm.toolkit.fluxcd.io/name: kube-prometheus-stack
    helm.toolkit.fluxcd.io/namespace: monitoring
    heritage: Helm
    release: kube-prometheus-stack
spec:
  endpoints:
  - honorLabels: true
    port: http
  jobLabel: app.kubernetes.io/name
  selector:
    matchLabels:
      app.kubernetes.io/instance: kube-prometheus-stack
      app.kubernetes.io/name: kube-state-metrics
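If you hit the same mismatch, one quick way to align the two is to patch the ServiceMonitor endpoint port to the name the Service actually exposes. This is a sketch only: the ServiceMonitor name below is inferred from its labels, and a manual patch like this will be reverted by the next Helm upgrade, so setting it through chart values is the durable fix.

kubectl -n monitoring patch servicemonitor kube-prometheus-stack-kube-state-metrics \
  --type=json -p='[{"op": "replace", "path": "/spec/endpoints/0/port", "value": "http"}]'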
I hope that helps. Let me know.
Yep, looks like mine is different.
Looks like you are right; here is the change introduced between 16.6.3 and 16.6.4.
Having the same issue. I can confirm that the port mismatch is not my issue as that mismatch doesn't seem to be present on 18.0.2. I'm running EKS 1.21.
@oreststetsiak thank you for commenting. Unfortunately, downgrading to 16.6.3 did not do the trick for me; still no memory or CPU in Prometheus.
@jakubhajek thanks for your suggestion. In fact, the query you posted returns results, so I'm facing a different issue.
Still haven't found a solution to this, unfortunately. Is there a good alternative to this Helm chart? Ideally, a stack chart with all components bundled.
I think the problem is that the current dashboards filter on the label image!="", but if we run the query container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor"} we see that this label is no longer there, while pod is. This is on chart version 18.0.5 and EKS 1.21.2.
I replaced the label selectors, changing container_memory_working_set_bytes{cluster="$cluster", namespace="$namespace", container!="", image!=""} to container_memory_working_set_bytes{cluster="$cluster", namespace="$namespace", pod!=""}, and got data back. I think the dashboards need a deeper refactor.
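Before rewriting every dashboard, a quick sanity check (using the same job and metrics_path selectors as the query above) is to count how many cadvisor series carry a non-empty image label versus an empty one:

count(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""})
count(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image=""})

If only the second query returns anything, the runtime metadata is missing from the metrics rather than the dashboards being wrong, which is exactly the situation described in the next comment.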
Hi, I believe the queries are correct and should include image!="", especially given that they have worked before. If the image label and possibly other labels are empty, then metrics metadata is most likely missing.
When I was hit by this issue with No Data in most of my Grafana dashboards, I traced it back to the fact that I had switched to the containerd runtime when I upgraded from EKS 1.20 to EKS 1.21. It turns out that the AWS AMI uses a non-default socket for containerd (/run/dockershim.sock) instead of the default one (/run/containerd/containerd.sock). This causes cadvisor to fail to fetch metrics from the container runtime because it expects it to be available at the default socket location.
Switching back to the docker container runtime fixes the problem and I again get all the metrics I expect. You can also do some creative symlinking to fix this or wait for the fix to be released: https://github.com/awslabs/amazon-eks-ami/pull/724
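For reference, the creative symlinking boils down to something like the following on each affected node (a sketch, with the socket paths taken from the description above; verify them on your AMI first, and apply it via user data or similar so it survives node replacement):

# Point the default containerd socket location at the socket the EKS AMI actually uses,
# so that cadvisor can reach the container runtime.
sudo ln -sf /run/dockershim.sock /run/containerd/containerd.sock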
Also related: https://github.com/kubernetes/kubernetes/issues/89903
There might of course be other problems here, but I advise against rewriting all the queries, as I don't believe that is the root cause :)
@haskjold thank you very much for telling me about this bug, you saved me from long hours of unnecessary work
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
@haskjold thank you very much for your answer.
If the recommended solution is to switch back to the Docker container runtime, what would be the easiest way to achieve this while staying with AWS EKS and EKS worker groups?
The same question applies to the creative symlinking.
@Va1 The latest EKS AMIs already have the symlink workaround built in. Simply update your cluster to use any AMI >= v20211001.
I can verify it is working perfectly on our deployment.
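For eksctl-managed node groups, pinning the AMI release can look roughly like this (a hypothetical sketch; the releaseVersion string is illustrative, so pick the actual release matching your Kubernetes version from the amazon-eks-ami releases page):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster   # hypothetical cluster name
  region: eu-west-1  # hypothetical region
managedNodeGroups:
  - name: workers
    # any EKS-optimized AMI release >= v20211001 contains the symlink fix
    releaseVersion: "1.21.4-20211008"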
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
In case anyone is as dumb as I am, I'll leave this here.
I was enabling metrics collection for verticalpodautoscalers and added the following config to my values.yaml file:
kube-state-metrics:
  collectors:
    - verticalpodautoscalers
That effectively removed ALL OTHER collectors, because setting collectors replaces the default list instead of appending to it. I had to update the list to include the full default set, and things came back.
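For anyone hitting the same thing, the values need to spell out the defaults plus the extra collector, something like the sketch below; the authoritative default list lives in the kube-state-metrics chart's values.yaml for your chart version, so copy it from there:

kube-state-metrics:
  collectors:
    - certificatesigningrequests
    - configmaps
    - cronjobs
    - daemonsets
    - deployments
    - endpoints
    - horizontalpodautoscalers
    - ingresses
    - jobs
    - limitranges
    - mutatingwebhookconfigurations
    - namespaces
    - networkpolicies
    - nodes
    - persistentvolumeclaims
    - persistentvolumes
    - poddisruptionbudgets
    - pods
    - replicasets
    - replicationcontrollers
    - resourcequotas
    - secrets
    - services
    - statefulsets
    - storageclasses
    - validatingwebhookconfigurations
    - volumeattachments
    # the extra, non-default collector that started all this
    - verticalpodautoscalers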
Describe the bug
So, upon installing the latest chart version (17.2.1) on the latest EKS (recently upgraded to 1.21) and checking Grafana, I've noticed that there's "No data" everywhere for pods.
And upon checking in Prometheus, I've realized that at least pod/container CPU & memory metrics are not present at all.
What's your helm version?
version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"dirty", GoVersion:"go1.16.5"}
What's your kubectl version?
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T17:56:19Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Which chart?
kube-prometheus-stack
What's the chart version?
17.2.1
What happened?
pods/containers CPU & memory metrics are missing
What you expected to happen?
No response
How to reproduce it?
No response
Enter the changed values of values.yaml?
Enter the command that you execute and failing/misfunctioning.
helm install prometheus-stack prometheus-community/kube-prometheus-stack --version 17.2.1 --values values.yaml
Anything else we need to know?
No response