mariogkds opened 5 months ago
Hi @mariogkds. Thanks for the report. This looks like a unit conversion issue. We will take a look.
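For anyone digging into this, here is a minimal sketch (not Headlamp's actual code; the sample quantity is made up) of how a Kubernetes memory quantity maps to bytes, and how the decimal (MB) and binary (MiB) renderings of the same value differ:

```typescript
// Minimal sketch, not Headlamp's actual code; the sample quantity is made up.
// Kubernetes reports memory as quantities like "21264Ki". Converting to bytes
// and then dividing by a decimal vs. binary factor gives numbers that differ
// by roughly 5%.
const SUFFIX_BYTES: Record<string, number> = {
  Ki: 1024, Mi: 1024 ** 2, Gi: 1024 ** 3, // binary suffixes
  k: 1000, M: 1000 ** 2, G: 1000 ** 3,    // decimal suffixes
};

function quantityToBytes(quantity: string): number {
  const match = quantity.match(/^([0-9.]+)([A-Za-z]*)$/);
  if (!match) throw new Error(`unrecognized quantity: ${quantity}`);
  return Number(match[1]) * (SUFFIX_BYTES[match[2]] ?? 1);
}

const bytes = quantityToBytes('21264Ki');           // 21,774,336 bytes
console.log((bytes / 1000 ** 2).toFixed(2), 'MB');  // "21.77 MB"  (decimal)
console.log((bytes / 1024 ** 2).toFixed(2), 'MiB'); // "20.77 MiB" (binary)
```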
@joaquimrocha I'm seeing this in metrics for RAM in deployments and pods too. Probably other places as well?
Grafana and crictl report values correctly, but Headlamp is showing much more.
For example, the Headlamp pod shows 40 MB of RAM used in the Headlamp UI, but it is actually using 20.76 MB according to Grafana and crictl.
So it looks like roughly double.
CPU and network are correct.
Is this going to get fixed soon? It's confusing our users.
Headlamp 0.25.1
@sarg3nt Yes, we do want to fix this but haven't had the bandwidth yet. Let me try to get it in our pipeline for the next release.
Hi @mariogkds @sarg3nt, thanks for raising these issues! Would you be able to provide the YAML (with any sensitive data redacted) for the problematic resources? It would be super helpful for testing ^^
Hi @mariogkds and @sarg3nt, we really want to address this issue but we haven't been able to reproduce it. If you don't mind, please send us some sample YAML based on yours so @skoeva can take a look.
@joaquimrocha sorry for the late reply. Work has been super busy. I'll get you something on Monday.
We've just released our latest version :D
Just a reminder: if you're still running into this issue and would like us to get a fix in, your sample YAML would be super helpful.
@skoeva and @joaquimrocha apologies for not getting back to you.
I've deployed 0.26.0 and still see the doubled RAM issue.
Every pod I've checked so far is double, even those that have just one replica, so it's not double-counting multiple replicas.
I'm not sure what you mean by sample YAML. If you mean our workloads, I can share the resulting Deployment and Pod YAML for Headlamp, since it is also one of the pods showing this doubled-RAM behavior.
See YAML below.
Also of note: we deploy our own custom observability stack (Prometheus, Grafana, Thanos, etc.). The key detail is that our primary data source is Thanos, but Prometheus reports the same memory data as Thanos, as one would expect, and crictl on the nodes shows the same values.
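In case it helps reproduce, here is a rough cross-check sketch (assumes kubectl is on PATH and that metrics-server or prometheus-adapter is serving the metrics.k8s.io API) that prints each container's reported memory usage in both MiB and MB, for comparison against the Headlamp UI, Grafana, and crictl:

```typescript
// Rough cross-check sketch. Assumes kubectl is on PATH and that something
// (metrics-server or prometheus-adapter) is serving the metrics.k8s.io API.
// Prints each container's reported memory usage in MiB and MB so it can be
// compared side by side with the Headlamp UI, Grafana, and `crictl stats`.
import { execSync } from 'node:child_process';

const MULT: Record<string, number> = { Ki: 1024, Mi: 1024 ** 2, Gi: 1024 ** 3 };
const namespace = 'headlamp';

const raw = execSync(
  `kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/${namespace}/pods`,
  { encoding: 'utf8' },
);

for (const pod of JSON.parse(raw).items) {
  for (const container of pod.containers) {
    const usage: string = container.usage.memory; // e.g. "21264Ki"
    const match = usage.match(/^([0-9.]+)([A-Za-z]*)$/);
    if (!match) continue;
    const bytes = Number(match[1]) * (MULT[match[2]] ?? 1);
    console.log(
      `${pod.metadata.name}/${container.name}:`,
      `${(bytes / 1024 ** 2).toFixed(2)} MiB /`,
      `${(bytes / 1000 ** 2).toFixed(2)} MB`,
    );
  }
}
```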
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: headlamp
    meta.helm.sh/release-namespace: headlamp
  creationTimestamp: "2024-11-08T17:23:10Z"
  generation: 2
  labels:
    app.kubernetes.io/instance: headlamp
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: headlamp
    app.kubernetes.io/version: 0.26.0
    helm.sh/chart: headlamp-0.26.0
  name: headlamp
  namespace: headlamp
  resourceVersion: "4910016"
  uid: f7baaf27-b2eb-4242-86a8-61540068c8b6
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: headlamp
      app.kubernetes.io/name: headlamp
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: headlamp
        app.kubernetes.io/name: headlamp
    spec:
      containers:
      - args:
        - -in-cluster
        - -plugins-dir=/headlamp/plugins
        - -oidc-client-id=$(OIDC_CLIENT_ID)
        - -oidc-client-secret=$(OIDC_CLIENT_SECRET)
        - -oidc-idp-issuer-url=$(OIDC_ISSUER_URL)
        - -oidc-scopes=$(OIDC_SCOPES)
        envFrom:
        - secretRef:
            name: oidc
        image: ghcr.io/headlamp-k8s/headlamp:v0.26.0
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: headlamp
        ports:
        - containerPort: 4466
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 150m
            memory: 50Mi
          requests:
            cpu: 80m
            memory: 30Mi
        securityContext:
          privileged: false
          runAsGroup: 101
          runAsNonRoot: true
          runAsUser: 100
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /headlamp/plugins/logo
          name: logo
        - mountPath: /headlamp/plugins/kubeconfig-plugin
          name: kubeconfig-plugin
        - mountPath: /headlamp/plugins/sidebar_apps
          name: sidebar-apps
        - mountPath: /headlamp/plugins/sidebar_grafana
          name: sidebar-grafana
        - mountPath: /headlamp/plugins/sidebar_kyverno
          name: sidebar-kyverno
        - mountPath: /headlamp/plugins/sidebar_longhorn
          name: sidebar-longhorn
        - mountPath: /headlamp/plugins/sidebar_prometheus
          name: sidebar-prometheus
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: headlamp
      serviceAccountName: headlamp
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: logo
        name: logo
      - configMap:
          defaultMode: 420
          name: kubeconfig-plugin
        name: kubeconfig-plugin
      - configMap:
          defaultMode: 420
          name: sidebar-apps
        name: sidebar-apps
      - configMap:
          defaultMode: 420
          name: sidebar-grafana
        name: sidebar-grafana
      - configMap:
          defaultMode: 420
          name: sidebar-kyverno
        name: sidebar-kyverno
      - configMap:
          defaultMode: 420
          name: sidebar-longhorn
        name: sidebar-longhorn
      - configMap:
          defaultMode: 420
          name: sidebar-prometheus
        name: sidebar-prometheus
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-11-08T17:23:12Z"
    lastUpdateTime: "2024-11-08T17:23:12Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-11-08T17:23:10Z"
    lastUpdateTime: "2024-11-12T17:29:08Z"
    message: ReplicaSet "headlamp-5847d9f6c8" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: f5d91f659ec1fd08c943a8c768114cdbc4d3084d1a25f5bc8a3573d8898e9fff
    cni.projectcalico.org/podIP: 192.168.4.9/32
    cni.projectcalico.org/podIPs: 192.168.4.9/32
  creationTimestamp: "2024-11-12T17:29:00Z"
  generateName: headlamp-5847d9f6c8-
  labels:
    app.kubernetes.io/instance: headlamp
    app.kubernetes.io/name: headlamp
    pod-template-hash: 5847d9f6c8
  name: headlamp-5847d9f6c8-hrwsr
  namespace: headlamp
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: headlamp-5847d9f6c8
    uid: 20da7856-a5f1-4fd3-a833-894018b9ef63
  resourceVersion: "4910002"
  uid: 55c9125b-e7df-443c-9806-77386b9860bd
spec:
  containers:
  - args:
    - -in-cluster
    - -plugins-dir=/headlamp/plugins
    - -oidc-client-id=$(OIDC_CLIENT_ID)
    - -oidc-client-secret=$(OIDC_CLIENT_SECRET)
    - -oidc-idp-issuer-url=$(OIDC_ISSUER_URL)
    - -oidc-scopes=$(OIDC_SCOPES)
    envFrom:
    - secretRef:
        name: oidc
    image: ghcr.io/headlamp-k8s/headlamp:v0.26.0
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: headlamp
    ports:
    - containerPort: 4466
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 150m
        memory: 50Mi
      requests:
        cpu: 80m
        memory: 30Mi
    securityContext:
      privileged: false
      runAsGroup: 101
      runAsNonRoot: true
      runAsUser: 100
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /headlamp/plugins/logo
      name: logo
    - mountPath: /headlamp/plugins/kubeconfig-plugin
      name: kubeconfig-plugin
    - mountPath: /headlamp/plugins/sidebar_apps
      name: sidebar-apps
    - mountPath: /headlamp/plugins/sidebar_grafana
      name: sidebar-grafana
    - mountPath: /headlamp/plugins/sidebar_kyverno
      name: sidebar-kyverno
    - mountPath: /headlamp/plugins/sidebar_longhorn
      name: sidebar-longhorn
    - mountPath: /headlamp/plugins/sidebar_prometheus
      name: sidebar-prometheus
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hfnn7
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: <redacted>
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: headlamp
  serviceAccountName: headlamp
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: logo
    name: logo
  - configMap:
      defaultMode: 420
      name: kubeconfig-plugin
    name: kubeconfig-plugin
  - configMap:
      defaultMode: 420
      name: sidebar-apps
    name: sidebar-apps
  - configMap:
      defaultMode: 420
      name: sidebar-grafana
    name: sidebar-grafana
  - configMap:
      defaultMode: 420
      name: sidebar-kyverno
    name: sidebar-kyverno
  - configMap:
      defaultMode: 420
      name: sidebar-longhorn
    name: sidebar-longhorn
  - configMap:
      defaultMode: 420
      name: sidebar-prometheus
    name: sidebar-prometheus
  - name: kube-api-access-hfnn7
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-11-12T17:29:08Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-11-12T17:29:00Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-11-12T17:29:08Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-11-12T17:29:08Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-11-12T17:29:00Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://f8b9e1067abe28139478bc338826e71f28c9e0d25c0b46a724d23d43d02ae030
    image: ghcr.io/headlamp-k8s/headlamp:v0.26.0
    imageID: ghcr.io/headlamp-k8s/headlamp@sha256:c47fd232a8be2a8756706e3c2af13f23787b0bf1276831b711fa5eaef17390b2
    lastState: {}
    name: headlamp
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-11-12T17:29:07Z"
  hostIP: 10.105.148.72
  hostIPs:
  - ip: 10.105.148.72
  phase: Running
  podIP: 192.168.4.9
  podIPs:
  - ip: 192.168.4.9
  qosClass: Burstable
  startTime: "2024-11-12T17:29:00Z"
```
If you need to see any manifest data for our Prometheus/Thanos deployment, let me know.
The version of Thanos we are running is thanos:0.35.1-debian-12-r2
The version of Prometheus Operator is prometheus-operator:v0.75.0
My week is very open, as most of my team is at KubeCon, so if this doesn't help I can hop on a call and do a screen share to look at whatever you'd like.
Hello, I am a new user and I really like the project.
I am having some problems with the cluster-wide metrics that are shown on the dashboard.
I am using kube-prometheus-stack to handle Prometheus and Grafana, and I am using prometheus-adapter for the metrics API.
To get Headlamp to show anything at all, I had to add a few settings to these charts' values:
- kube-prometheus-stack
- prometheus-adapter (which is the usual way to expose the metrics APIs)
Individual nodes' CPU values are correct, and the memory value is correct as well, but the unit is different:
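To illustrate what a unit-only difference typically looks like, here is a tiny sketch with a made-up node figure (not from my cluster): the same byte count rendered with a decimal divisor (GB) versus a binary divisor (GiB) differs by roughly 7% at this scale.

```typescript
// Illustration only, with a made-up node figure: the same measurement rendered
// with a decimal divisor (GB) vs. a binary divisor (GiB) differs by ~7% at
// gigabyte scale, which is what "right value, different unit" usually is.
const nodeUsageBytes = 16408072 * 1024;                     // "16408072Ki" from the metrics API
console.log((nodeUsageBytes / 1000 ** 3).toFixed(2), 'GB');  // "16.80 GB"
console.log((nodeUsageBytes / 1024 ** 3).toFixed(2), 'GiB'); // "15.65 GiB"
```

If the gap I'm seeing is about that size, it is probably just the label/divisor choice rather than a wrong measurement.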
Is this a Headlamp problem, or is it a Prometheus (i.e., my setup) problem?
Thanks for the help and for the project. Have a nice day.