brancomrt opened 1 month ago
I am using a storage class that stores data on NFS.
```yaml
storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: "nfs-client"
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 200Gi
```
```
NAME         PROVISIONER                                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-client   cluster.local/nfs-subdir-external-provisioner   Delete          Immediate           true                   131d
```
@brancomrt I am also facing the same issue with retention. I set my retention to 15m, but while the metrics are cleared, the WAL size keeps increasing, consuming my disk to the point that I am missing metrics because there is no space left on the device.
Were you able to resolve this?
TIA
Below are the args passed to Prometheus v2.54.1 in the StatefulSet:
```
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--web.enable-lifecycle
--web.external-url=https://redacted.com/prometheus-metrics
--web.route-prefix=/prometheus-metrics
--log.level=debug
--storage.tsdb.retention.time=15m
--storage.tsdb.path=/prometheus
--storage.tsdb.wal-compression
--web.config.file=/etc/prometheus/web_config/web-config.yaml
```
It was mentioned in a comment here that it's resolved in v2.21, but I am using v2.54 and the issue still persists.
I can't find an exact reference for this, but because the default block is compacted every 2 hours, you cannot set retention below that value without changing several other parameters as well.
Regardless, this ticket is relevant for upstream prom/operator, not the chart repo.
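For context, a minimal sketch of the flags involved (the values below are illustrative assumptions for a short-retention test setup, not official recommendations; retention is only enforced when blocks are compacted, so a retention time shorter than the minimum block duration has no visible effect):

```
# Prometheus cuts and compacts the head block every 2h by default
# (--storage.tsdb.min-block-duration, a hidden flag).
# A sub-2h retention therefore needs the block durations lowered too:
--storage.tsdb.min-block-duration=30m
--storage.tsdb.max-block-duration=1h
--storage.tsdb.retention.time=1h
```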
Thank you @DrFaust92
This should be closed because it is not a bug but rather a limitation of the default Prometheus configuration.
With the following args configuration, I am seeing max-block-duration set to 6m and min-block-duration set to 2h (see the attached screenshot). The durations look backwards, retention is not happening, and the WAL keeps growing.
But when I pass storage.tsdb.min-block-duration set to 1h and storage.tsdb.max-block-duration set to 2h as additional args, I see the WAL is compacted every 1h, or when it reaches 256MB (in my case the size limit triggers first).
I am not sure if the chart is defaulting these values or if it is an upstream Prometheus issue.
```
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--web.enable-lifecycle
--web.external-url=https://redacted.com/prometheus-metrics
--web.route-prefix=/prometheus-metrics
--log.level=info
--storage.tsdb.retention.time=1h
--storage.tsdb.retention.size=256MB
--storage.tsdb.path=/prometheus
--storage.tsdb.wal-compression
--web.config.file=/etc/prometheus/web_config/web-config.yaml
```
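As a sketch, the extra block-duration flags described above could be passed through the chart's values rather than by editing the StatefulSet directly (this assumes the operator version in use supports `prometheusSpec.additionalArgs`; the durations are the illustrative values from the workaround, not defaults):

```yaml
prometheus:
  prometheusSpec:
    retention: 1h
    retentionSize: 256MB
    additionalArgs:
      # illustrative values from the workaround above
      - name: storage.tsdb.min-block-duration
        value: 1h
      - name: storage.tsdb.max-block-duration
        value: 2h
```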
@chanakya-svt a minimum block duration that is longer than the maximum block duration doesn't make sense.
@rouke-broersma I tried to look into the chart to see if it is passing any args that could cause this, but I couldn't pinpoint anything. Can you confirm whether this is an upstream Prometheus issue? If so, I can create an issue in the Prometheus repo. Thank you.
We have the same issue with 2.51.
Describe the bug
I am experiencing issues with the configuration of retention policies in the kube-prometheus-stack when installed via Helm chart version 61.7.1.
I set the parameter prometheus.prometheusSpec.retention to a value of 10m or 1h for testing data rotation purposes, but the storage PVC keeps growing and does not clean up the data.
What's your helm version?
version.BuildInfo{Version:"v3.14.4", GitCommit:"81c902a123462fd4052bc5e9aa9c513c4c8fc142", GitTreeState:"clean", GoVersion:"go1.21.9"}
What's your kubectl version?
Client Version: v1.27.10
Kustomize Version: v5.0.1
Server Version: v1.28.12+rke2r1
Which chart?
kube-prometheus-stack
What's the chart version?
61.7.1
What happened?
I am experiencing issues with the configuration of retention policies in the kube-prometheus-stack when installed via Helm chart version 61.7.1.
I set the parameter prometheus.prometheusSpec.retention to a value of 10m or 1h for testing data rotation purposes, but the storage PVC keeps growing and does not clean up the data.
What you expected to happen?
Automatic cleanup of Prometheus storage data on the PVC
How to reproduce it?
Wait for the retention period defined in values.yaml, then check the storage size of the PVC prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 to see if it decreases.
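The on-disk usage can also be watched from Prometheus's own TSDB metrics instead of the PVC (a sketch using standard TSDB instrumentation metric names; note that the WAL is not subject to retention, which can explain disk growth even when blocks are deleted):

```
# total size of persisted blocks on disk
prometheus_tsdb_storage_blocks_bytes

# size of the write-ahead log, which retention does not trim
prometheus_tsdb_wal_storage_size_bytes
```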
Enter the changed values of values.yaml?
prometheus.prometheusSpec.retention
Enter the command that you executed that is failing/misfunctioning.
helm upgrade kube-prometheus-stack -n monitoring ./
Using the local values.yaml of the chart.
Anything else we need to know?
No response