Open JensErat opened 2 years ago
Can the total disk size change while Prometheus is running?
Yes, a partition can be resized while it is mounted (for example a remotely mounted partition or an LVM-based partition).
This sounds like a good option to have if we can find the disk size in a reliable way. Additionally, instead of determining it only on startup, we could also update the limit at intervals during runtime, so that an increase of the underlying disk size is picked up without a restart.
We can check the disk size every time the BeyondSizeRetention function is called when a percentage is given. This way, we can ensure that we always have an up-to-date value for the disk size without having to store it.
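A minimal sketch of that idea, not the actual Prometheus code: `beyondPercentRetention` and `totalSizeFunc` are hypothetical names, and the disk size lookup is injected so the example runs without touching a real filesystem. The point is only that the limit is derived from a fresh reading on every call, so nothing is stored.

```go
package main

import "fmt"

// totalSizeFunc reports the total size in bytes of the filesystem holding dir.
// Hypothetical hook: a real implementation might wrap a statfs call.
type totalSizeFunc func(dir string) (uint64, error)

// beyondPercentRetention reports whether usedBytes exceeds percent of the
// disk size. The size is re-read on every call and never cached.
func beyondPercentRetention(dir string, usedBytes uint64, percent float64, total totalSizeFunc) (bool, error) {
	size, err := total(dir)
	if err != nil {
		return false, fmt.Errorf("reading disk size of %s: %w", dir, err)
	}
	limit := uint64(float64(size) * percent / 100)
	return usedBytes > limit, nil
}

func main() {
	diskSize := uint64(10 << 30) // simulated 10 GiB volume
	total := func(string) (uint64, error) { return diskSize, nil }
	used := uint64(9728 << 20) // 9.5 GiB of blocks on disk

	over, _ := beyondPercentRetention("/data", used, 90, total)
	fmt.Println(over) // true: 9.5 GiB is above 90% of 10 GiB

	diskSize = 20 << 30 // volume grown to 20 GiB, no restart
	over, _ = beyondPercentRetention("/data", used, 90, total)
	fmt.Println(over) // false: the limit is now 18 GiB
}
```

The trade-off is one extra filesystem query per retention check, which is probably acceptable if the check only runs occasionally.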
One thing to consider is that some filesystems, network filesystems in particular, return inconsistent values for used and available storage space. I specifically remember NFS filers with snapshots enabled magically reducing the available storage space, and transparent compression making it possible to store more data than the volume theoretically holds.
I'd propose to just ignore and potentially document this issue, or maybe add some safeguards when reading free disk space.
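One possible shape for such a safeguard, purely as an illustration: `diskReading`, `plausible` and `effectiveTotal` are made-up names, and the plausibility rules are assumptions that would need checking against real filers. The sketch keeps the last trusted disk size whenever a new reading looks inconsistent.

```go
package main

import "fmt"

// diskReading is a hypothetical snapshot of what the filesystem reports.
type diskReading struct {
	Total uint64 // total bytes
	Avail uint64 // bytes available to the process
}

// plausible applies two assumed sanity rules: the volume must have a
// non-zero size, and it cannot have more space available than it holds.
func plausible(r diskReading) bool {
	return r.Total > 0 && r.Avail <= r.Total
}

// effectiveTotal returns the size to base the retention limit on: the fresh
// reading if it looks consistent, otherwise the last value that was trusted.
func effectiveTotal(lastGood uint64, r diskReading) uint64 {
	if plausible(r) {
		return r.Total
	}
	return lastGood
}

func main() {
	lastGood := uint64(10 << 30)

	// Consistent reading after a resize: accepted.
	fmt.Println(effectiveTotal(lastGood, diskReading{Total: 20 << 30, Avail: 5 << 30}) == 20<<30)

	// Reading that claims more free space than the volume holds: ignored.
	fmt.Println(effectiveTotal(lastGood, diskReading{Total: 10 << 30, Avail: 12 << 30}) == lastGood)
}
```

This does not catch every odd case (snapshots silently shrinking the available space, for example), so documenting the limitation would still be needed.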
I would make this parameter optional (obviously) and document it with a warning about remote or exotic filesystems: prefer using it on local disks, or use it at your own risk.
I'm open to review / comment on my PR 😃
Proposal
I'd love to see percentage values supported in size-based retention. Prometheus might determine the disk size on startup and set the retention limit to n percent of it. For example:
--storage.tsdb.retention.size=90%
would configure a limit of 9GiB for a 10GiB disk.
This seems rather easily possible by using the syscall directly, or by importing the Minio libraries (I didn't check what license that code is under, though): https://stackoverflow.com/q/20108520/695343
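To make the arithmetic concrete, here is a rough sketch under a few assumptions: `parseRetentionPercent`, `limitBytes` and `totalDiskBytes` are hypothetical helpers, not existing Prometheus functions, and the `syscall.Statfs` call only builds on Unix-like systems. It parses a value such as `90%`, reads the filesystem size via the syscall, and derives the byte limit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"syscall"
)

// parseRetentionPercent parses values such as "90%" into 90.
// Anything without a trailing '%' or outside (0, 100] is rejected.
func parseRetentionPercent(s string) (float64, error) {
	if !strings.HasSuffix(s, "%") {
		return 0, fmt.Errorf("%q is not a percentage", s)
	}
	p, err := strconv.ParseFloat(strings.TrimSuffix(s, "%"), 64)
	if err != nil {
		return 0, fmt.Errorf("invalid percentage %q: %w", s, err)
	}
	if !(p > 0 && p <= 100) { // written this way so NaN is rejected too
		return 0, fmt.Errorf("percentage %q out of range (0, 100]", s)
	}
	return p, nil
}

// limitBytes converts a percentage of the total disk size into bytes.
func limitBytes(total uint64, percent float64) uint64 {
	return uint64(float64(total) * percent / 100)
}

// totalDiskBytes returns the size of the filesystem containing path
// (Unix-like systems only).
func totalDiskBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return st.Blocks * uint64(st.Bsize), nil
}

func main() {
	p, err := parseRetentionPercent("90%")
	if err != nil {
		panic(err)
	}
	fmt.Println(limitBytes(10<<30, p)) // 9663676416, i.e. 9 GiB of a 10 GiB disk

	if total, err := totalDiskBytes("."); err == nil {
		fmt.Println("limit for current directory:", limitBytes(total, p))
	}
}
```

Whether the limit should be based on the total size or on the space actually usable by the Prometheus process is left open here; the sketch uses the total size, as in the example above.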
Use case. Why is this important?
We're operating hundreds of Prometheus instances for differently sized Kubernetes clusters using prometheus-operator. We actively monitor for Prometheus instances whose disk is not sufficiently large and increase the volume size when needed. Currently, we need to not only resize the volume but also reconfigure our prometheus-operator CR, which allows the two configurations to diverge. A percentage-based size definition would allow us to simply configure retention to use up "most of the disk".
Jens Erat jens.erat@daimler.com, Daimler TSS GmbH