Problem
There appears to be no label or info-metric to associate cadvisor's container_fs_* metrics with a PersistentVolume attachment or PersistentVolumeClaim, or with the mount-point of the filesystem within the container. There is only a device label, for the device-node path within the OS.
This makes it seemingly impossible to determine which meaningful volume a container's I/O is associated with. For example, if a database container has two PVs mounted, one for the main DB and one for WAL, plus an ephemeral volume for temp files and sorts, there seems to be no way to tell which container_fs_writes_bytes_total series corresponds to which volume.
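Concretely, all that is visible is a set of opaque device series (label values here invented for illustration); nothing says which is the main DB, which is WAL, and which is the ephemeral volume:
container_fs_writes_bytes_total{container="db", device="/dev/dm-0", pod="pg-0", ...}
container_fs_writes_bytes_total{container="db", device="/dev/dm-1", pod="pg-0", ...}
container_fs_writes_bytes_total{container="db", device="/dev/dm-2", pod="pg-0", ...}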
Proposed feature
It would be enormously helpful if cadvisor added a label, or exposed an info metric, associating the (device,container) label-pairs from the container_fs_* metrics with the k8s volume attachment name or persistent volume claim.
It'd also be great to have an info-metric mapping each (container,device) pair to the volume mount path(s) within the container. This can't be done as an extra label on the container_fs_* metrics themselves because one device node can be mounted multiple times within one container (bind mounts, subvolume mounts, btrfs submounts, etc.). Such a metric would make it possible to see, in monitoring, the container path a volume is mounted on. It would also make it possible to associate the persistent volume, if kube-state-metrics exposed the volumeMount paths for a Pod.
Exposing the filesystem UUID would also be helpful.
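For illustration only, the proposed association might look something like the following info-metrics (metric names and the extra labels here are invented, not an existing cadvisor API):
container_fs_volume_info{container="db", device="/dev/dm-0", volumeattachment="csi-...", persistentvolumeclaim="pg-data", fs_uuid="..."} 1
container_fs_mount_info{container="db", device="/dev/dm-0", mountpoint="/postgres/data"} 1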
Alternatives considered
kube-state-metrics cannot provide this because it has no insight into the device node a container's volumeMount path is associated with. There's nothing usable in PersistentVolume's or PersistentVolumeClaim's .spec or .status. cadvisor doesn't appear to expose the CSI info or PVC UID that could be used to associate these. There is a VolumeAttachment CR with .status.attachmentMetadata (a devicePath is reported there by some CSIs), which k-s-m exposes as kube_volumeattachment_status_attachment_metadata, but this only seems to be provided by the AWS EKS CSI, and the device paths differ from those seen within the container, e.g. an in-container /dev/dm-0 is exposed as /dev/xvdaa in the attachment metadata. Thus it is not usable for volume associations.
node-exporter recently gained filesystem_mount_info (https://github.com/prometheus/node_exporter/pull/2970), which maps device to mountpoint, but it isn't container-scoped and doesn't expose the volume attachment, so it's not usable for associating the device with a PV. Its older node_filesystem_avail_bytes{device,mountpoint} similarly exposes host-path mount points under /run/containerd/io.containerd.grpc.v1.cri/ and has no info that could be used to associate with a PV, PV attachment, or volumeMount. (Due to mount-scoping rules it cannot see some mounts anyway.)
kubelet metrics don't appear to expose the needed info, and there's nothing apparent in the main k8s metrics docs either. Querying kubectl get --raw "/api/v1/nodes/NODENAME/proxy/metrics" and kubectl get --raw "/api/v1/nodes/NODENAME/proxy/metrics/resource" didn't reveal anything promising.
Kubelet /stats/summary is (a) deprecated and (b) exposes the volume's name as listed in Pod.spec.volumes, and any pvcRef, but not the device node or mount path, so it cannot be used to associate metrics. It doesn't have I/O stats, so it's not an alternative data source either.
So I didn't find any way to associate the cadvisor metrics with the PV attachment, PVC, or PV, whether by FS UUID, PV UID, PVC UID, data exposed by the kube apiserver, or other existing metrics API servers.
Benefits
If a volumeattachment label was available, directly or via an info-metric, it could be joined on kube_volumeattachment_spec_source_persistentvolume from kube-state-metrics to find kube_persistentvolumeclaim_info, kube_persistentvolume_info, kube_persistentvolumeclaim_labels, etc.
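A rough sketch of what that join could look like in PromQL, assuming a hypothetical volumeattachment label on the cadvisor metrics (no such label exists today):
rate(container_fs_writes_bytes_total[5m])
  * on (volumeattachment) group_left (volumename)
    kube_volumeattachment_spec_source_persistentvolume
  * on (volumename) group_left (persistentvolumeclaim)
    kube_persistentvolumeclaim_info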
If a mapping of volumeMount paths to volumes and devices was available, I/O could be associated with a specific container path in reporting and dashboards, e.g. "100 MiB/s on /postgres/data, 200 MiB/s on /postgres/pg_wal, 500 MiB/s on /postgres/ephemeral_store_tablespace".
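For example, given the hypothetical container_fs_mount_info{container,device,mountpoint} info-metric sketched above (name and labels invented), a per-mountpoint write-rate panel might be:
sum by (namespace, pod, mountpoint) (
    container_fs_mount_info
  * on (namespace, pod, container, device) group_left ()
    rate(container_fs_writes_bytes_total[5m])
)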
Details
cadvisor exposes some useful container-level filesystem I/O metrics:
container_fs_reads_bytes_total
container_fs_reads_total
container_fs_writes_bytes_total
container_fs_writes_total
which are exposed with labels including device (the device-node path the filesystem is mounted from) and name (the container ID without the containerd:// prefix), e.g.
container_fs_reads_bytes_total{container="...", device="/dev/dm-0", job="kubelet", metrics_path="/metrics/cadvisor", name="...", pod="...", ...}
There is nothing here, or in any of the other cadvisor metrics I found, that would allow this to be associated with a persistent volume claim. kube-state-metrics cannot expose this information because it does not have access to the device-node paths from which volumes are mounted within containers. See https://github.com/kubernetes/kube-state-metrics/issues/1701
Looking at the cadvisor source:
There's FsInfo in MachineInfo, which knows the device-node path but not any volume attachment or persistent volume info.
There's PerDiskStats in DiskIoStats in ContainerStats, but nothing there associates with a volume attachment or a mount path. There's FsStats in the same file, which again is keyed only by Device and filesystem type; it doesn't have path or attachment info.
In metrics/prometheus.go, the metrics with a "device" label don't appear to have anything else that could associate them with an attachment or claim; I didn't find likely keywords like "vol", "attach", "mount" or "path" anywhere.
There's a GetFsInfoByFsUUID function, and fs/fs.go uses https://pkg.go.dev/github.com/moby/sys/mountinfo#Info, so it has access to the mount ID and mountpoint, but the FS UUID isn't exposed as a label, and even if it were, there doesn't seem to be anything elsewhere to join the FS UUID on for a volume attachment etc.
There's container_blkio_device_usage_total with major, minor and operation labels, but that doesn't provide any association; the rest only have device as a label.
It looks like cadvisor could expose an info-metric mapping device -> mount points using Mountpoint from https://pkg.go.dev/github.com/moby/sys/mountinfo#Info, and expose the filesystem UUID too. This doesn't provide a way to associate with a PV or PVC directly, but it might be usable indirectly via pod metadata from kube-state-metrics etc., since volume and volumeMount on a Pod are exposed in the API.
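Roughly, such an info-metric might look like the following (name and labels invented for illustration; mountpoint would come from mountinfo's Mountpoint field, plus the filesystem UUID):
container_fs_mount_info{container="db", device="/dev/dm-0", mountpoint="/postgres/data", fs_uuid="..."} 1
container_fs_mount_info{container="db", device="/dev/dm-0", mountpoint="/postgres/data-bind", fs_uuid="..."} 1
Note that the same device can legitimately appear with multiple mountpoints, which is why this belongs in a separate info-metric rather than as a label on the container_fs_* metrics themselves.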
Ideally kubelet could expose this mapping instead, perhaps via https://kubernetes.io/docs/reference/instrumentation/cri-pod-container-metrics/, though there's no sign that it currently does.
Related: https://github.com/google/cadvisor/issues/1702