Open vladzcloudius opened 4 months ago
cc @tarzanek @vreniers @mkeeneyj
@vladzcloudius if I get it right, this is a node_exporter issue, right?
Could be.
@vladzcloudius could it be: https://github.com/prometheus/node_exporter/issues/2310
Installation details
Panel Name: Disk Writes/Reads
Dashboard Name: OS Metrics
Scylla-Monitoring Version: 4.7.1
Scylla-Version: 2024.1.3-0.20240401.64115ae91a55
Kernel version on all nodes: 5.15.0-1058-gcp
Description
The throughput (bytes or OPS) of the RAID0 volume (md0 in the screenshots below) is supposed to equal the sum of the corresponding values on the physical disks comprising it. However, it is far from that. In some cases, as in the screenshots below, the md0 value is even lower than that of a single disk. In the example below, md0 is a RAID0 volume assembled from 4 NVMe disks: nvme0n1,2,3,4.
Here is a screenshot showing md0 and only nvme0n1 from all nodes (the picture is the same for all other disks):

Here you can see the values from all disks on a single node, clearly showing the problem:
I ran iostat on one of the nodes to see whether this might be a kernel issue, but no, the values iostat shows add up:

We saw similar behavior on multiple clusters.
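
For anyone who wants to repeat the kernel-side check without iostat, here is a minimal Python sketch that compares md0 against the sum of its members using two snapshots of /proc/diskstats. It is only an illustration, not the method used above. The helper names (`parse_diskstats`, `snapshot`, `raid0_delta`) are made up for this example. The member names nvme0n1 through nvme0n4 are an assumption based on the "nvme0n1,2,3,4" shorthand in the description. Only byte counters are compared.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (bytes_read, bytes_written)} from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        name = fields[2]
        # fields[5] = sectors read, fields[9] = sectors written
        stats[name] = (int(fields[5]) * SECTOR_BYTES,
                       int(fields[9]) * SECTOR_BYTES)
    return stats


def snapshot(path="/proc/diskstats"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_diskstats(f.read())


def raid0_delta(before, after, array, members):
    """Bytes (read, written) moved between two snapshots:
    first for the array itself, then summed over its member disks."""
    def moved(dev):
        return tuple(a - b for a, b in zip(after[dev], before[dev]))

    array_rw = moved(array)
    member_rw = tuple(sum(moved(m)[i] for m in members) for i in range(2))
    return array_rw, member_rw
```

To use it on a node, call `snapshot()` twice a few seconds apart and pass both results to `raid0_delta(before, after, "md0", ["nvme0n1", "nvme0n2", "nvme0n3", "nvme0n4"])`, adjusting the device names to the actual layout. If the two returned tuples agree here while the dashboard still disagrees, that points away from the kernel counters and toward the exporter or the dashboard queries.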