sflow / host-sflow

host-sflow agent
http://sflow.net

double disk stats computed #64

Open nicolasb827 opened 8 months ago

nicolasb827 commented 8 months ago

Hello. Looking at src/Linux/readDiskCounters.c and digging into the /proc filesystem, I think there is an error in the computation. On kernel 3.10.0-1127.10.1.el7.x86_64, cat /proc/diskstats gives:

[root@host ~]# cat /proc/diskstats
 253       0 vda 58221223 571170 7363640901 993650042 550607954 7206892 10253583588 3165244408 0 877360678 1357257008
 253       1 vda1 58221189 571170 7363638525 993649854 503206454 7206892 10253583588 3159062750 0 1057691671 2927779264
  11       0 sr0 23 0 164 1 0 0 0 0 0 1 1
 253      16 vdb 15425961 3561433 151905576 85938760 10812323 15883365 213623952 242215861 0 8357465 232539678
 253      32 vdc 43128310 226919 2239536699 349343297 39897393 4096268 1491074702 1888861345 0 515622209 2105718800
 252       0 dm-0 43382933 0 2239535459 352386729 42223176 0 1491074702 2163504538 0 519122815 2518555451

Line 1 is /dev/vda and line 2 is /dev/vda1, which is a partition of /dev/vda:

[root@host ~]# fdisk -l /dev/vda

Disk /dev/vda: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0000aebb

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *        2048    20971486    10484719+  83  Linux

Looking at the code, I am fairly sure the read/write totals are computed as vda + vda1. But I think the reads on vda1 are already included in vda's read counter, so the result looks double-counted. I tried to fix this myself, but it turned out to be more complicated than I expected.
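To make the overlap concrete, here is a minimal sketch (not the host-sflow code) that parses one /proc/diskstats line into the fields the agent sums. The struct and function names are hypothetical; the field order follows the kernel's documented diskstats layout.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for the subset of /proc/diskstats fields that get summed. */
struct disk_line {
    unsigned int major_no;
    unsigned int minor_no;
    char name[32];
    unsigned long long reads;            /* reads completed */
    unsigned long long sectors_read;
    unsigned long long read_time_ms;
    unsigned long long writes;           /* writes completed */
    unsigned long long sectors_written;
    unsigned long long write_time_ms;
};

/* Parse one diskstats line. Returns 1 on success, 0 if the line does not match. */
int parse_diskstats_line(const char *line, struct disk_line *out)
{
    unsigned long long reads_merged, writes_merged; /* parsed but not kept */
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "%u %u %31s %llu %llu %llu %llu %llu %llu %llu %llu",
                   &out->major_no, &out->minor_no, out->name,
                   &out->reads, &reads_merged,
                   &out->sectors_read, &out->read_time_ms,
                   &out->writes, &writes_merged,
                   &out->sectors_written, &out->write_time_ms);
    return n == 11;
}
```

With the output above, vda reports 58221223 reads and vda1 reports 58221189. Adding both lines gives 116442412, roughly twice what the disk actually served.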

The code that computes the total is:

// report the sum over all disks - except software RAID devices and logical volumes
// because that would cause double-counting.   We identify those by their
// major numbers:
// Software RAID = 9
// Logical Vol = 253
if (majorNo != 9 && majorNo != 253) {
    dsk->reads += reads;
    total_sectors_read += sectors_read;
    dsk->read_time += read_time_ms;
    dsk->writes += writes;
    total_sectors_written += sectors_written;
    dsk->write_time += write_time_ms;
}

(lines 94 -> 106)

I think there is also a second problem: in this particular case the VirtIO devices (vda has major number 253 here) seem to be left out of the computation?
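On this host dm-0 has major 252 and the vd* disks have major 253, so a fixed major number does not identify logical volumes reliably. One possible alternative, sketched here under assumptions, is to look up the major that the "device-mapper" driver registered in /proc/devices. The function name find_block_major is hypothetical, and the caller is assumed to pass an open stream in the /proc/devices layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the block major registered under driver_name, or -1 if not found.
 * `in` must hold text laid out like /proc/devices: a "Character devices:"
 * section and a "Block devices:" section, each with "<major> <name>" lines. */
int find_block_major(FILE *in, const char *driver_name)
{
    char line[128];
    int in_block_section = 0;

    assert(in != NULL && driver_name != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        int major_no;
        char name[64];

        if (strncmp(line, "Block devices:", 14) == 0) {
            in_block_section = 1;
            continue;
        }
        if (strncmp(line, "Character devices:", 18) == 0) {
            in_block_section = 0;
            continue;
        }
        /* Only entries in the block section are of interest. */
        if (in_block_section &&
            sscanf(line, "%d %63s", &major_no, name) == 2 &&
            strcmp(name, driver_name) == 0)
            return major_no;
    }
    return -1;
}
```

A caller could open /proc/devices once at startup, call find_block_major(f, "device-mapper"), and compare the result against majorNo instead of the literal 253. Software RAID would need the same lookup for its own driver name.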

sflow commented 7 months ago

Thanks for looking at this. When the code considers the vda and vda1 counters, is there a clear difference in the major and minor numbers, or do we need to pull in something else to tell them apart?
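In the diskstats output above, vda and vda1 share major 253 and differ only in the minor number, so the numbers alone may not be enough. One option, sketched here as an assumption rather than a tested fix, is to consult sysfs: a partition has a `partition` attribute under /sys/class/block/<name>/, and a device-mapper volume has a `dm` directory there. All function names below are hypothetical, and the sysfs root is a parameter so the check can be pointed at a test tree.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Build "<sysfs_root>/class/block/<dev>/<attr>" into buf.
 * Returns 1 on success, 0 if the path would not fit. */
int block_attr_path(char *buf, size_t len, const char *sysfs_root,
                    const char *dev, const char *attr)
{
    assert(buf != NULL && sysfs_root != NULL && dev != NULL && attr != NULL);
    int n = snprintf(buf, len, "%s/class/block/%s/%s", sysfs_root, dev, attr);
    return n > 0 && (size_t)n < len;
}

/* Return 1 if the attribute (file or directory) exists for this device. */
int has_block_attr(const char *sysfs_root, const char *dev, const char *attr)
{
    char path[256];
    if (!block_attr_path(path, sizeof path, sysfs_root, dev, attr))
        return 0;
    return access(path, F_OK) == 0;
}

/* A partition exposes a "partition" attribute holding its index. */
int is_partition(const char *sysfs_root, const char *dev)
{
    return has_block_attr(sysfs_root, dev, "partition");
}

/* A device-mapper volume exposes a "dm" directory. */
int is_device_mapper(const char *sysfs_root, const char *dev)
{
    return has_block_attr(sysfs_root, dev, "dm");
}

/* Sum only whole disks: skip partitions and device-mapper volumes.
 * Note: a device missing from sysfs is treated as a whole disk here. */
int should_sum(const char *sysfs_root, const char *dev)
{
    return !is_partition(sysfs_root, dev) && !is_device_mapper(sysfs_root, dev);
}
```

In the summing loop this would become something like `if (should_sum("/sys", devName))`, where devName stands for the name column of /proc/diskstats. Software RAID members would still need separate handling.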

This code has not been looked at for some time so there may be other new constructs and filesystems to take into account. Or it may be better to start with the cgroup stats under /sys if io-accounting is enabled. Probably worth looking around to see where the most reliable totals can be found.