scottchiefbaker / dool

Python3 compatible fork of dstat
GNU General Public License v3.0
315 stars 58 forks source link

feature: add `--disk-inflight` for blockdevs #53

Closed KJ7LNW closed 9 months ago

KJ7LNW commented 9 months ago
SUMMARY

Add in-flight IOs for block devices:

The inflight column was invaluable in solving this hung disk issue. Would be nice to see it in dool, perhaps as --disk-inflight. You can see them live as follows:

watch -n .1 "tail /sys/block/{nvme,sd}*/stat | awk '{print \$9}'"

Note that the inflight data is in /sys/block/<dev>/stat but is not available in /proc/diskstats.

ISSUE TYPE
DOOL VERSION

1.3.x

OS / ENVIRONMENT

Oracle Linux 9

KJ7LNW commented 9 months ago

Playing with it some more, its also a good indicator of how /sys/block/sda/queue/nr_requests should be tuned.

scottchiefbaker commented 9 months ago

We have --disk-avgqu, --disk-avgrq, and --disk-wait. Do these have any of the information you're looking for?

scottchiefbaker commented 9 months ago

dool --disk --io --disk-avgqu --disk-avgrq --disk-svctm --disk-tps --disk-util --disk-wait 15

There is a lot of data there, and I know what about 2% of it is :)

KJ7LNW commented 9 months ago

Both dool_disk_avgrq.py and dool_disk_avgqu.py use /proc/diskstats but "in-flight" is stored in /sys/block/<dev>/stat in column 9.

A new plugin would read the inflight info per-disk (from /sys/), as opposed to the way that /proc/diskstats lists all disks with their status in one file.

scottchiefbaker commented 9 months ago

Both of those plugins you linked do read from /sys/block/$DEV/stat, they just use /proc/diskstats as a starting point to know which disks to look for. Look on lines 38 and 37 of each plugin respectively.

scottchiefbaker commented 9 months ago

This is my first attempt at writing a dool plugin, but I think I figured out most of the pieces. Place the attached plugin in your ~/.dool/ directory and try running:

dool --disk-inflight

It's almost impossible for me to test this because none of my disks ever had any in-flight data to read from. Hence the call to random.randrange() to generate test data. See if that plugin picks up the data you want and let us know.

File: dool_disk_inflight.py.gz

KJ7LNW commented 9 months ago

We have a gazillion disks, this is all I get:

# dool --disk-inflight
Terminal width too small, trimming output.

so I specified -D but I don't think disk-inflight is using -D:

# dool --disk-inflight -D bcache0,nvme0n1,nvme1n1,sdb
Terminal width too small, trimming output.

It might work if disk-inflight uses -D like avgq does:

]# dool --disk-avgqu -D bcache0,nvme0n1,nvme1n1,sdb
bcac┄nvme┄nvme┄sdb┄
avgq avgq avgq avgq
0.18┊0.68┊1.52┊2.11
 413┊0.43┊1.00┊ 138  
scottchiefbaker commented 9 months ago

I just added support for -D and checked it in to the next branch. Try putting this file in your plug-in dir:

https://github.com/scottchiefbaker/dool/blob/next/plugins/dool_disk_inflight.py

This now filters appropriately: dool --disk-inflight sda,sdb

KJ7LNW commented 9 months ago

Looks like its working!

Questions:

image

More compact is always welcome; we have quite a large dool going at all times to watch server load. This is what we run:

nice --20 dool --bytes -N enp193s0f0np0,enp193s0f1np1 -p --top-bio -D bcache0,nvme0n1,nvme1n1,sdb --disk-util --disk-inflight -S total -drcynmgs 1

image

KJ7LNW commented 9 months ago

Thanks, by the way. I really appreciate you writing up a plugin so quickly.

scottchiefbaker commented 9 months ago

Personally I like the headers the way they are, but I think it makes sense to flip them for consistency. I can make that change pretty easily.

It's super easy to make the columns narrower, but it's going to make the headings really hard to read. Change the self.width = 7 line to whatever number you want and it will shrink the column accordingly. 4 seems to be common for the other disk plugins so we could go with that. It just makes the headers impossible to distinguish.

What do you think?

scottchiefbaker commented 9 months ago

After some more thinking I was able to come up with a name shortener that I think makes sense. Dool supports a lot of disk types so here is a sample mapping to device names that are four characters:

Raw Shorter
sda1 sda1
hda14 hd14
vda99 vd99
hdb hdb
nvme0n1 nv01
nvme1n1 nv11
md123 m123
md124 m124
md125 m125
mmcblk7p50 m750
VxVM4 VxV4
dm-5 dm-5

I updated the plugin with these changes on the next branch. If you can test with these new changes and smaller four character column widths that would be much appreciated.

image

KJ7LNW commented 9 months ago

The change seems to have broken "wrap prevention" where it puts a ">" on the right side indicating the window is too small to prevent wrapping. It still puts the ">" but it wraps anyway.

dool --bytes -N enp193s0f0np0,enp193s0f1np1 -p --top-bio -D bcache0,nvme0n1,nvme1n1,sda --disk-util --disk-inflight -S total -drcynmgs 1

image

scottchiefbaker commented 9 months ago

I'm still learning the ins and outs of the plugin system. Try that change I just landed and see if it solves your issue.

KJ7LNW commented 9 months ago

That looks right, thanks!

I like the new bdev naming convention, too.

KJ7LNW commented 9 months ago

About the new naming conventions, not all plugins use the same names. Notably, --disk-util shows bcac┄nvme┄nvme┄sdb whereas --disk-inflight shows bca0┄nv01┄nv11┄sdb┄.

Some of the other plugins use longer names because they have room (like --disk), which is fine since they have the room. Maybe it should change, maybe not, but I thought I'd point it out. At least of the ones we use, these are the ones that appear to be different. I've not checked the other --disk-* plugins:

I'm not necessarily suggesting that you need to change anything, but I thought I'd let you know since you wanted feedback on the new naming convention change. If you were to make a change, it would be nice to see --disk-util use the new naming style because these "nvme" names are not distinguishable: bcac┄nvme┄nvme┄sdb . Perhaps audit the other --disk-* plugins, too.

The new --disk-inflight names are nice and readable: bca0┄nv01┄nv11┄sdb

Here's a sample from the -next build:

# dool --bytes -N enp193s0f0np0,enp193s0f1np1 -p --top-bio -D bcache0,nvme0n1,nvme1n1,sdb --disk-util --disk-inflight -S total -drcynmgs  1

dool