scottchiefbaker / dool

Python3 compatible fork of dstat
GNU General Public License v3.0

--top-io does not behave as expected #75

Open: emandret opened this issue 2 weeks ago

emandret commented 2 weeks ago
SUMMARY

The --top-io flag should report the process responsible for the most I/O, but this does not seem to work when that process spawns child processes that perform the I/O.

However, dstat --top-io correctly reports the parent process. I compared the code of the two --top-io plugins and it is similar, so I wonder what causes the difference.
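For context, a top-io style plugin generally samples cumulative per-process counters twice and reports the PID with the largest delta. The sketch below illustrates that technique only; it is not dool's or dstat's actual code, and the name `pick_top_io` and the input shape (PID mapped to a cumulative (read, write) pair) are assumptions for illustration. Whichever /proc/<pid>/io fields feed these totals decide the winner: if those counters never move for the busy process, it can never be picked.

```python
from typing import Dict, Optional, Tuple

# Cumulative (read, write) byte totals per PID, taken from whichever pair of
# /proc/<pid>/io counters the plugin uses (assumed input shape).
Counters = Dict[int, Tuple[int, int]]


def pick_top_io(prev: Counters, curr: Counters) -> Optional[Tuple[int, int, int]]:
    """Return (pid, read_delta, write_delta) for the PID whose combined
    read + write delta between two samples is largest, or None if no PID
    moved its counters.

    Hypothetical helper: a sketch of the technique, not dool's code.
    """
    best: Optional[Tuple[int, int, int]] = None
    best_total = 0
    for pid, (rd, wr) in curr.items():
        # A PID missing from the previous sample is treated as starting at 0.
        prev_rd, prev_wr = prev.get(pid, (0, 0))
        d_rd, d_wr = rd - prev_rd, wr - prev_wr
        if d_rd + d_wr > best_total:
            best_total = d_rd + d_wr
            best = (pid, d_rd, d_wr)
    return best


if __name__ == "__main__":
    prev = {100: (0, 0), 200: (5000, 0)}
    curr = {100: (40960, 0), 200: (6000, 100)}
    print(pick_top_io(prev, curr))  # (100, 40960, 0)
    # If the chosen counters never move (e.g. read_bytes for cached reads),
    # no process stands out and the result is None.
    print(pick_top_io({100: (0, 0)}, {100: (0, 0)}))  # None
```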

ISSUE TYPE
DOOL VERSION

Dool 1.3.2

OS / ENVIRONMENT

Platform posix/linux Kernel 5.15.0-119-generic Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]

STEPS TO REPRODUCE

Save this Python snippet as /tmp/high_tps.py:

import os
import subprocess


def high_tps_reader(file_path, num_reads_per_second, block_size):
    # Spawn short-lived dd child processes in a tight loop; each one reads a
    # single block. Nothing throttles the loop, so num_reads_per_second is
    # only the inner batch size, not an actual rate.
    while True:
        for _ in range(num_reads_per_second):
            subprocess.run(
                ["dd", f"if={file_path}", "of=/dev/null", f"bs={block_size}", "count=1"],
                stderr=subprocess.DEVNULL,  # hide dd's transfer statistics
            )


if __name__ == "__main__":
    file_path = "/tmp/testfile.txt"
    num_reads_per_second = 10000
    block_size = "4K"  # Read 4KB at a time

    # Create a 1 MiB test file on the first run
    if not os.path.exists(file_path):
        with open(file_path, "wb") as f:
            f.write(b"a" * 1024 * 1024)

    # Start the high TPS reader
    high_tps_reader(file_path, num_reads_per_second, block_size)

Then run:

python3 /tmp/high_tps.py &
dstat --top-io
dool --top-io
EXPECTED RESULTS

dstat --top-io and dool --top-io should report the same process.

ACTUAL RESULTS

See screenshot in the comments.

emandret commented 2 weeks ago

[Screenshot: high_tps]

emandret commented 2 weeks ago

Oddly, simply replacing two field names in the --top-io plugin file seems to solve the issue. I'm not sure why, but this works on my machine:

sudo cp /usr/share/dool/dool_top_io.py /usr/share/dool/dool_top_io.py.orig
sudo sed -i -e 's/read_bytes/rchar/g' -e 's/write_bytes/wchar/g' /usr/share/dool/dool_top_io.py
dool --top-io

This seems to be related to the fields reported in /proc/<pid>/io, which you can inspect with:

cat /proc/$(pgrep python3)/io

On my machine, this returns:

rchar: 60131822108
wchar: 22285226750
syscr: 74499098
syscw: 21285429
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
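In this output, rchar and wchar are in the tens of gigabytes while read_bytes and write_bytes are 0, which would explain why a plugin reading read_bytes/write_bytes never picks this process. A minimal sketch for loading these fields into a dict so the two pairs can be compared programmatically; `parse_proc_io` and `read_proc_io` are hypothetical helper names, not functions from dool.

```python
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the "name: value" lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            fields[name.strip()] = int(value)  # int() tolerates surrounding spaces
    return fields


def read_proc_io(pid: int) -> Dict[str, int]:
    """Read /proc/<pid>/io (Linux only). Reading another user's process
    typically requires root."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


# The output reported above, parsed without touching /proc.
sample = """rchar: 60131822108
wchar: 22285226750
syscr: 74499098
syscw: 21285429
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
"""
fields = parse_proc_io(sample)
print(fields["rchar"], fields["read_bytes"])  # 60131822108 0
```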
scottchiefbaker commented 2 weeks ago

Good research! It looks like that change landed in commit 64c2cc4d589de156e6a4318ee6836348740d58c9.

The documentation for that /proc file is here. Honestly, after reading it, I'm not sure which field we should be using.
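For reference, the proc(5) man page describes rchar/wchar as the bytes passed to read(2)/write(2) and similar system calls, whether or not real disk I/O was needed (page-cache hits and terminal I/O count too), while read_bytes/write_bytes attempt to count only bytes actually fetched from or sent to the storage layer. The reproducer reads a 1 MiB file that was just written (so it is most likely in the page cache) and writes to /dev/null, so it should move rchar/wchar but not read_bytes/write_bytes. A small experiment sketch to check this locally, assuming Linux with task I/O accounting enabled; `io_counters` and `measure_cached_read` are hypothetical names.

```python
import os
import tempfile
from typing import Dict

SIZE = 1024 * 1024  # 1 MiB, like the reproducer's test file


def io_counters() -> Dict[str, int]:
    """Return this process's /proc/self/io fields as ints (Linux only)."""
    with open("/proc/self/io") as f:
        return {k.strip(): int(v) for k, v in
                (line.split(":", 1) for line in f if ":" in line)}


def measure_cached_read() -> Dict[str, int]:
    """Write a file, read it back (normally served from the page cache),
    and return how much rchar and read_bytes moved during the read."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"a" * SIZE)
        path = tmp.name
    try:
        before = io_counters()
        with open(path, "rb") as f:
            while f.read(4096):  # 4 KiB reads, like bs=4K in the reproducer
                pass
        after = io_counters()
    finally:
        os.unlink(path)
    return {k: after[k] - before[k] for k in ("rchar", "read_bytes")}


if __name__ == "__main__":
    # Typically rchar grows by at least SIZE while read_bytes stays 0; read_bytes
    # can be non-zero if the file was evicted from the page cache in between.
    print(measure_cached_read())
```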

scottchiefbaker commented 2 weeks ago

There are four plugins that all seem to access read_bytes:

plugins/dool_top_io.py
plugins/dool_top_bio.py
plugins/dool__pid_detail.py
plugins/dool_top_bio_adv.py

Whatever we decide, we should make sure all the plugins are set the same.
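One way to keep the plugins consistent is a single shared definition of which counters they read, instead of hard-coding field names in each file. A sketch, assuming a hypothetical shared module (called `dool_io_fields.py` here); neither the module nor the names below exist in dool, and the default shown is only one of the options under discussion. Since the *_bio plugin names suggest block I/O, which is what read_bytes/write_bytes measure, the sketch lets each caller name its field set explicitly.

```python
# dool_io_fields.py (hypothetical shared module, not part of dool)
from typing import Dict, Tuple

# "syscall": bytes passed to read(2)/write(2), including page-cache hits.
# "storage": bytes that actually reached the storage layer.
FIELD_SETS: Dict[str, Tuple[str, str]] = {
    "syscall": ("rchar", "wchar"),
    "storage": ("read_bytes", "write_bytes"),
}
DEFAULT_FIELD_SET = "syscall"  # hypothetical default, not a project decision


def read_write_bytes(fields: Dict[str, int],
                     field_set: str = DEFAULT_FIELD_SET) -> Tuple[int, int]:
    """Pick the (read, write) pair from a parsed /proc/<pid>/io dict so that
    every plugin uses the same counters. Missing fields count as 0."""
    read_key, write_key = FIELD_SETS[field_set]
    return fields.get(read_key, 0), fields.get(write_key, 0)


if __name__ == "__main__":
    sample = {"rchar": 60131822108, "wchar": 22285226750,
              "read_bytes": 0, "write_bytes": 0}
    print(read_write_bytes(sample))             # (60131822108, 22285226750)
    print(read_write_bytes(sample, "storage"))  # (0, 0)
```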