iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0
20.36k stars 3.86k forks source link

biotop reports all operations as reads #4871

Closed b-deam closed 8 months ago

b-deam commented 8 months ago

Hi,

Kernel version: 5.15.0-1052-azure

I've noticed that when using biotop (built from master), all operations are displayed as reads (R), even if they're writes (W).

Take for example this dd command:

$ dd if=/dev/zero of=/tmp/test bs=1M  oflag=direct
3332+0 records in
3332+0 records out
3493855232 bytes (3.5 GB, 3.3 GiB) copied, 22.1949 s, 157 MB/s

And then the corresponding biotop output:

# python3 biotop.py -C
Tracing... Output every 1 secs. Hit Ctrl-C to end

05:23:07 loadavg: 0.24 0.22 0.37 2/1598 2460071

PID     COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
2459840 dd               R 8   0   sda        142 145408.0   9.00
2457536 kworker/u64:3    R 8   0   sda          1     8.0   2.12
343     jbd2/sda1-8      R 8   0   sda          1     4.0   1.35
0                        R 8   0   sda          1     0.0 498.65

05:23:08 loadavg: 0.30 0.24 0.38 2/1594 2460110

PID     COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
2459840 dd               R 8   0   sda        162 165888.0   6.08
2457536 kworker/u64:3    R 8   0   sda          4    20.0  27.04
2457536 kworker/u64:3    R 0   0   ?            1 0.0078125   0.25

I think the issue is two fold:

  1. These conditionals looks to be mistakenly commented out: https://github.com/iovisor/bcc/blob/d272c0b7e7a03abe7d3520bfcbdc6ecd78ca8da8/tools/biotop.py#L201-L214
  2. The block request req is not within __trace_req_completion's scope

I am not a C programmer, but using this hack/patch I was able to properly categorise the I/O:

diff --git a/tools/biotop.py b/tools/biotop.py
old mode 100755
new mode 100644
index bf4af203..4c54acd0
--- a/tools/biotop.py
+++ b/tools/biotop.py
@@ -65,8 +65,10 @@ bpf_text = """
 struct start_req_t {
     u64 ts;
     u64 data_len;
+    u64 cmd_flags;
 };

+
 // for saving process info by request
 struct who_t {
     u32 pid;
@@ -161,7 +163,8 @@ int trace_req_start(struct pt_regs *ctx, struct request *req)
     };
     struct start_req_t start_req = {
         .ts = bpf_ktime_get_ns(),
-        .data_len = req->__data_len
+        .data_len = req->__data_len,
+        .cmd_flags = req->cmd_flags
     };
     start.update(&key, &start_req);
     return 0;
@@ -213,6 +216,8 @@ static int __trace_req_completion(struct hash_key key)
     info.rwflag = !!((req->cmd_flags & REQ_OP_MASK) == REQ_OP_WRITE);
 #endif*/

+
+    info.rwflag = !!((startp->cmd_flags & REQ_OP_MASK) == REQ_OP_WRITE);
     if (whop == 0) {
         // missed pid who, save stats as pid 0
         valp = counts.lookup_or_try_init(&info, &zero);

And running dd again:

$ dd if=/dev/zero of=/tmp/test bs=1M  oflag=direct
605+0 records in
605+0 records out
634388480 bytes (634 MB, 605 MiB) copied, 4.06731 s, 156 MB/s

And then the corresponding biotop output:

# python3 biotop.py -C
Tracing... Output every 1 secs. Hit Ctrl-C to end

05:33:34 loadavg: 0.39 0.23 0.28 1/1589 2474011

PID     COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
2473974 dd               W 8   0   sda        167 171008.0   5.89

05:33:35 loadavg: 0.39 0.23 0.28 2/1589 2474126

PID     COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
2473974 dd               W 8   0   sda        167 171008.0   5.80
2457536 kworker/u64:3    R 0   0   ?            1 0.0078125   0.24

05:33:36 loadavg: 0.39 0.23 0.28 2/1589 2474126

PID     COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
2473974 dd               W 8   0   sda        166 169984.0   5.92
1180    python3          W 8   0   sda          1     8.0  37.67

To further validate, this dd commands issues reads:

#  while true; do echo 3 > /proc/sys/vm/drop_caches; dd if=/tmp/test of=/dev/zero; done

And biotop shows:

# python3 biotop.py -C
Tracing... Output every 1 secs. Hit Ctrl-C to end

05:40:50 loadavg: 0.97 0.45 0.32 2/1596 2484035

PID     COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
2484016 dd               R 8   0   sda       2406 302620.0   0.16
2483985 dd               R 8   0   sda       1791 227856.0   0.17
yonghong-song commented 8 months ago

Disabling

info.rwflag = !!((startp->cmd_flags & REQ_OP_MASK) == REQ_OP_WRITE)

due to the following patch,

commit ed1c98047e0c68126bfb04c8c3ea2455f3b7674d
Author: Jerome Marchand <jmarchan@redhat.com>
Date:   Thu Jul 27 18:19:18 2023 +0200

    tools: Add support for the new block_io_* tracepoints

Just adding

info.rwflag = !!((startp->cmd_flags & REQ_OP_MASK) == REQ_OP_WRITE)

probably won't work. We might to restore the original multiple versioning of info.rwflag -= ... depending on different kernel changes. Could you do some investigation on this?

b-deam commented 8 months ago

I'd assume based on biosnoop that the conditional is still required. My patch was just demonstrating that the commented out conditional + missing scope was indeed the issue: https://github.com/iovisor/bcc/blob/3d21000ab062e94cc69d7c9a3c2b45e416ff15bd/tools/biosnoop.py#L125-L141

yonghong-song commented 8 months ago

@b-deam The above change looks good. Could you help with a pull request?

b-deam commented 8 months ago

I raised https://github.com/iovisor/bcc/pull/4875.

b-deam commented 8 months ago

@yonghong-song Thanks for the review/merging. How often are new release tags cut? We have some internal container images I'd like to update now, but I'd prefer to avoid using master.

yonghong-song commented 8 months ago

Next release will be after next kernel release, probably 2.5 months from now.