openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Constant 4 KiB writes every second to every disk of an idle zpool from z_null_iss #12953

Closed. gg7 closed this issue 2 years ago.

gg7 commented 2 years ago

System information

Type                   Version/Name
Distribution Name      Gentoo
Distribution Version   N/A
Kernel Version         5.15.11-gentoo-dist
Architecture           x86_64
OpenZFS Version        kmod: 2.1.2-r0-gentoo; userspace: 2.1.2-r1-gentoo

Issue

A remote machine of mine has a RAIDZ1 array of three 16TB HDDs used to store media files. The OS is on a separate SSD. ECC RAM is used.

All three HDDs have the following sector sizes: 512 bytes logical, 4096 bytes physical.

The zpool is completely idle (from my perspective) -- all ZFS filesystems are unmounted (zfs umount -a). There are no zvols. There is no snapshot creation/replication/deletion or scrub happening. No paused scrub either.

sudo iotop does not attribute any I/O to a process (although its "Actual DISK WRITE" total does show the writes), so I don't think the usual suspects such as syslog are to blame:

Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:      11.70 K/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
      1 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  systemd --switched-root --system --deserialize 31
      2 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [kthreadd]
      3 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [rcu_gp]
      4 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [rcu_par_gp]
      6 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [kworker/0:0H-events_highpri]
      9 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [mm_percpu_wq]
     10 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [rcu_tasks_rude_]
     11 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [rcu_tasks_trace]
     12 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [ksoftirqd/0]
     13 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [rcu_sched]
     14 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [migration/0]
[...]

zpool iostat -y -v -l zp-3x16-a 1 shows 2 writes per second totalling 12 KB/s at the pool level, but (somehow) zero write operations on each disk, even though the per-disk bandwidth is correct:

                                                 capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                                           alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
---------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
zp-3x16-a                                      27.7T  16.0T      0      2      0  12.0K      -  524us      -  524us      -    5us      -      -      -      -
  raidz1                                       27.7T  16.0T      0      2      0  12.0K      -  524us      -  524us      -    5us      -      -      -      -
    ata-TOSHIBA_MG08ACA16TE_X1Cxxxxxxxxx           -      -      0      0      0  3.99K      -  393us      -  393us      -    6us      -      -      -      -
    ata-ST16000NM001G-2KKxxx_xxxxxxxx              -      -      0      0      0  3.99K      -  786us      -  786us      -    6us      -      -      -      -
    usb-WD_Elements_25A3_324xxxxxxxxxxxxx-0:0      -      -      0      0      0  3.99K      -  393us      -  393us      -    3us      -      -      -      -
cache                                              -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -
  /cache/l2arc-zp-3x16-a-0.bin                  999G   711M      0      0      0      0      -      -      -      -      -      -      -      -      -      -
  /cache/l2arc-zp-3x16-a-1.bin                  499G  1008M      0      0      0      0      -      -      -      -      -      -      -      -      -      -
---------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
                                                 capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                                           alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
---------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
zp-3x16-a                                      27.7T  16.0T      0      2      0  11.9K      -  524us      -  524us      -    6us      -      -      -      -
  raidz1                                       27.7T  16.0T      0      2      0  11.9K      -  524us      -  524us      -    6us      -      -      -      -
    ata-TOSHIBA_MG08ACA16TE_X1Cxxxxxxxxx           -      -      0      0      0  3.98K      -  393us      -  393us      -    6us      -      -      -      -
    ata-ST16000NM001G-2KKxxx_xxxxxxxx              -      -      0      0      0  3.98K      -  786us      -  786us      -    6us      -      -      -      -
    usb-WD_Elements_25A3_324xxxxxxxxxxxxx-0:0      -      -      0      0      0  3.98K      -  393us      -  393us      -    6us      -      -      -      -
cache                                              -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -
  /cache/l2arc-zp-3x16-a-0.bin                  999G   711M      0      0      0      0      -      -      -      -      -      -      -      -      -      -
  /cache/l2arc-zp-3x16-a-1.bin                  499G  1008M      0      0      0      0      -      -      -      -      -      -      -      -      -      -
---------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
                                                 capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                                           alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
---------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
zp-3x16-a                                      27.7T  16.0T      0      2      0  12.0K      -  524us      -  524us      -    6us      -      -      -      -
  raidz1                                       27.7T  16.0T      0      2      0  12.0K      -  524us      -  524us      -    6us      -      -      -      -
    ata-TOSHIBA_MG08ACA16TE_X1Cxxxxxxxxx           -      -      0      0      0  3.98K      -  393us      -  393us      -    6us      -      -      -      -
    ata-ST16000NM001G-2KKxxx_xxxxxxxx              -      -      0      0      0  3.98K      -  786us      -  786us      -    6us      -      -      -      -
    usb-WD_Elements_25A3_324xxxxxxxxxxxxx-0:0      -      -      0      0      0  3.98K      -  393us      -  393us      -    6us      -      -      -      -
cache                                              -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -
  /cache/l2arc-zp-3x16-a-0.bin                  999G   711M      0      0      0      0      -      -      -      -      -      -      -      -      -      -
  /cache/l2arc-zp-3x16-a-1.bin                  499G  1008M      0      0      0      0      -      -      -      -      -      -      -      -      -      -

iostat -yx 1 | egrep '^(sd[abg]|Device)' shows that each disk is in fact receiving a single 4 KB write every second:

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    1.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.40
sdb              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    1.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.30
sdg              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.30
Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.30
sdb              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.70
sdg              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.30
Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.30
sdb              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    1.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.30
sdg              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    1.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.40
Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.40
sdb              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.40
sdg              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.30
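As a cross-check between the two outputs above, the per-disk write bandwidth from `iostat -x` can be summed per sample and compared with the roughly 12 KB/s that zpool iostat reports for the whole pool. This is a minimal sketch, assuming the sysstat column layout shown above (wkB/s is the 9th column) and input already filtered to the header and member-disk lines; `sum_wkbs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sum the write bandwidth (wkB/s, column 9 of `iostat -x`) over the member
# disks for each sample. Every "Device" header line starts a new sample.
sum_wkbs() {
  awk '
    /^Device/ { if (n) printf "%.2f\n", total; total = 0; n = 1; next }
    /^sd/     { total += $9 }
    END       { if (n) printf "%.2f\n", total }
  '
}
```

For example, `iostat -yx 1 3 | grep -E '^(sd[abg]|Device)' | sum_wkbs` prints one total per sample; with the samples above each total works out to 12.00 (3 × 4.00 kB/s), matching the pool-level figure.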

blktrace shows that z_null_iss is constantly writing 4KiB to the same sectors over and over again (why?):

# blktrace /dev/sda -o - | blkparse -i - | ts
Jan 10 22:42:53   8,0    1        1     0.000000000 1274548  A   W 31251740152 + 8 <- (8,1) 31251738104
Jan 10 22:42:53   8,0    1        2     0.000001322 1274548  Q   W 31251740152 + 8 [z_null_iss]
Jan 10 22:42:53   8,0    1        3     0.000011618 1274548  G   W 31251740152 + 8 [z_null_iss]
Jan 10 22:42:53   8,0    1        4     0.000012915 1274548  P   N [z_null_iss]
Jan 10 22:42:53   8,0    1        5     0.000014179 1274548  U   N [z_null_iss] 1
Jan 10 22:42:53   8,0    1        6     0.000015584 1274548  I   W 31251740152 + 8 [z_null_iss]
Jan 10 22:42:53   8,0    1        7     0.000025499 1274548  D   W 31251740152 + 8 [z_null_iss]
Jan 10 22:42:53   8,0    3        1     0.000252127     0    C   W 31251740152 + 8 [0]
Jan 10 22:42:54   8,0    1        8     1.000918456 1274548  A   W 2552 + 8 <- (8,1) 504
Jan 10 22:42:54   8,0    1        9     1.000919795 1274548  Q   W 2552 + 8 [z_null_iss]
Jan 10 22:42:54   8,0    1       10     1.000930544 1274548  G   W 2552 + 8 [z_null_iss]
Jan 10 22:42:54   8,0    1       11     1.000932078 1274548  P   N [z_null_iss]
Jan 10 22:42:54   8,0    1       12     1.000933457 1274548  U   N [z_null_iss] 1
Jan 10 22:42:54   8,0    1       13     1.000934623 1274548  I   W 2552 + 8 [z_null_iss]
Jan 10 22:42:54   8,0    1       14     1.000944746 1274548  D   W 2552 + 8 [z_null_iss]
Jan 10 22:42:54   8,0    3        2     1.001163000     0    C   W 2552 + 8 [0]
Jan 10 22:42:55   8,0    1       15     2.001629476 1274548  A   W 31251740152 + 8 <- (8,1) 31251738104
Jan 10 22:42:55   8,0    1       16     2.001630810 1274548  Q   W 31251740152 + 8 [z_null_iss]
Jan 10 22:42:55   8,0    1       17     2.001641486 1274548  G   W 31251740152 + 8 [z_null_iss]
Jan 10 22:42:55   8,0    1       18     2.001643136 1274548  P   N [z_null_iss]
Jan 10 22:42:55   8,0    1       19     2.001644435 1274548  U   N [z_null_iss] 1
Jan 10 22:42:55   8,0    1       20     2.001645635 1274548  I   W 31251740152 + 8 [z_null_iss]
Jan 10 22:42:55   8,0    1       21     2.001656144 1274548  D   W 31251740152 + 8 [z_null_iss]
Jan 10 22:42:55   8,0    3        3     2.001814542     0    C   W 31251740152 + 8 [0]
Jan 10 22:42:56   8,0    3        4     3.002299937 1274548  A   W 3064 + 8 <- (8,1) 1016
Jan 10 22:42:56   8,0    3        5     3.002301354 1274548  Q   W 3064 + 8 [z_null_iss]
Jan 10 22:42:56   8,0    3        6     3.002311707 1274548  G   W 3064 + 8 [z_null_iss]
Jan 10 22:42:56   8,0    3        7     3.002313154 1274548  P   N [z_null_iss]
Jan 10 22:42:56   8,0    3        8     3.002314453 1274548  U   N [z_null_iss] 1
Jan 10 22:42:56   8,0    3        9     3.002315644 1274548  I   W 3064 + 8 [z_null_iss]
Jan 10 22:42:56   8,0    3       10     3.002325362 1274548  D   W 3064 + 8 [z_null_iss]
Jan 10 22:42:56   8,0    3       11     3.002541521     0    C   W 3064 + 8 [0]
Jan 10 22:42:57   8,0    3       12     4.003171660 1274548  A   W 3064 + 8 <- (8,1) 1016
Jan 10 22:42:57   8,0    3       13     4.003173134 1274548  Q   W 3064 + 8 [z_null_iss]
Jan 10 22:42:57   8,0    3       14     4.003183627 1274548  G   W 3064 + 8 [z_null_iss]
Jan 10 22:42:57   8,0    3       15     4.003184963 1274548  P   N [z_null_iss]
Jan 10 22:42:57   8,0    3       16     4.003186242 1274548  U   N [z_null_iss] 1
Jan 10 22:42:57   8,0    3       17     4.003187444 1274548  I   W 3064 + 8 [z_null_iss]
Jan 10 22:42:57   8,0    3       18     4.003197944 1274548  D   W 3064 + 8 [z_null_iss]
Jan 10 22:42:57   8,0    3       19     4.003417017     0    C   W 3064 + 8 [0]

# blktrace /dev/sdb -o - | blkparse -i - | ts
Jan 10 22:44:02   8,16   3        1     0.000000000 1274548  A   W 2552 + 8 <- (8,17) 504
Jan 10 22:44:02   8,16   3        2     0.000001345 1274548  Q   W 2552 + 8 [z_null_iss]
Jan 10 22:44:02   8,16   3        3     0.000012525 1274548  G   W 2552 + 8 [z_null_iss]
Jan 10 22:44:02   8,16   3        4     0.000013881 1274548  P   N [z_null_iss]
Jan 10 22:44:02   8,16   3        5     0.000015219 1274548  U   N [z_null_iss] 1
Jan 10 22:44:02   8,16   3        6     0.000016408 1274548  I   W 2552 + 8 [z_null_iss]
Jan 10 22:44:02   8,16   3        7     0.000026799 1274548  D   W 2552 + 8 [z_null_iss]
Jan 10 22:44:02   8,16   3        8     0.000425114     0    C   W 2552 + 8 [0]
Jan 10 22:44:03   8,16   3        9     1.000785232 1274548  A   W 31251740664 + 8 <- (8,17) 31251738616
Jan 10 22:44:03   8,16   3       10     1.000786560 1274548  Q   W 31251740664 + 8 [z_null_iss]
Jan 10 22:44:03   8,16   3       11     1.000797102 1274548  G   W 31251740664 + 8 [z_null_iss]
Jan 10 22:44:03   8,16   3       12     1.000798444 1274548  P   N [z_null_iss]
Jan 10 22:44:03   8,16   3       13     1.000799751 1274548  U   N [z_null_iss] 1
Jan 10 22:44:03   8,16   3       14     1.000801092 1274548  I   W 31251740664 + 8 [z_null_iss]
Jan 10 22:44:03   8,16   3       15     1.000811669 1274548  D   W 31251740664 + 8 [z_null_iss]
Jan 10 22:44:03   8,16   3       16     1.001340817     0    C   W 31251740664 + 8 [0]
Jan 10 22:44:04   8,16   3       17     2.001713738 1274548  A   W 2552 + 8 <- (8,17) 504
Jan 10 22:44:04   8,16   3       18     2.001715373 1274548  Q   W 2552 + 8 [z_null_iss]
Jan 10 22:44:04   8,16   3       19     2.001725992 1274548  G   W 2552 + 8 [z_null_iss]
Jan 10 22:44:04   8,16   3       20     2.001727339 1274548  P   N [z_null_iss]
Jan 10 22:44:04   8,16   3       21     2.001728670 1274548  U   N [z_null_iss] 1
Jan 10 22:44:04   8,16   3       22     2.001729807 1274548  I   W 2552 + 8 [z_null_iss]
Jan 10 22:44:04   8,16   3       23     2.001739986 1274548  D   W 2552 + 8 [z_null_iss]
Jan 10 22:44:04   8,16   3       24     2.002146498     0    C   W 2552 + 8 [0]
Jan 10 22:44:05   8,16   3       25     3.002678885 1274548  A   W 31251740152 + 8 <- (8,17) 31251738104
Jan 10 22:44:05   8,16   3       26     3.002680242 1274548  Q   W 31251740152 + 8 [z_null_iss]
Jan 10 22:44:05   8,16   3       27     3.002690875 1274548  G   W 31251740152 + 8 [z_null_iss]
Jan 10 22:44:05   8,16   3       28     3.002692243 1274548  P   N [z_null_iss]
Jan 10 22:44:05   8,16   3       29     3.002693391 1274548  U   N [z_null_iss] 1
Jan 10 22:44:05   8,16   3       30     3.002694427 1274548  I   W 31251740152 + 8 [z_null_iss]
Jan 10 22:44:05   8,16   3       31     3.002704538 1274548  D   W 31251740152 + 8 [z_null_iss]
Jan 10 22:44:05   8,16   3       32     3.003219052     0    C   W 31251740152 + 8 [0]
Jan 10 22:44:06   8,16   3       33     4.003501254 1274548  A   W 2552 + 8 <- (8,17) 504
Jan 10 22:44:06   8,16   3       34     4.003502714 1274548  Q   W 2552 + 8 [z_null_iss]
Jan 10 22:44:06   8,16   3       35     4.003513526 1274548  G   W 2552 + 8 [z_null_iss]
Jan 10 22:44:06   8,16   3       36     4.003514888 1274548  P   N [z_null_iss]
Jan 10 22:44:06   8,16   3       37     4.003516171 1274548  U   N [z_null_iss] 1
Jan 10 22:44:06   8,16   3       38     4.003517249 1274548  I   W 2552 + 8 [z_null_iss]
Jan 10 22:44:06   8,16   3       39     4.003528008 1274548  D   W 2552 + 8 [z_null_iss]
Jan 10 22:44:06   8,16   3       40     4.003928192     0    C   W 2552 + 8 [0]

# blktrace /dev/sdg -o - | blkparse -i - | ts
Jan 10 22:44:44   8,96   1        1     0.000000000 1274548  A   W 31251675128 + 8 <- (8,97) 31251673080
Jan 10 22:44:44   8,96   1        2     0.000000458 1274548  Q   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:44   8,96   1        3     0.000005388 1274548  G   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:44   8,96   1        4     0.000006004 1274548  P   N [z_null_iss]
Jan 10 22:44:44   8,96   1        5     0.000006492 1274548  U   N [z_null_iss] 1
Jan 10 22:44:44   8,96   1        6     0.000006912 1274548  I   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:44   8,96   1        7     0.000010828 1274548  D   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:44   8,96   0        1     0.000368580    12    C   W 31251675128 + 8 [0]
Jan 10 22:44:45   8,96   1        8     1.000784722 1274548  A   W 31251675128 + 8 <- (8,97) 31251673080
Jan 10 22:44:45   8,96   1        9     1.000785129 1274548  Q   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:45   8,96   1       10     1.000789806 1274548  G   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:45   8,96   1       11     1.000790419 1274548  P   N [z_null_iss]
Jan 10 22:44:45   8,96   1       12     1.000790885 1274548  U   N [z_null_iss] 1
Jan 10 22:44:45   8,96   1       13     1.000791331 1274548  I   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:45   8,96   1       14     1.000795281 1274548  D   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:45   8,96   0        2     1.001154883    12    C   W 31251675128 + 8 [0]
Jan 10 22:44:46   8,96   1       15     2.001706902 1274548  A   W 31251674616 + 8 <- (8,97) 31251672568
Jan 10 22:44:46   8,96   1       16     2.001708342 1274548  Q   W 31251674616 + 8 [z_null_iss]
Jan 10 22:44:46   8,96   1       17     2.001718593 1274548  G   W 31251674616 + 8 [z_null_iss]
Jan 10 22:44:46   8,96   1       18     2.001720044 1274548  P   N [z_null_iss]
Jan 10 22:44:46   8,96   1       19     2.001721327 1274548  U   N [z_null_iss] 1
Jan 10 22:44:46   8,96   1       20     2.001722657 1274548  I   W 31251674616 + 8 [z_null_iss]
Jan 10 22:44:46   8,96   1       21     2.001733685 1274548  D   W 31251674616 + 8 [z_null_iss]
Jan 10 22:44:46   8,96   0        3     2.002096695    12    C   W 31251674616 + 8 [0]
Jan 10 22:44:47   8,96   1       22     3.002453102 1274548  A   W 31251675128 + 8 <- (8,97) 31251673080
Jan 10 22:44:47   8,96   1       23     3.002453498 1274548  Q   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:47   8,96   1       24     3.002457559 1274548  G   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:47   8,96   1       25     3.002458058 1274548  P   N [z_null_iss]
Jan 10 22:44:47   8,96   1       26     3.002458506 1274548  U   N [z_null_iss] 1
Jan 10 22:44:47   8,96   1       27     3.002459111 1274548  I   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:47   8,96   1       28     3.002462874 1274548  D   W 31251675128 + 8 [z_null_iss]
Jan 10 22:44:47   8,96   0        4     3.002729434    12    C   W 31251675128 + 8 [0]
Jan 10 22:44:48   8,96   0        5     4.003652799    12    C   W 2552 + 8 [0]
Jan 10 22:44:48   8,96   1       29     4.003195435 1274548  A   W 2552 + 8 <- (8,97) 504
Jan 10 22:44:48   8,96   1       30     4.003196634 1274548  Q   W 2552 + 8 [z_null_iss]
Jan 10 22:44:48   8,96   1       31     4.003207150 1274548  G   W 2552 + 8 [z_null_iss]
Jan 10 22:44:48   8,96   1       32     4.003208570 1274548  P   N [z_null_iss]
Jan 10 22:44:48   8,96   1       33     4.003209961 1274548  U   N [z_null_iss] 1
Jan 10 22:44:48   8,96   1       34     4.003211123 1274548  I   W 2552 + 8 [z_null_iss]
Jan 10 22:44:48   8,96   1       35     4.003221579 1274548  D   W 2552 + 8 [z_null_iss]
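The remap ("A") lines above give both the whole-disk sector and the partition-relative sector, so each write can be located inside the partition. In the ZFS on-disk format every leaf vdev carries four 256 KiB labels (two at the start of the partition, two at the end), and the second 128 KiB of each label holds the uberblock ring. The sketch below maps each write onto that layout under stated assumptions: 512-byte sectors, ashift=12 (so 4 KiB uberblock slots; the pool's ashift is not shown in this issue), and end labels starting on 256 KiB boundaries like the front ones. `classify_label_writes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Locate the writes from blkparse remap ("A") lines inside a ZFS vdev label.
# Assumptions: 512-byte sectors, ashift=12 (4 KiB uberblock slots), and
# 256 KiB labels on 256 KiB boundaries with the uberblock ring in the
# second 128 KiB half. Only meaningful for writes suspected to hit labels.
classify_label_writes() {
  awk '
    # Remap lines end with: <disk sector> + <n> <- (maj,min) <part sector>
    $(NF-2) == "<-" {
      disk_sector = $(NF-5)
      part_sector = $NF
      byte = part_sector * 512                 # byte offset in the partition
      pos = byte % 262144                      # offset inside a 256 KiB label
      if (byte < 524288)
        where = sprintf("front label L%d", int(byte / 262144))
      else
        where = "past the front labels"        # an end label if sizes line up
      if (pos >= 131072)
        what = sprintf("uberblock slot %d", (pos - 131072) / 4096)
      else
        what = "label header/nvlist area"
      printf "%s %.0f -> partition start %.0f, %s, %s\n",
             $(NF-1), part_sector, disk_sector - part_sector, where, what
    }
  '
}
```

Used as `blktrace /dev/sda -o - | blkparse -i - | classify_label_writes` (the `ts` prefix does not matter, since fields are counted from the end). Under these assumptions, every partition-relative offset in the traces above (504, 1016, 31251738104, 31251738616, 31251673080, 31251672568) lands in slot 31, the last slot of a 32-slot ring, and the partition start is sector 2048 on all three disks.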

If it matters, here are the device major+minor numbers:

# ls -alh /dev/sd{a,b,g}*
brw-rw---- 1 root disk 8,   0 2022-01-10 22:23:26 /dev/sda
brw-rw---- 1 root disk 8,   1 2022-01-10 22:23:26 /dev/sda1
brw-rw---- 1 root disk 8,   9 2022-01-10 22:23:26 /dev/sda9
brw-rw---- 1 root disk 8,  16 2022-01-03 12:49:35 /dev/sdb
brw-rw---- 1 root disk 8,  17 2022-01-03 12:49:35 /dev/sdb1
brw-rw---- 1 root disk 8,  25 2022-01-03 12:49:35 /dev/sdb9
brw-rw---- 1 root disk 8,  96 2022-01-03 12:49:35 /dev/sdg
brw-rw---- 1 root disk 8,  97 2022-01-03 12:49:35 /dev/sdg1
brw-rw---- 1 root disk 8, 105 2022-01-03 12:49:35 /dev/sdg9
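The (8,1), (8,17) and (8,97) pairs in the blktrace remap lines are the partitions listed above. For the sd driver's first major number (8), each disk gets a block of 16 minors: minor / 16 selects the disk letter and minor % 16 the partition (0 being the whole disk). A small bash sketch of that decoding follows; `decode_sd_dev` is a hypothetical name. On a live system, `/sys/dev/block/MAJ:MIN` is a symlink to the device and gives the same answer without any arithmetic.

```shell
#!/usr/bin/env bash
# Decode a (major, minor) pair for major 8, which covers sda..sdp with
# 16 minors per disk: minor 0 of each block is the whole disk, 1-15 are
# its partitions.
decode_sd_dev() {
  local major=$1 minor=$2
  if [ "$major" -ne 8 ] || [ "$minor" -lt 0 ] || [ "$minor" -gt 255 ]; then
    echo "not an sda..sdp device: $major,$minor" >&2
    return 1
  fi
  local letters=abcdefghijklmnop
  local disk="sd${letters:$((minor / 16)):1}"
  local part=$((minor % 16))
  if [ "$part" -eq 0 ]; then
    echo "$disk"
  else
    echo "$disk$part"
  fi
}
```

For example, `decode_sd_dev 8 97` prints `sdg1` and `decode_sd_dev 8 16` prints `sdb`, matching the listing above.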

I don't understand why this zpool is constantly issuing writes to the HDDs considering there are no actual writes sent to it. This has been going on for days now.

I suspect this is a bug because there's a second RAIDZ1 pool on the same machine which does not issue any writes to its disks when it's idle. It's just the first zpool (zp-3x16-a) that is exhibiting this surprising and annoying behaviour (I am trying to eliminate noise from the HDDs, hence the 1.5 TB L2ARC).

Describe how to reproduce the problem

I have not tried rebooting yet, but exporting and re-importing the pool does not help, and neither does running zpool scrub.

I am using ZFS native encryption, if it matters.

Include any warning/errors/backtraces from the system logs

There are no errors or warnings in dmesg.

rincebrain commented 2 years ago

What does zpool get all report for each of the two pools?
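One way to answer this question is to diff the two pools' property lists directly. This is a minimal bash sketch (the function name is hypothetical); it relies on `zpool get` accepting `-H` (no header, tab-separated output) and `-o property,value`, and on bash process substitution.

```shell
#!/usr/bin/env bash
# Show pool-level properties whose values differ between two pools.
# Sorting both listings makes them line up for diff.
compare_pool_props() {
  local a=$1 b=$2
  diff <(zpool get -H -o property,value all "$a" | sort) \
       <(zpool get -H -o property,value all "$b" | sort)
}
```

For example, `compare_pool_props zp-3x12-a zp-3x16-a`; lines starting with `<` come from the first pool and `>` from the second. Per-pool values such as `size`, `free` and `guid` will differ regardless and can be ignored.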

gg7 commented 2 years ago

@rincebrain

# zfs get all -s local,received
NAME                                                                                PROPERTY              VALUE                                                   SOURCE
zp-3x12-a                                                                           recordsize            1M                                                      received
zp-3x12-a                                                                           compression           lz4                                                     received
zp-3x12-a                                                                           atime                 off                                                     received
zp-3x12-a                                                                           xattr                 sa                                                      received
zp-3x12-a                                                                           acltype               posix                                                   received
zp-3x12-a/backup                                                                    primarycache          none                                                    received
zp-3x12-a/backup                                                                    secondarycache        none                                                    received
zp-3x16-a                                                                           recordsize            1M                                                      local
zp-3x16-a                                                                           compression           lz4                                                     local
zp-3x16-a                                                                           atime                 off                                                     local
zp-3x16-a                                                                           xattr                 sa                                                      local
zp-3x16-a                                                                           acltype               posix                                                   local
zp-3x16-a                                                                           overlay               off                                                     local
zp-3x16-a/encrypted                                                                 keylocation           prompt                                                  local
zp-3x16-a/encrypted/backup                                                          compression           gzip                                                    local
zp-3x16-a/encrypted/backup                                                          primarycache          none                                                    local
zp-3x16-a/encrypted/backup                                                          secondarycache        none                                                    local
zp-3x16-a/encrypted/cache                                                           quota                 32G                                                     local

If you really want everything, here are the properties of the main datasets. Note that I ran zfs mount -a after opening this ticket in order to run a copy job, which is why they show as mounted below.

# zfs get all zp-3x12-a zp-3x16-a -d 0 | grep -v guid
NAME       PROPERTY              VALUE                  SOURCE
zp-3x12-a  type                  filesystem             -
zp-3x12-a  creation              Sun Feb 14 22:05 2021  -
zp-3x12-a  used                  14.9T                  -
zp-3x12-a  available             6.79T                  -
zp-3x12-a  referenced            128K                   -
zp-3x12-a  compressratio         1.03x                  -
zp-3x12-a  mounted               yes                    -
zp-3x12-a  quota                 none                   default
zp-3x12-a  reservation           none                   default
zp-3x12-a  recordsize            1M                     received
zp-3x12-a  mountpoint            /zp-3x12-a             default
zp-3x12-a  sharenfs              off                    default
zp-3x12-a  checksum              on                     default
zp-3x12-a  compression           lz4                    received
zp-3x12-a  atime                 off                    received
zp-3x12-a  devices               on                     default
zp-3x12-a  exec                  on                     default
zp-3x12-a  setuid                on                     default
zp-3x12-a  readonly              off                    default
zp-3x12-a  zoned                 off                    default
zp-3x12-a  snapdir               hidden                 default
zp-3x12-a  aclmode               discard                default
zp-3x12-a  aclinherit            restricted             default
zp-3x12-a  createtxg             1                      -
zp-3x12-a  canmount              on                     default
zp-3x12-a  xattr                 sa                     received
zp-3x12-a  copies                1                      default
zp-3x12-a  version               5                      -
zp-3x12-a  utf8only              off                    -
zp-3x12-a  normalization         none                   -
zp-3x12-a  casesensitivity       sensitive              -
zp-3x12-a  vscan                 off                    default
zp-3x12-a  nbmand                off                    default
zp-3x12-a  sharesmb              off                    default
zp-3x12-a  refquota              none                   default
zp-3x12-a  refreservation        none                   default
zp-3x12-a  primarycache          all                    default
zp-3x12-a  secondarycache        all                    default
zp-3x12-a  usedbysnapshots       85.2K                  -
zp-3x12-a  usedbydataset         128K                   -
zp-3x12-a  usedbychildren        14.9T                  -
zp-3x12-a  usedbyrefreservation  0B                     -
zp-3x12-a  logbias               latency                default
zp-3x12-a  objsetid              54                     -
zp-3x12-a  dedup                 off                    default
zp-3x12-a  mlslabel              none                   default
zp-3x12-a  sync                  standard               default
zp-3x12-a  dnodesize             legacy                 default
zp-3x12-a  refcompressratio      1.00x                  -
zp-3x12-a  written               85.2K                  -
zp-3x12-a  logicalused           15.4T                  -
zp-3x12-a  logicalreferenced     42K                    -
zp-3x12-a  volmode               default                default
zp-3x12-a  filesystem_limit      none                   default
zp-3x12-a  snapshot_limit        none                   default
zp-3x12-a  filesystem_count      none                   default
zp-3x12-a  snapshot_count        none                   default
zp-3x12-a  snapdev               hidden                 default
zp-3x12-a  acltype               posix                  received
zp-3x12-a  context               none                   default
zp-3x12-a  fscontext             none                   default
zp-3x12-a  defcontext            none                   default
zp-3x12-a  rootcontext           none                   default
zp-3x12-a  relatime              off                    default
zp-3x12-a  redundant_metadata    all                    default
zp-3x12-a  overlay               on                     default
zp-3x12-a  encryption            off                    default
zp-3x12-a  keylocation           none                   default
zp-3x12-a  keyformat             none                   default
zp-3x12-a  pbkdf2iters           0                      default
zp-3x12-a  special_small_blocks  0                      default
zp-3x16-a  type                  filesystem             -
zp-3x16-a  creation              Tue Dec 14  8:49 2021  -
zp-3x16-a  used                  18.4T                  -
zp-3x16-a  available             10.5T                  -
zp-3x16-a  referenced            128K                   -
zp-3x16-a  compressratio         1.04x                  -
zp-3x16-a  mounted               yes                    -
zp-3x16-a  quota                 none                   default
zp-3x16-a  reservation           none                   default
zp-3x16-a  recordsize            1M                     local
zp-3x16-a  mountpoint            /zp-3x16-a             default
zp-3x16-a  sharenfs              off                    default
zp-3x16-a  checksum              on                     default
zp-3x16-a  compression           lz4                    local
zp-3x16-a  atime                 off                    local
zp-3x16-a  devices               on                     default
zp-3x16-a  exec                  on                     default
zp-3x16-a  setuid                on                     default
zp-3x16-a  readonly              off                    default
zp-3x16-a  zoned                 off                    default
zp-3x16-a  snapdir               hidden                 default
zp-3x16-a  aclmode               discard                default
zp-3x16-a  aclinherit            restricted             default
zp-3x16-a  createtxg             1                      -
zp-3x16-a  canmount              on                     default
zp-3x16-a  xattr                 sa                     local
zp-3x16-a  copies                1                      default
zp-3x16-a  version               5                      -
zp-3x16-a  utf8only              off                    -
zp-3x16-a  normalization         none                   -
zp-3x16-a  casesensitivity       sensitive              -
zp-3x16-a  vscan                 off                    default
zp-3x16-a  nbmand                off                    default
zp-3x16-a  sharesmb              off                    default
zp-3x16-a  refquota              none                   default
zp-3x16-a  refreservation        none                   default
zp-3x16-a  primarycache          all                    default
zp-3x16-a  secondarycache        all                    default
zp-3x16-a  usedbysnapshots       0B                     -
zp-3x16-a  usedbydataset         128K                   -
zp-3x16-a  usedbychildren        18.4T                  -
zp-3x16-a  usedbyrefreservation  0B                     -
zp-3x16-a  logbias               latency                default
zp-3x16-a  objsetid              54                     -
zp-3x16-a  dedup                 off                    default
zp-3x16-a  mlslabel              none                   default
zp-3x16-a  sync                  standard               default
zp-3x16-a  dnodesize             legacy                 default
zp-3x16-a  refcompressratio      1.00x                  -
zp-3x16-a  written               128K                   -
zp-3x16-a  logicalused           19.3T                  -
zp-3x16-a  logicalreferenced     42K                    -
zp-3x16-a  volmode               default                default
zp-3x16-a  filesystem_limit      none                   default
zp-3x16-a  snapshot_limit        none                   default
zp-3x16-a  filesystem_count      none                   default
zp-3x16-a  snapshot_count        none                   default
zp-3x16-a  snapdev               hidden                 default
zp-3x16-a  acltype               posix                  local
zp-3x16-a  context               none                   default
zp-3x16-a  fscontext             none                   default
zp-3x16-a  defcontext            none                   default
zp-3x16-a  rootcontext           none                   default
zp-3x16-a  relatime              off                    default
zp-3x16-a  redundant_metadata    all                    default
zp-3x16-a  overlay               off                    local
zp-3x16-a  encryption            off                    default
zp-3x16-a  keylocation           none                   default
zp-3x16-a  keyformat             none                   default
zp-3x16-a  pbkdf2iters           0                      default
zp-3x16-a  special_small_blocks  0                      default

zdb

# zdb -v | grep -v guid
zp-3x12-a:
    version: 5000
    name: 'zp-3x12-a'
    state: 0
    txg: 5045232
    errata: 0
    hostid: xxx
    hostname: 'xxx'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        create_txg: 4
        children[0]:
            type: 'raidz'
            id: 0
            nparity: 1
            metaslab_array: 256
            metaslab_shift: 34
            ashift: 12
            asize: 36000351387648
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 129
            children[0]:
                type: 'disk'
                id: 0
                path: '/dev/mapper/ata-WDC_WD121KRYZ-01W0RB0_2AGxxxxx-decrypted'
                devid: 'dm-uuid-CRYPT-LUKS1-xxx-ata-WDC_WD121KRYZ-01W0RB0_2AGxxxxx-decrypted'
                phys_path: '/dev/disk/by-uuid/319xxxxxxxxxxxxxxxx'
                whole_disk: 0
                DTL: 11453
                create_txg: 4
                com.delphix:vdev_zap_leaf: 11451
            children[1]:
                type: 'disk'
                id: 1
                path: '/dev/mapper/ata-WDC_WD121KRYZ-01W0RB0_5PHxxxxx-decrypted'
                devid: 'dm-uuid-CRYPT-LUKS1-c72dxxxxxxxxxxxxxxxxxxxxxxxxxxxx-ata-WDC_WD121KRYZ-01W0RB0_5PHxxxxx-decrypted'
                phys_path: '/dev/disk/by-uuid/319xxxxxxxxxxxxxxxx'
                whole_disk: 0
                DTL: 15553
                create_txg: 4
                com.delphix:vdev_zap_leaf: 131
            children[2]:
                type: 'disk'
                id: 2
                path: '/dev/mapper/ata-WDC_WD121KRYZ-01W0RB0_5PKxxxxx-decrypted'
                devid: 'dm-uuid-CRYPT-LUKS2-0257xxxxxxxxxxxxxxxxxxxxxxxxxxxx-ata-WDC_WD121KRYZ-01W0RB0_5PKxxxxx-decrypted'
                whole_disk: 0
                DTL: 15552
                create_txg: 4
                com.delphix:vdev_zap_leaf: 132
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
zp-3x16-a:
    version: 5000
    name: 'zp-3x16-a'
    state: 0
    txg: 497101
    errata: 0
    hostid: xxx
    hostname: 'xxx'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        create_txg: 4
        children[0]:
            type: 'raidz'
            id: 0
            nparity: 1
            metaslab_array: 256
            metaslab_shift: 34
            ashift: 12
            asize: 48002285174784
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 129
            children[0]:
                type: 'disk'
                id: 0
                path: '/dev/disk/by-id/ata-TOSHIBA_MG08ACA16TE_X1Cxxxxxxxxx-part1'
                devid: 'ata-TOSHIBA_MG08ACA16TE_X1Cxxxxxxxxx-part1'
                phys_path: 'pci-0000:08:00.0-ata-1.0'
                whole_disk: 1
                DTL: 135
                create_txg: 4
                com.delphix:vdev_zap_leaf: 269
            children[1]:
                type: 'disk'
                id: 1
                path: '/dev/disk/by-id/ata-ST16000NM001G-2KKxxx_xxxxxxxx-part1'
                devid: 'ata-ST16000NM001G-2KKxxx_xxxxxxxx-part1'
                phys_path: 'pci-0000:00:17.0-ata-2.0'
                whole_disk: 1
                DTL: 72
                create_txg: 4
                com.delphix:vdev_zap_leaf: 70
            children[2]:
                type: 'disk'
                id: 2
                path: '/dev/disk/by-id/usb-WD_Elements_25A3_324xxxxxxxxxxxxx-0:0-part1'
                devid: 'usb-WD_Elements_25A3_324xxxxxxxxxxxxx-0:0-part1'
                phys_path: 'pci-0000:00:14.0-usb-0:6:1.0-scsi-0:0:0:0'
                whole_disk: 1
                DTL: 275
                create_txg: 4
                com.delphix:vdev_zap_leaf: 273
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
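For reference, both raidz vdevs above report `ashift: 12`. ashift is the base-2 logarithm of the sector size ZFS uses for a vdev, so 2^12 = 4096 bytes. That lines up with the 4 KiB writes in the title and with the drives' 4096-byte physical sectors. Below is a quick sketch for pulling that value out per pool. The helper name `min_write_sizes` is my own, and it assumes the `zdb -v` layout shown here: an unindented `<pool>:` line followed by indented `ashift: N` lines.

```python
import re
from typing import Dict, Optional


def min_write_sizes(zdb_text: str) -> Dict[str, int]:
    """Map each pool in `zdb -v` text to 2**ashift, in bytes.

    Assumes the layout shown above: each pool starts with an unindented
    "<pool>:" line, and each top-level vdev has an indented "ashift: N" line.
    If a pool has several top-level vdevs, the smallest value is kept.
    """
    sizes: Dict[str, int] = {}
    pool: Optional[str] = None
    for line in zdb_text.splitlines():
        # Unindented "name:" lines start a new pool section.
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            pool = line.rstrip()[:-1]
            continue
        match = re.match(r"\s+ashift:\s*(\d+)\s*$", line)
        if match and pool is not None:
            size = 1 << int(match.group(1))  # ashift 12 -> 4096 bytes
            sizes[pool] = min(sizes.get(pool, size), size)
    return sizes


# Trimmed-down excerpt of the zdb output above.
sample = """\
zp-3x12-a:
    vdev_tree:
        children[0]:
            ashift: 12
zp-3x16-a:
    vdev_tree:
        children[0]:
            ashift: 12
"""
print(min_write_sizes(sample))  # {'zp-3x12-a': 4096, 'zp-3x16-a': 4096}
```

The script only converts ashift into bytes. It does not explain where the periodic writes come from.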

arc_summary

# arc_summary
------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Jan 11 00:44:14 2022
Linux 5.15.11-gentoo-dist                                2.1.2-r0-gentoo
Machine: xxxx (x86_64)                                   2.1.2-r0-gentoo

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                    53.2 %    4.1 GiB
        Target size (adaptive):                        53.3 %    4.1 GiB
        Min size (hard limit):                          6.2 %  496.4 MiB
        Max size (high water):                           16:1    7.8 GiB
        Most Frequently Used (MFU) cache size:         99.3 %    3.9 GiB
        Most Recently Used (MRU) cache size:            0.7 %   29.8 MiB
        Metadata cache size (hard limit):              75.0 %    5.8 GiB
        Metadata cache size (current):                  5.7 %  340.7 MiB
        Dnode cache size (hard limit):                 10.0 %  595.7 MiB
        Dnode cache size (current):                     2.9 %   17.2 MiB

ARC hash breakdown:
        Elements max:                                               2.5M
        Elements current:                              87.0 %       2.2M
        Collisions:                                                20.1M
        Chain max:                                                     9
        Chains:                                                   587.3k

ARC misc:
        Deleted:                                                   11.7M
        Mutex misses:                                              52.3k
        Eviction skips:                                           173.9M
        Eviction skips due to L2 writes:                              81
        L2 cached evictions:                                     6.9 TiB
        L2 eligible evictions:                                   1.9 TiB
        L2 eligible MFU evictions:                      1.9 %   37.9 GiB
        L2 eligible MRU evictions:                     98.1 %    1.9 TiB
        L2 ineligible evictions:                                 1.4 TiB

ARC total accesses (hits + misses):                                 1.6G
        Cache hit ratio:                               97.9 %       1.6G
        Cache miss ratio:                               2.1 %      32.9M
        Actual hit ratio (MFU + MRU hits):             97.7 %       1.6G
        Data demand efficiency:                        99.7 %     734.0M
        Data prefetch efficiency:                      14.9 %       7.5M

Cache hits by cache type:
        Most frequently used (MFU):                    97.3 %       1.5G
        Most recently used (MRU):                       2.5 %      38.6M
        Most frequently used (MFU) ghost:             < 0.1 %     678.1k
        Most recently used (MRU) ghost:                 1.3 %      19.8M

Cache hits by data type:
        Demand data:                                   46.8 %     731.4M
        Demand prefetch data:                           0.1 %       1.1M
        Demand metadata:                               48.6 %     759.8M
        Demand prefetch metadata:                       4.6 %      71.7M

Cache misses by data type:
        Demand data:                                    7.7 %       2.5M
        Demand prefetch data:                          19.3 %       6.4M
        Demand metadata:                                9.5 %       3.1M
        Demand prefetch metadata:                      63.5 %      20.9M

DMU prefetch efficiency:                                           22.1M
        Hit ratio:                                     27.8 %       6.1M
        Miss ratio:                                    72.2 %      16.0M

L2ARC status:                                                    HEALTHY
        Low memory aborts:                                             3
        Free on write:                                             14.1k
        R/W clashes:                                                  19
        Bad checksums:                                                 0
        I/O errors:                                                    0

L2ARC size (adaptive):                                           1.5 TiB
        Compressed:                                    97.9 %    1.5 TiB
        Header size:                                  < 0.1 %  199.0 MiB
        MFU allocated size:                            82.9 %    1.2 TiB
        MRU allocated size:                            15.6 %  234.6 GiB
        Prefetch allocated size:                        1.4 %   21.5 GiB
        Data (buffer content) allocated size:          99.9 %    1.5 TiB
        Metadata (buffer content) allocated size:       0.1 %    1.7 GiB

L2ARC breakdown:                                                   19.4M
        Hit ratio:                                     14.8 %       2.9M
        Miss ratio:                                    85.2 %      16.5M
        Feeds:                                                    648.3k

L2ARC writes:
        Writes sent:                                    100 %     300.6k

L2ARC evicts:
        Lock retries:                                                 16
        Upon reading:                                                  0

Solaris Porting Layer (SPL):
        spl_hostid                                                     0
        spl_hostid_path                                      /etc/hostid
        spl_kmem_alloc_max                                       1048576
        spl_kmem_alloc_warn                                        65536
        spl_kmem_cache_kmem_threads                                    4
        spl_kmem_cache_magazine_size                                   0
        spl_kmem_cache_max_size                                       32
        spl_kmem_cache_obj_per_slab                                    8
        spl_kmem_cache_reclaim                                         0
        spl_kmem_cache_slab_limit                                  16384
        spl_max_show_tasks                                           512
        spl_panic_halt                                                 0
        spl_schedule_hrtimeout_slack_us                                0
        spl_taskq_kick                                                 0
        spl_taskq_thread_bind                                          0
        spl_taskq_thread_dynamic                                       1
        spl_taskq_thread_priority                                      1
        spl_taskq_thread_sequential                                    4

Tunables:
        dbuf_cache_hiwater_pct                                        10
        dbuf_cache_lowater_pct                                        10
        dbuf_cache_max_bytes                        18446744073709551615
        dbuf_cache_shift                                               5
        dbuf_metadata_cache_max_bytes               18446744073709551615
        dbuf_metadata_cache_shift                                      6
        dmu_object_alloc_chunk_shift                                   7
        dmu_prefetch_max                                       134217728
        ignore_hole_birth                                              1
        l2arc_feed_again                                               1
        l2arc_feed_min_ms                                            200
        l2arc_feed_secs                                                1
        l2arc_headroom                                                16
        l2arc_headroom_boost                                         200
        l2arc_meta_percent                                            50
        l2arc_mfuonly                                                  0
        l2arc_noprefetch                                               0
        l2arc_norw                                                     0
        l2arc_rebuild_blocks_min_l2size                       1073741824
        l2arc_rebuild_enabled                                          1
        l2arc_trim_ahead                                               0
        l2arc_write_boost                                        8388608
        l2arc_write_max                                         33554432
        metaslab_aliquot                                          524288
        metaslab_bias_enabled                                          1
        metaslab_debug_load                                            0
        metaslab_debug_unload                                          0
        metaslab_df_max_search                                  16777216
        metaslab_df_use_largest_segment                                0
        metaslab_force_ganging                                  16777217
        metaslab_fragmentation_factor_enabled                          1
        metaslab_lba_weighting_enabled                                 1
        metaslab_preload_enabled                                       1
        metaslab_unload_delay                                         32
        metaslab_unload_delay_ms                                  600000
        send_holes_without_birth_time                                  1
        spa_asize_inflation                                           24
        spa_config_path                             /etc/zfs/zpool.cache
        spa_load_print_vdev_tree                                       0
        spa_load_verify_data                                           1
        spa_load_verify_metadata                                       1
        spa_load_verify_shift                                          4
        spa_slop_shift                                                 5
        vdev_file_logical_ashift                                       9
        vdev_file_physical_ashift                                      9
        vdev_removal_max_span                                      32768
        vdev_validate_skip                                             0
        zap_iterate_prefetch                                           1
        zfetch_array_rd_sz                                       1048576
        zfetch_max_distance                                      8388608
        zfetch_max_idistance                                    67108864
        zfetch_max_streams                                             8
        zfetch_min_sec_reap                                            2
        zfs_abd_scatter_enabled                                        1
        zfs_abd_scatter_max_order                                     10
        zfs_abd_scatter_min_size                                    1536
        zfs_admin_snapshot                                             0
        zfs_allow_redacted_dataset_mount                               0
        zfs_arc_average_blocksize                                   8192
        zfs_arc_dnode_limit                                            0
        zfs_arc_dnode_limit_percent                                   10
        zfs_arc_dnode_reduce_percent                                  10
        zfs_arc_evict_batch_limit                                     10
        zfs_arc_eviction_pct                                         200
        zfs_arc_grow_retry                                             0
        zfs_arc_lotsfree_percent                                      10
        zfs_arc_max                                                    0
        zfs_arc_meta_adjust_restarts                                4096
        zfs_arc_meta_limit                                             0
        zfs_arc_meta_limit_percent                                    75
        zfs_arc_meta_min                                               0
        zfs_arc_meta_prune                                         10000
        zfs_arc_meta_strategy                                          1
        zfs_arc_min                                                    0
        zfs_arc_min_prefetch_ms                                        0
        zfs_arc_min_prescient_prefetch_ms                              0
        zfs_arc_p_dampener_disable                                     1
        zfs_arc_p_min_shift                                            0
        zfs_arc_pc_percent                                             0
        zfs_arc_shrink_shift                                           0
        zfs_arc_shrinker_limit                                     10000
        zfs_arc_sys_free                                               0
        zfs_async_block_max_blocks                  18446744073709551615
        zfs_autoimport_disable                                         1
        zfs_checksum_events_per_second                                20
        zfs_commit_timeout_pct                                         5
        zfs_compressed_arc_enabled                                     1
        zfs_condense_indirect_commit_entry_delay_ms                    0
        zfs_condense_indirect_obsolete_pct                            25
        zfs_condense_indirect_vdevs_enable                             1
        zfs_condense_max_obsolete_bytes                       1073741824
        zfs_condense_min_mapping_bytes                            131072
        zfs_dbgmsg_enable                                              1
        zfs_dbgmsg_maxsize                                       4194304
        zfs_dbuf_state_index                                           0
        zfs_ddt_data_is_special                                        1
        zfs_deadman_checktime_ms                                   60000
        zfs_deadman_enabled                                            1
        zfs_deadman_failmode                                        wait
        zfs_deadman_synctime_ms                                   600000
        zfs_deadman_ziotime_ms                                    300000
        zfs_dedup_prefetch                                             0
        zfs_delay_min_dirty_percent                                   60
        zfs_delay_scale                                           500000
        zfs_delete_blocks                                          20480
        zfs_dirty_data_max                                    1665681817
        zfs_dirty_data_max_max                                4164204544
        zfs_dirty_data_max_max_percent                                25
        zfs_dirty_data_max_percent                                    10
        zfs_dirty_data_sync_percent                                   20
        zfs_disable_ivset_guid_check                                   0
        zfs_dmu_offset_next_sync                                       0
        zfs_embedded_slog_min_ms                                      64
        zfs_expire_snapshot                                          300
        zfs_fallocate_reserve_percent                                110
        zfs_flags                                                      0
        zfs_free_bpobj_enabled                                         1
        zfs_free_leak_on_eio                                           0
        zfs_free_min_time_ms                                        1000
        zfs_history_output_max                                   1048576
        zfs_immediate_write_sz                                     32768
        zfs_initialize_chunk_size                                1048576
        zfs_initialize_value                        16045690984833335022
        zfs_keep_log_spacemaps_at_export                               0
        zfs_key_max_salt_uses                                  400000000
        zfs_livelist_condense_new_alloc                                0
        zfs_livelist_condense_sync_cancel                              0
        zfs_livelist_condense_sync_pause                               0
        zfs_livelist_condense_zthr_cancel                              0
        zfs_livelist_condense_zthr_pause                               0
        zfs_livelist_max_entries                                  500000
        zfs_livelist_min_percent_shared                               75
        zfs_lua_max_instrlimit                                 100000000
        zfs_lua_max_memlimit                                   104857600
        zfs_max_async_dedup_frees                                 100000
        zfs_max_log_walking                                            5
        zfs_max_logsm_summary_length                                  10
        zfs_max_missing_tvds                                           0
        zfs_max_nvlist_src_size                                        0
        zfs_max_recordsize                                       1048576
        zfs_metaslab_find_max_tries                                  100
        zfs_metaslab_fragmentation_threshold                          70
        zfs_metaslab_max_size_cache_sec                             3600
        zfs_metaslab_mem_limit                                        25
        zfs_metaslab_segment_weight_enabled                            1
        zfs_metaslab_switch_threshold                                  2
        zfs_metaslab_try_hard_before_gang                              0
        zfs_mg_fragmentation_threshold                                95
        zfs_mg_noalloc_threshold                                       0
        zfs_min_metaslabs_to_flush                                     1
        zfs_multihost_fail_intervals                                  10
        zfs_multihost_history                                          0
        zfs_multihost_import_intervals                                20
        zfs_multihost_interval                                      1000
        zfs_multilist_num_sublists                                     0
        zfs_no_scrub_io                                                0
        zfs_no_scrub_prefetch                                          0
        zfs_nocacheflush                                               0
        zfs_nopwrite_enabled                                           1
        zfs_object_mutex_size                                         64
        zfs_obsolete_min_time_ms                                     500
        zfs_override_estimate_recordsize                               0
        zfs_pd_bytes_max                                        52428800
        zfs_per_txg_dirty_frees_percent                                5
        zfs_prefetch_disable                                           0
        zfs_read_history                                               0
        zfs_read_history_hits                                          0
        zfs_rebuild_max_segment                                  1048576
        zfs_rebuild_scrub_enabled                                      1
        zfs_rebuild_vdev_limit                                  33554432
        zfs_reconstruct_indirect_combinations_max                   4096
        zfs_recover                                                    0
        zfs_recv_queue_ff                                             20
        zfs_recv_queue_length                                   16777216
        zfs_recv_write_batch_size                                1048576
        zfs_removal_ignore_errors                                      0
        zfs_removal_suspend_progress                                   0
        zfs_remove_max_segment                                  16777216
        zfs_resilver_disable_defer                                     0
        zfs_resilver_min_time_ms                                    3000
        zfs_scan_checkpoint_intval                                  7200
        zfs_scan_fill_weight                                           3
        zfs_scan_ignore_errors                                         0
        zfs_scan_issue_strategy                                        0
        zfs_scan_legacy                                                0
        zfs_scan_max_ext_gap                                     2097152
        zfs_scan_mem_lim_fact                                         20
        zfs_scan_mem_lim_soft_fact                                    20
        zfs_scan_strict_mem_lim                                        0
        zfs_scan_suspend_progress                                      0
        zfs_scan_vdev_limit                                      4194304
        zfs_scrub_min_time_ms                                       1000
        zfs_send_corrupt_data                                          0
        zfs_send_no_prefetch_queue_ff                                 20
        zfs_send_no_prefetch_queue_length                        1048576
        zfs_send_queue_ff                                             20
        zfs_send_queue_length                                   16777216
        zfs_send_unmodified_spill_blocks                               1
        zfs_slow_io_events_per_second                                 20
        zfs_spa_discard_memory_limit                            16777216
        zfs_special_class_metadata_reserve_pct                        25
        zfs_sync_pass_deferred_free                                    2
        zfs_sync_pass_dont_compress                                    8
        zfs_sync_pass_rewrite                                          2
        zfs_sync_taskq_batch_pct                                      75
        zfs_traverse_indirect_prefetch_limit                          32
        zfs_trim_extent_bytes_max                              134217728
        zfs_trim_extent_bytes_min                                  32768
        zfs_trim_metaslab_skip                                         0
        zfs_trim_queue_limit                                          10
        zfs_trim_txg_batch                                            32
        zfs_txg_history                                              100
        zfs_txg_timeout                                                5
        zfs_unflushed_log_block_max                               262144
        zfs_unflushed_log_block_min                                 1000
        zfs_unflushed_log_block_pct                                  400
        zfs_unflushed_max_mem_amt                             1073741824
        zfs_unflushed_max_mem_ppm                                   1000
        zfs_unlink_suspend_progress                                    0
        zfs_user_indirect_is_special                                   1
        zfs_vdev_aggregate_trim                                        0
        zfs_vdev_aggregation_limit                               1048576
        zfs_vdev_aggregation_limit_non_rotating                   131072
        zfs_vdev_async_read_max_active                                 3
        zfs_vdev_async_read_min_active                                 1
        zfs_vdev_async_write_active_max_dirty_percent                 60
        zfs_vdev_async_write_active_min_dirty_percent                 30
        zfs_vdev_async_write_max_active                               10
        zfs_vdev_async_write_min_active                                2
        zfs_vdev_cache_bshift                                         16
        zfs_vdev_cache_max                                         16384
        zfs_vdev_cache_size                                            0
        zfs_vdev_default_ms_count                                    200
        zfs_vdev_default_ms_shift                                     29
        zfs_vdev_initializing_max_active                               1
        zfs_vdev_initializing_min_active                               1
        zfs_vdev_max_active                                         1000
        zfs_vdev_max_auto_ashift                                      16
        zfs_vdev_min_auto_ashift                                       9
        zfs_vdev_min_ms_count                                         16
        zfs_vdev_mirror_non_rotating_inc                               0
        zfs_vdev_mirror_non_rotating_seek_inc                          1
        zfs_vdev_mirror_rotating_inc                                   0
        zfs_vdev_mirror_rotating_seek_inc                              5
        zfs_vdev_mirror_rotating_seek_offset                     1048576
        zfs_vdev_ms_count_limit                                   131072
        zfs_vdev_nia_credit                                            5
        zfs_vdev_nia_delay                                             5
        zfs_vdev_queue_depth_pct                                    1000
        zfs_vdev_raidz_impl cycle [fastest] original scalar sse2 ssse3 avx2
        zfs_vdev_read_gap_limit                                    32768
        zfs_vdev_rebuild_max_active                                    3
        zfs_vdev_rebuild_min_active                                    1
        zfs_vdev_removal_max_active                                    2
        zfs_vdev_removal_min_active                                    1
        zfs_vdev_scheduler                                        unused
        zfs_vdev_scrub_max_active                                      3
        zfs_vdev_scrub_min_active                                      1
        zfs_vdev_sync_read_max_active                                 10
        zfs_vdev_sync_read_min_active                                 10
        zfs_vdev_sync_write_max_active                                10
        zfs_vdev_sync_write_min_active                                10
        zfs_vdev_trim_max_active                                       2
        zfs_vdev_trim_min_active                                       1
        zfs_vdev_write_gap_limit                                    4096
        zfs_vnops_read_chunk_size                                1048576
        zfs_zevent_len_max                                           512
        zfs_zevent_retain_expire_secs                                900
        zfs_zevent_retain_max                                       2000
        zfs_zil_clean_taskq_maxalloc                             1048576
        zfs_zil_clean_taskq_minalloc                                1024
        zfs_zil_clean_taskq_nthr_pct                                 100
        zil_maxblocksize                                          131072
        zil_nocacheflush                                               0
        zil_replay_disable                                             0
        zil_slog_bulk                                             786432
        zio_deadman_log_all                                            0
        zio_dva_throttle_enabled                                       1
        zio_requeue_io_start_cut_in_line                               1
        zio_slow_io_ms                                             30000
        zio_taskq_batch_pct                                           80
        zio_taskq_batch_tpq                                            0
        zvol_inhibit_dev                                               0
        zvol_major                                                   230
        zvol_max_discard_blocks                                    16384
        zvol_prefetch_bytes                                       131072
        zvol_request_sync                                              0
        zvol_threads                                                  32
        zvol_volmode                                                   1

VDEV cache disabled, skipping section

ZIL committed transactions:                                         4.8M
        Commit requests:                                          115.1k
        Flushes to stable storage:                                115.1k
        Transactions to SLOG storage pool:            0 Bytes          0
        Transactions to non-SLOG storage pool:       65.0 GiB     651.9k

rincebrain commented 2 years ago

While I appreciate how much detail you included, you didn't actually include the output of the command I asked for, zpool get all. :)

gmelikov commented 2 years ago

Just a side note: zfs umount -a won't unmount datasets in all namespaces. Does it still reproduce if you import the pool without mounting any datasets (e.g. with -N)?

gg7 commented 2 years ago

@rincebrain Apologies, I have no idea how I misread zpool get as zfs get.

# zpool get all | grep -v guid
NAME       PROPERTY                       VALUE                          SOURCE
zp-3x12-a  size                           32.7T                          -
zp-3x12-a  capacity                       68%                            -
zp-3x12-a  altroot                        -                              default
zp-3x12-a  health                         ONLINE                         -
zp-3x12-a  version                        -                              default
zp-3x12-a  bootfs                         -                              default
zp-3x12-a  delegation                     on                             default
zp-3x12-a  autoreplace                    off                            default
zp-3x12-a  cachefile                      -                              default
zp-3x12-a  failmode                       wait                           default
zp-3x12-a  listsnapshots                  off                            default
zp-3x12-a  autoexpand                     off                            default
zp-3x12-a  dedupratio                     1.00x                          -
zp-3x12-a  free                           10.4T                          -
zp-3x12-a  allocated                      22.3T                          -
zp-3x12-a  readonly                       off                            -
zp-3x12-a  ashift                         0                              default
zp-3x12-a  comment                        -                              default
zp-3x12-a  expandsize                     -                              -
zp-3x12-a  freeing                        0                              -
zp-3x12-a  fragmentation                  3%                             -
zp-3x12-a  leaked                         0                              -
zp-3x12-a  multihost                      off                            default
zp-3x12-a  checkpoint                     -                              -
zp-3x12-a  autotrim                       off                            default
zp-3x12-a  compatibility                  off                            default
zp-3x12-a  feature@async_destroy          enabled                        local
zp-3x12-a  feature@empty_bpobj            active                         local
zp-3x12-a  feature@lz4_compress           active                         local
zp-3x12-a  feature@multi_vdev_crash_dump  enabled                        local
zp-3x12-a  feature@spacemap_histogram     active                         local
zp-3x12-a  feature@enabled_txg            active                         local
zp-3x12-a  feature@hole_birth             active                         local
zp-3x12-a  feature@extensible_dataset     active                         local
zp-3x12-a  feature@embedded_data          active                         local
zp-3x12-a  feature@bookmarks              enabled                        local
zp-3x12-a  feature@filesystem_limits      enabled                        local
zp-3x12-a  feature@large_blocks           active                         local
zp-3x12-a  feature@large_dnode            enabled                        local
zp-3x12-a  feature@sha512                 enabled                        local
zp-3x12-a  feature@skein                  enabled                        local
zp-3x12-a  feature@edonr                  enabled                        local
zp-3x12-a  feature@userobj_accounting     active                         local
zp-3x12-a  feature@encryption             enabled                        local
zp-3x12-a  feature@project_quota          active                         local
zp-3x12-a  feature@device_removal         enabled                        local
zp-3x12-a  feature@obsolete_counts        enabled                        local
zp-3x12-a  feature@zpool_checkpoint       enabled                        local
zp-3x12-a  feature@spacemap_v2            active                         local
zp-3x12-a  feature@allocation_classes     enabled                        local
zp-3x12-a  feature@resilver_defer         enabled                        local
zp-3x12-a  feature@bookmark_v2            enabled                        local
zp-3x12-a  feature@redaction_bookmarks    enabled                        local
zp-3x12-a  feature@redacted_datasets      enabled                        local
zp-3x12-a  feature@bookmark_written       enabled                        local
zp-3x12-a  feature@log_spacemap           active                         local
zp-3x12-a  feature@livelist               enabled                        local
zp-3x12-a  feature@device_rebuild         enabled                        local
zp-3x12-a  feature@zstd_compress          enabled                        local
zp-3x12-a  feature@draid                  enabled                        local
zp-3x16-a  size                           43.7T                          -
zp-3x16-a  capacity                       63%                            -
zp-3x16-a  altroot                        -                              default
zp-3x16-a  health                         ONLINE                         -
zp-3x16-a  version                        -                              default
zp-3x16-a  bootfs                         -                              default
zp-3x16-a  delegation                     on                             default
zp-3x16-a  autoreplace                    off                            default
zp-3x16-a  cachefile                      -                              default
zp-3x16-a  failmode                       wait                           default
zp-3x16-a  listsnapshots                  off                            default
zp-3x16-a  autoexpand                     off                            default
zp-3x16-a  dedupratio                     1.00x                          -
zp-3x16-a  free                           16.0T                          -
zp-3x16-a  allocated                      27.7T                          -
zp-3x16-a  readonly                       off                            -
zp-3x16-a  ashift                         12                             local
zp-3x16-a  comment                        -                              default
zp-3x16-a  expandsize                     -                              -
zp-3x16-a  freeing                        0                              -
zp-3x16-a  fragmentation                  1%                             -
zp-3x16-a  leaked                         0                              -
zp-3x16-a  multihost                      on                             local
zp-3x16-a  checkpoint                     -                              -
zp-3x16-a  autotrim                       off                            default
zp-3x16-a  compatibility                  off                            default
zp-3x16-a  feature@async_destroy          enabled                        local
zp-3x16-a  feature@empty_bpobj            active                         local
zp-3x16-a  feature@lz4_compress           active                         local
zp-3x16-a  feature@multi_vdev_crash_dump  enabled                        local
zp-3x16-a  feature@spacemap_histogram     active                         local
zp-3x16-a  feature@enabled_txg            active                         local
zp-3x16-a  feature@hole_birth             active                         local
zp-3x16-a  feature@extensible_dataset     active                         local
zp-3x16-a  feature@embedded_data          active                         local
zp-3x16-a  feature@bookmarks              enabled                        local
zp-3x16-a  feature@filesystem_limits      enabled                        local
zp-3x16-a  feature@large_blocks           active                         local
zp-3x16-a  feature@large_dnode            enabled                        local
zp-3x16-a  feature@sha512                 enabled                        local
zp-3x16-a  feature@skein                  enabled                        local
zp-3x16-a  feature@edonr                  enabled                        local
zp-3x16-a  feature@userobj_accounting     active                         local
zp-3x16-a  feature@encryption             active                         local
zp-3x16-a  feature@project_quota          active                         local
zp-3x16-a  feature@device_removal         enabled                        local
zp-3x16-a  feature@obsolete_counts        enabled                        local
zp-3x16-a  feature@zpool_checkpoint       enabled                        local
zp-3x16-a  feature@spacemap_v2            active                         local
zp-3x16-a  feature@allocation_classes     enabled                        local
zp-3x16-a  feature@resilver_defer         enabled                        local
zp-3x16-a  feature@bookmark_v2            enabled                        local
zp-3x16-a  feature@redaction_bookmarks    enabled                        local
zp-3x16-a  feature@redacted_datasets      enabled                        local
zp-3x16-a  feature@bookmark_written       enabled                        local
zp-3x16-a  feature@log_spacemap           active                         local
zp-3x16-a  feature@livelist               enabled                        local
zp-3x16-a  feature@device_rebuild         enabled                        local
zp-3x16-a  feature@zstd_compress          enabled                        local
zp-3x16-a  feature@draid                  enabled                        local

@gmelikov Nothing changes after zpool export zp-3x16-a && zpool import -N zp-3x16-a -- there are still writes in iostat.

Is there anything else I can do to help you guys debug this?

rincebrain commented 2 years ago

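zp-3x16-a has multihost (MMP) protection on, which keeps writing to the pool's disks even when it is idle: with the default zfs_multihost_interval of 1000 ms, each leaf vdev gets a write roughly once per second. zp-3x12-a has it off, so it does not.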
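The diagnosis came from comparing the two pools' properties side by side. Below is a minimal Python sketch of that comparison, assuming the whitespace-separated `zpool get all` layout posted above. The helper names `parse_zpool_get_all` and `differing_properties` and the `VOLATILE` ignore list are hypothetical illustrations, not OpenZFS tools. On the full output, multihost stands out next to a few unrelated differences such as ashift and feature@encryption.

```python
# Hypothetical helper (not part of OpenZFS): parse human-readable
# `zpool get all` output and list properties whose VALUE differs
# between two pools.

# Capacity-style properties differ between any two pools; ignore them.
VOLATILE = {"size", "capacity", "free", "allocated", "fragmentation", "expandsize"}


def parse_zpool_get_all(text: str) -> dict:
    """Map pool name -> {property: value} from `zpool get all` output."""
    pools: dict = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows are NAME PROPERTY VALUE SOURCE; skip the header and blank lines.
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        name, prop = fields[0], fields[1]
        # Everything between PROPERTY and SOURCE is the value.
        pools.setdefault(name, {})[prop] = " ".join(fields[2:-1])
    return pools


def differing_properties(pools: dict, a: str, b: str) -> dict:
    """Return {property: (value_in_a, value_in_b)} for properties that differ."""
    props_a, props_b = pools[a], pools[b]
    return {
        prop: (props_a.get(prop), props_b.get(prop))
        for prop in sorted(set(props_a) | set(props_b))
        if prop not in VOLATILE and props_a.get(prop) != props_b.get(prop)
    }


# A small excerpt in the same format as the output posted in this issue.
SAMPLE = """\
NAME       PROPERTY               VALUE   SOURCE
zp-3x12-a  multihost              off     default
zp-3x12-a  feature@encryption     enabled local
zp-3x12-a  free                   10.4T   -
zp-3x16-a  multihost              on      local
zp-3x16-a  feature@encryption     active  local
zp-3x16-a  free                   16.0T   -
"""

if __name__ == "__main__":
    # prints {'feature@encryption': ('enabled', 'active'), 'multihost': ('off', 'on')}
    print(differing_properties(parse_zpool_get_all(SAMPLE), "zp-3x12-a", "zp-3x16-a"))
```

Ignoring capacity-style properties keeps the diff short enough that a setting like multihost is easy to spot.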

gg7 commented 2 years ago

Thank you @rincebrain! zpool set multihost=off zp-3x16-a fixed this. IIRC, I enabled it because my initrd kept importing the zpool with the wrong hostid at boot and zpool status kept complaining about that. Lesson learnt :)
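As a rough sanity check that multihost accounts for all of the observed traffic: the OpenZFS module-parameter documentation describes zfs_multihost_interval (default 1000 ms) as the average interval between multihost writes to each leaf vdev, with the writes spread across all leaves. The sketch below assumes that default, the three leaf vdevs of zp-3x16-a, and one 4 KiB write per MMP write (an assumption based on ashift=12 and the 4 KiB writes in the issue title). The function name `mmp_write_rate` is hypothetical.

```python
# Rough estimate of multihost (MMP) write traffic on an otherwise idle pool.
# Assumptions (not measured here): default zfs_multihost_interval of 1000 ms,
# and one 4 KiB write per MMP write because the pool uses ashift=12.


def mmp_write_rate(leaf_vdevs: int, interval_ms: int = 1000, write_bytes: int = 4096):
    """Return (pool-wide write period in ms, bytes/s per disk, bytes/s for the pool)."""
    # MMP spreads its writes across leaves: one write every interval/leaves ms,
    # so each leaf sees on average one write per interval.
    period_ms = interval_ms / leaf_vdevs
    per_disk_bps = write_bytes * 1000 / interval_ms
    pool_bps = per_disk_bps * leaf_vdevs
    return period_ms, per_disk_bps, pool_bps


if __name__ == "__main__":
    # zp-3x16-a: RAIDZ1 of three HDDs
    period, per_disk, pool = mmp_write_rate(leaf_vdevs=3)
    print(f"one MMP write every {period:.0f} ms pool-wide")        # 333 ms
    print(f"{per_disk / 1024:.1f} KiB/s per disk, {pool / 1024:.1f} KiB/s total")  # 4.0, 12.0
```

Under these assumptions the estimate is 4 KiB/s per disk and 12 KiB/s for the pool, in line with the ~11.70 K/s "Actual DISK WRITE" that iotop reported earlier in the issue.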