openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Hang when running zloop - apparently due to "zio_wait" #6901

Closed: prakashsurya closed this issue 6 years ago

prakashsurya commented 6 years ago

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 17.04
Linux Kernel 4.10.0-19-generic
Architecture x86_64
ZFS Version zfsonlinux/zfs@94183a9d8a1133ff0d29666a86f84c24f2c4083c
SPL Version zfsonlinux/spl@ed19bccfb651843fa208232b3a2d3d22a4152bc8

Describe the problem you're observing

When running zloop indefinitely, I'm seeing cases where ztest appears to hang. The first sign of trouble is that zloop stops outputting new lines; e.g.:

$ sudo ./scripts/zloop.sh -f /var/tmp/zloop -c /var/tmp/zloop
core dump directory (/var/tmp/zloop) does not exist, creating it.
11/22 23:57:33 /export/home/delphix/zfs/bin/ztest -VVVVV -m 2 -r 3 -R 2 -v 3 -a 12 -T 61 -P 15 -s 512m -f /var/tmp/zloop/zloop-run
...
11/23 09:38:46 /export/home/delphix/zfs/bin/ztest -VVVVV -m 1 -r 0 -R 3 -v 4 -a 9 -T 71 -P 10 -s 512m -f /var/tmp/zloop/zloop-run

In this example, zloop ran for nearly 12 hours, kicking off hundreds of ztest invocations in the process (an estimate; I didn't count), until this last ztest process hung (it's been in this state for several days).

There's a single ztest process running on the system right now:

$ date
Mon Nov 27 22:26:06 UTC 2017

$ pgrep ztest | xargs ps -f -p
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root      2570  3102  0 Nov23 pts/1    S+     0:03 /export/home/delphix/zfs/cmd/ztest/.libs/ztest -VVVVV -m 1 -r 0 -R 3 -v
root     17724  2570  9 Nov23 pts/1    Sl+  629:30 /export/home/delphix/zfs/cmd/ztest/.libs/ztest

$ pstree -p 2570 | head
ztest(2570)---ztest(17724)-+-{ztest}(17725)
                           |-{ztest}(17726)
                           |-{ztest}(17727)
                           |-{ztest}(17728)
                           |-{ztest}(17729)
                           |-{ztest}(17730)
                           |-{ztest}(17731)
                           |-{ztest}(17732)
                           |-{ztest}(17733)
                           |-{ztest}(17734)

I then used gcore to get a core dump of the process:

$ sudo gcore 17724
...
Saved corefile core.17724

And then used gdb to inspect the corefile and see what's going on:

$ sudo gdb /export/home/delphix/zfs/cmd/ztest/.libs/ztest core.17724

The "main" ztest thread appears to be waiting for all other threads to finish:

(gdb) bt
#0  0x00007f83848f79dd in pthread_join (threadid=140202638337792, thread_return=0x0) at pthread_join.c:90
#1  0x000055f212e27074 in ztest_run (zs=0x7f8385b574a0) at ztest.c:6543
#2  0x000055f212e27c9a in main (argc=<optimized out>, argv=0x7ffe42869718) at ztest.c:7092

Using thread apply all where in gdb to inspect the stacks of all threads, most of the stacks look benign, but I see one thread stuck in zio_wait:

Thread 575 (Thread 0x7f8378746700 (LWP 18780)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f8384e585e4 in cv_wait (cv=cv@entry=0x7f82d413cbf0, mp=mp@entry=0x7f82d413cbc0) at kernel.c:328
#2  0x00007f8384f97e7b in zio_wait (zio=0x7f82d413c810) at ../../module/zfs/zio.c:1891
#3  0x00007f8384e737c8 in arc_read (pio=pio@entry=0x0, spa=0x55f21512da40, bp=bp@entry=0x7f8378742920, done=0x7f8384e729c0 <arc_getbuf_func>, private=private@entry=0x7f83787427e8, priority=priority@entry=ZIO_PRIORITY_SYNC_READ, zio_flags=25166224, arc_flags=0x7f83787427e4, zb=0x7f83787427f0) at ../../module/zfs/arc.c:6301
#4  0x00007f8384f8bd91 in zil_read_log_block (decrypt=decrypt@entry=B_FALSE, bp=bp@entry=0x7f8378742920, nbp=nbp@entry=0x7f83787429a0, dst=dst@entry=0x7f82d4320000, end=0x7f8378742918, zilog=<optimized out>, zilog=<optimized out>) at ../../module/zfs/zil.c:217
#5  0x00007f8384f8c66c in zil_parse (zilog=zilog@entry=0x7f82d4007810, parse_blk_func=parse_blk_func@entry=0x7f8384f8a700 <zil_free_log_block>, parse_lr_func=parse_lr_func@entry=0x7f8384f8c230 <zil_free_log_record>, arg=arg@entry=0x7f82d403adf0, txg=0, decrypt=decrypt@entry=B_FALSE) at ../../module/zfs/zil.c:374
#6  0x00007f8384f8ccc4 in zil_destroy_sync (zilog=zilog@entry=0x7f82d4007810, tx=tx@entry=0x7f82d403adf0) at ../../module/zfs/zil.c:695
#7  0x00007f8384f8cea3 in zil_destroy (zilog=zilog@entry=0x7f82d4007810, keep_first=keep_first@entry=B_FALSE) at ../../module/zfs/zil.c:684
#8  0x00007f8384f8f99e in zil_suspend (osname=osname@entry=0x7f82d40008f0 "ztest/temp_10", cookiep=cookiep@entry=0x0) at ../../module/zfs/zil.c:2133
#9  0x00007f8384f8fd5a in zil_vdev_offline (osname=osname@entry=0x7f82d40008f0 "ztest/temp_10", arg=arg@entry=0x0) at ../../module/zfs/zil.c:2350
#10 0x00007f8384e90740 in dmu_objset_find_impl (spa=spa@entry=0x55f21512da40, name=name@entry=0x7f82d40008f0 "ztest/temp_10", func=func@entry=0x7f8384f8fd40 <zil_vdev_offline>, arg=arg@entry=0x0, flags=flags@entry=2) at ../../module/zfs/dmu_objset.c:2617
#11 0x00007f8384e90806 in dmu_objset_find_impl (spa=0x55f21512da40, name=name@entry=0x55f21512da40 "ztest", func=func@entry=0x7f8384f8fd40 <zil_vdev_offline>, arg=arg@entry=0x0, flags=flags@entry=2) at ../../module/zfs/dmu_objset.c:2560
#12 0x00007f8384e94e90 in dmu_objset_find (name=0x55f21512da40 "ztest", func=0x7f8384f8fd40 <zil_vdev_offline>, arg=arg@entry=0x0, flags=flags@entry=2) at ../../module/zfs/dmu_objset.c:2633
#13 0x00007f8384ef4c22 in spa_offline_log (spa=spa@entry=0x55f21512da40) at ../../module/zfs/spa.c:1941
#14 0x00007f8384f10cfe in vdev_offline_locked (spa=spa@entry=0x55f21512da40, guid=guid@entry=6459914980364466541, flags=flags@entry=0) at ../../module/zfs/vdev.c:2700
#15 0x00007f8384f10e8f in vdev_offline (spa=0x55f21512da40, guid=6459914980364466541, flags=0) at ../../module/zfs/vdev.c:2753
#16 0x000055f212e2f427 in ztest_fault_inject (zd=<optimized out>, id=<optimized out>) at ztest.c:5521
#17 0x000055f212e2877c in ztest_execute (test=<optimized out>, zi=0x55f213040340 <ztest_info+512>, id=2) at ztest.c:6237
#18 0x000055f212e2ee03 in ztest_thread (arg=0x2) at ztest.c:6284
#19 0x00007f83848f66da in start_thread (arg=0x7f8378746700) at pthread_create.c:456
#20 0x00007f8384630d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

I'm attaching a file containing the stacks of all threads from the corefile, gdb.txt, as well as the corefile itself (compressed with zip), core.17724.zip.
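For reference, a stack dump like gdb.txt can be produced non-interactively using gdb's batch mode; something along these lines should work against the corefile above (a sketch, using the binary and corefile names from this session):

$ sudo gdb -batch -ex 'thread apply all bt' \
    /export/home/delphix/zfs/cmd/ztest/.libs/ztest core.17724 > gdb.txt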

prakashsurya commented 6 years ago

Last week I suspected this issue might have been due to zfsonlinux/zfs@d4a72f23863382bdf6d0ae33196f5b5decbc48fd (I no longer suspect this), so I also kicked off a long-running zloop process on another system with that commit reverted (but still based off of zfsonlinux/zfs@94183a9d8a1133ff0d29666a86f84c24f2c4083c). With the sequential scrub code reverted, I was still able to trigger a hang.

$ date
Mon Nov 27 22:57:57 UTC 2017

$ pgrep ztest | xargs ps -f -p
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root      8516 27305  0 Nov23 pts/1    Sl+    2:40 /export/home/delphix/zfs/cmd/ztest/.libs/ztest
root     27305  2063  0 Nov23 pts/1    S+     0:03 /export/home/delphix/zfs/cmd/ztest/.libs/ztest -VVVVV -m 2 -r 0 -R 1 -v 2 -a 9 -T 92 -P 39 -s 512m -f /var/tmp/zloop/zloop-run

$ pstree -p 27305 | head
ztest(27305)---ztest(8516)-+-{ztest}(8517)
                           |-{ztest}(8518)
                           |-{ztest}(8519)
                           |-{ztest}(8520)
                           |-{ztest}(8521)
                           |-{ztest}(8522)
                           |-{ztest}(8523)
                           |-{ztest}(8524)
                           |-{ztest}(8525)
                           |-{ztest}(8526)

Again, I used gcore and gdb to dump the stacks of all threads. Here's the corefile (compressed with zip), core.8516.zip, and here's a file containing all stacks, gdb.txt.

In this example, there's also a single thread in zio_wait, only this time it's the sync thread:

Thread 565 (Thread 0x7f4ea06d5700 (LWP 9525)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f4eaca99e34 in cv_wait (cv=cv@entry=0x7f4df004a3f0, mp=mp@entry=0x7f4df004a3c0) at kernel.c:328
#2  0x00007f4eacbd523b in zio_wait (zio=zio@entry=0x7f4df004a010) at ../../module/zfs/zio.c:1889
#3  0x00007f4eacb0a36c in dsl_pool_sync (dp=dp@entry=0x5617ae500b30, txg=txg@entry=368) at ../../module/zfs/dsl_pool.c:558
#4  0x00007f4eacb335e6 in spa_sync (spa=spa@entry=0x5617ae4fad70, txg=txg@entry=368) at ../../module/zfs/spa.c:6868
#5  0x00007f4eacb454e7 in txg_sync_thread (arg=0x5617ae500b30) at ../../module/zfs/txg.c:547
#6  0x00007f4eac5386da in start_thread (arg=0x7f4ea06d5700) at pthread_create.c:456
#7  0x00007f4eac272d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
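
To see what the sync thread is actually waiting on, one option is to select that thread and frame in gdb and print the zio it is blocked in; roughly (a sketch, using the thread number and corefile from this session, and assuming debug info is available as it clearly was for the backtraces above):

$ sudo gdb -batch \
    -ex 'thread 565' -ex 'frame 2' -ex 'print *zio' \
    /export/home/delphix/zfs/cmd/ztest/.libs/ztest core.8516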

With that said, the stacks in this corefile look more interesting than those in the other one (e.g. one thread in spa_open_common, another in spa_keystore_dsl_key_rele, another in arc_read via dsl_dir_get_encryption_root_ddobj, etc.), so these could be two separate issues that both result in a ztest "hang". For now, I'm lumping both into the same issue; we can split them up later if they're determined to be different.

prakashsurya commented 6 years ago

Lastly, I tried running zloop against my code for #6566 (also based on zfsonlinux/zfs@94183a9d8a1133ff0d29666a86f84c24f2c4083c) and hit a ztest hang there as well:

$ date
Mon Nov 27 23:24:24 UTC 2017

$ pgrep ztest | xargs ps -f -p
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root     14427 26148  0 Nov24 pts/1    S+     0:01 /export/home/delphix/zfs/cmd/ztest/.libs/ztest -VVVVV -m 2 -r 0 -R 1 -v 2 -a 12 -T 66 -P 19 -s 512m -f /var/tmp/zloop/zloop-run
root     22145 14427  0 Nov24 pts/1    Sl+    2:23 /export/home/delphix/zfs/cmd/ztest/.libs/ztest

$ pstree -p 14427 | head
ztest(14427)---ztest(22145)-+-{ztest}(22146)
                            |-{ztest}(22147)
                            |-{ztest}(22148)
                            |-{ztest}(22149)
                            |-{ztest}(22150)
                            |-{ztest}(22151)
                            |-{ztest}(22152)
                            |-{ztest}(22153)
                            |-{ztest}(22154)
                            |-{ztest}(22155)

In this case, there are a lot of threads waiting in spa_open_common (probably waiting for the spa_namespace_lock), and the sync thread has the following stack:

Thread 565 (Thread 0x7f30350e1700 (LWP 23159)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f304162c724 in cv_wait (cv=cv@entry=0x55aba21ef6d0, mp=mp@entry=0x55aba21ef690) at kernel.c:328
#2  0x00007f30416a286b in scan_exec_io (dp=dp@entry=0x55aba225f1c0, bp=bp@entry=0x7f30350ddbb0, zio_flags=25166000, zio_flags@entry=32560, zb=0x7f30350ddb90, zb@entry=0x18000b0350ddb90, queue=queue@entry=0x0)
    at ../../module/zfs/dsl_scan.c:3566
#3  0x00007f30416a4dfd in dsl_scan_enqueue (dp=dp@entry=0x55aba225f1c0, bp=bp@entry=0x7f30350ddbb0, zio_flags=32560, zio_flags@entry=25166000, zb=0x18000b0350ddb90, zb@entry=0x7f30350ddb90) at ../../module/zfs/dsl_scan.c:3436
#4  0x00007f30416a50f2 in dsl_scan_scrub_cb (dp=0x55aba225f1c0, bp=0x7f30350ddbb0, zb=0x7f30350ddb90) at ../../module/zfs/dsl_scan.c:3509
#5  0x00007f30416a82e1 in dsl_scan_ddt_entry (scn=0x55aba22315c0, checksum=<optimized out>, dde=dde@entry=0x7f2f4400fe10, tx=tx@entry=0x9f82231a8ac43995) at ../../module/zfs/dsl_scan.c:2299
#6  0x00007f304165bada in ddt_sync_entry (ddt=ddt@entry=0x55aba22e9710, dde=dde@entry=0x7f2f4400fe10, tx=0x9f82231a8ac43995, tx@entry=0x7f3030010010, txg=15078613438550168504, txg@entry=190) at ../../module/zfs/ddt.c:1122
#7  0x00007f304165bd19 in ddt_sync_table (ddt=ddt@entry=0x55aba22e9710, tx=tx@entry=0x7f3030010010, txg=txg@entry=190) at ../../module/zfs/ddt.c:1147
#8  0x00007f304165ca16 in ddt_sync (spa=spa@entry=0x55aba21eed20, txg=txg@entry=190) at ../../module/zfs/ddt.c:1199
#9  0x00007f30416ca307 in spa_sync (spa=spa@entry=0x55aba21eed20, txg=txg@entry=190) at ../../module/zfs/spa.c:6882
#10 0x00007f30416dc1f7 in txg_sync_thread (arg=0x55aba225f1c0) at ../../module/zfs/txg.c:547
#11 0x00007f30410ca6da in start_thread (arg=0x7f30350e1700) at pthread_create.c:456
#12 0x00007f3040e04d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Again, I used gcore and gdb to dump the stacks of all threads. Here's the corefile (compressed with zip), core.22145.zip, and here's a file containing all stacks, gdb.txt.
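
As a quick sanity check on how widespread the spa_open_common pile-up is, the attached gdb.txt can be grepped for the functions of interest (a sketch; gdb.txt is the all-threads backtrace file mentioned above):

$ grep -c 'in spa_open_common' gdb.txt    # threads stuck opening the pool
$ grep -c 'in zio_wait' gdb.txt           # threads blocked waiting on a zio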

tcaputi commented 6 years ago

I took a look at the gdb.txt files. The one from Mon Nov 27 22:57:57 UTC 2017 was particularly interesting to me because it was stuck in the encryption code; it doesn't seem to be caused by the same problem as the others. This one appears to be the result of this stack trace:

Thread 588 (Thread 0x7f4e863d0700 (LWP 10220)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f4eac53af0d in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f4eacf31650 <buf_hash_table+166000>) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f4eaca9991c in mutex_enter (mp=mp@entry=0x7f4eacf31650 <buf_hash_table+166000>) at kernel.c:207
#3  0x00007f4eacaa6b16 in buf_hash_find (spa=spa@entry=1711801064303451485, bp=bp@entry=0x7f4dd0003240, lockp=lockp@entry=0x7f4e863cc548) at ../../module/zfs/arc.c:1086
#4  0x00007f4eacab476a in arc_read (pio=pio@entry=0x7f4e7c00a610, spa=0x5617ae4fad70, bp=0x7f4dd0003240, done=done@entry=0x7f4eacac0750 <dbuf_read_done>, private=private@entry=0x7f4e7c0029b0, priority=priority@entry=ZIO_PRIORITY_SYNC_READ, 
    zio_flags=128, arc_flags=0x7f4e863cc5dc, zb=0x7f4e863cc5e0) at ../../module/zfs/arc.c:5881
#5  0x00007f4eacabf376 in dbuf_read_impl (db=db@entry=0x7f4e7c0029b0, zio=zio@entry=0x7f4e7c00a610, flags=flags@entry=10) at ../../module/zfs/dbuf.c:1134
#6  0x00007f4eacabfa73 in dbuf_read (db=db@entry=0x7f4e7c0029b0, zio=0x7f4e7c00a610, zio@entry=0x0, flags=flags@entry=10) at ../../module/zfs/dbuf.c:1276
#7  0x00007f4eacacbd40 in dmu_buf_hold (os=<optimized out>, object=<optimized out>, offset=offset@entry=0, tag=tag@entry=0x7f4eacc4c110 <__func__.14426>, dbp=dbp@entry=0x7f4e863cc6f0, flags=flags@entry=1) at ../../module/zfs/dmu.c:233
#8  0x00007f4eacbc1e08 in zap_lockdir (os=<optimized out>, obj=<optimized out>, tx=tx@entry=0x0, lti=lti@entry=0, fatreader=fatreader@entry=B_TRUE, adding=adding@entry=B_FALSE, tag=0x7f4eacc4c110 <__func__.14426>, zapp=0x7f4e863cc760)
    at ../../module/zfs/zap_micro.c:586
#9  0x00007f4eacbc26e5 in zap_lookup_norm (os=<optimized out>, zapobj=<optimized out>, name=name@entry=0x7f4eacc32615 "DSL_CRYPTO_ROOT_DDOBJ", integer_size=integer_size@entry=8, num_integers=num_integers@entry=1, 
    buf=buf@entry=0x7f4e863cc800, mt=(unknown: 0), realname=0x0, rn_len=0, ncp=0x0) at ../../module/zfs/zap_micro.c:956
#10 0x00007f4eacbc2780 in zap_lookup (os=<optimized out>, zapobj=<optimized out>, name=name@entry=0x7f4eacc32615 "DSL_CRYPTO_ROOT_DDOBJ", integer_size=integer_size@entry=8, num_integers=num_integers@entry=1, buf=buf@entry=0x7f4e863cc800)
    at ../../module/zfs/zap_micro.c:901
#11 0x00007f4eacb03ffd in dsl_dir_get_encryption_root_ddobj (dd=dd@entry=0x7f4e7c0025b0, rddobj=rddobj@entry=0x7f4e863cc800) at ../../module/zfs/dsl_crypt.c:356
#12 0x00007f4eacb040ae in spa_keystore_wkey_hold_dd (spa=spa@entry=0x5617ae4fad70, dd=dd@entry=0x7f4e7c0025b0, tag=tag@entry=0x7f4eacc33ed0 <__func__.16399>, wkey_out=wkey_out@entry=0x7f4e863cc860) at ../../module/zfs/dsl_crypt.c:406
#13 0x00007f4eacb041c2 in spa_keystore_dsl_key_hold_dd (spa=spa@entry=0x5617ae4fad70, dd=dd@entry=0x7f4e7c0025b0, tag=tag@entry=0x7f4e7c002d60, dck_out=dck_out@entry=0x7f4e7c002e08) at ../../module/zfs/dsl_crypt.c:642
#14 0x00007f4eacb0632c in spa_keystore_create_mapping_impl (spa=0x5617ae4fad70, dsobj=174, dd=0x7f4e7c0025b0, tag=tag@entry=0x7f4e7c000fd0) at ../../module/zfs/dsl_crypt.c:917
#15 0x00007f4eacb06459 in spa_keystore_create_mapping (spa=<optimized out>, ds=ds@entry=0x7f4e7c000fd0, tag=tag@entry=0x7f4e7c000fd0) at ../../module/zfs/dsl_crypt.c:965
#16 0x00007f4eacaf0a92 in dsl_dataset_hold_obj_flags (dp=dp@entry=0x5617ae500b30, dsobj=<optimized out>, flags=flags@entry=DS_HOLD_FLAG_DECRYPT, tag=tag@entry=0x5617ad6bef00 <__func__.21632>, dsp=dsp@entry=0x7f4e863cca28)
    at ../../module/zfs/dsl_dataset.c:552
#17 0x00007f4eacaf14f2 in dsl_dataset_hold_flags (dp=0x5617ae500b30, name=name@entry=0x7f4e863cccb0 "ztest/temp_20", flags=flags@entry=DS_HOLD_FLAG_DECRYPT, tag=tag@entry=0x5617ad6bef00 <__func__.21632>, dsp=dsp@entry=0x7f4e863ccad0)
    at ../../module/zfs/dsl_dataset.c:586
#18 0x00007f4eacaf3207 in dsl_dataset_own (dp=<optimized out>, name=name@entry=0x7f4e863cccb0 "ztest/temp_20", flags=flags@entry=DS_HOLD_FLAG_DECRYPT, tag=tag@entry=0x5617ad6bef00 <__func__.21632>, dsp=dsp@entry=0x7f4e863ccad0)
    at ../../module/zfs/dsl_dataset.c:649
#19 0x00007f4eacad5e47 in dmu_objset_own (name=0x7f4e863cccb0 "ztest/temp_20", type=DMU_OST_OTHER, readonly=B_FALSE, decrypt=B_TRUE, tag=0x5617ad6bef00 <__func__.21632>, osp=0x7f4e863ccca0) at ../../module/zfs/dmu_objset.c:698
#20 0x00005617ad6aeb4e in ztest_dmu_objset_own (name=0x7f4e863cccb0 "ztest/temp_20", type=DMU_OST_OTHER, readonly=B_FALSE, tag=0x5617ad6bef00 <__func__.21632>, osp=0x7f4e863ccca0, decrypt=B_TRUE) at ztest.c:1193
#21 0x00005617ad6b8481 in ztest_dmu_objset_create_destroy (zd=<optimized out>, id=20) at ztest.c:3740
#22 0x00005617ad6ac77c in ztest_execute (test=<optimized out>, zi=0x5617ad8c4280 <ztest_info+320>, id=20) at ztest.c:6237
#23 0x00005617ad6b2e03 in ztest_thread (arg=0x14) at ztest.c:6284
#24 0x00007f4eac5386da in start_thread (arg=0x7f4e863d0700) at pthread_create.c:456
#25 0x00007f4eac272d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

This seems to indicate that the arc header lock was leaked.
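
One way to look for other threads touching the same hash bucket lock is to grep the attached gdb.txt for the mutex shown in frame #1 of that stack (a sketch; the address and symbol offset are taken verbatim from the trace above):

$ grep -n 'buf_hash_table+166000' gdb.txt
$ grep -n '0x7f4eacf31650' gdb.txt

Note that this only finds threads whose stacks still reference that bucket; a holder that already returned without dropping the lock won't show up, which would be consistent with a leaked hash_lock.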

The other problems seem to be caused by the spa_namespace_lock (as @prakashsurya already mentioned).

dweeezil commented 6 years ago

Is there a chance this is caused by #7017?

behlendorf commented 6 years ago

All three ztest failures reported here have been addressed in master by the following patches:

f90a30a Fix txg_sync_thread hang in scan_exec_io()
0d23f5e Fix hash_lock / keystore.sk_dk_lock lock inversion
0735ecb OpenZFS 8997 - ztest assertion failure in zil_lwb_write_issue
2fe61a7 OpenZFS 8909 - 8585 can cause a use-after-free kernel panic

In addition, the deadman code was enhanced to more reliably detect hangs (8fb1ede146426e2217b35a254b4660e264854fa1), and zloop.sh was better integrated with the buildbot (cd0a89ded96d9cedbf20c70b533b616770ca4831) to log the useful information from the core dumps for analysis.