openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.57k stars 1.74k forks source link

zfs crashes when changing scheduler to bfq-sq when zfs_prefetch_disable=0 #6843

Closed mabod closed 4 years ago

mabod commented 6 years ago

System information

Distribution Name | Manjaro Distribution Version | rolling Linux Kernel | 4.13.11 (not happening with 4.9.60) Architecture | amd64 ZFS Version | 0.7.2-1 SPL Version | 0.7.2-1

Describe the problem you're observing

When kernel is 4.13.11 and zfs_prefetch_disable=0 the zfs module crashes with core dump in the journal when switching the scheduler to bfq-sq.

Describe how to reproduce the problem

I am doing performance tests for different schedulers. I am using the following script to switch the schedulers and exceute fio for the benchmark:

FILE=/mnt/zstore/BENCHMARK/fiotest.$$
OUT=/mnt/zstore/BENCHMARK/zfs-compare-scheduler-prefetch-disable-1-arc-limit-4.out
SCHEDULER="noop deadline cfq bfq-sq"

for s in $SCHEDULER;
do
    echo "scheduler: $s" 
    echo "#############" >> $OUT
    echo "scheduler: $s" >> $OUT
    for d in /sys/block/sda/queue/scheduler /sys/block/sdb/queue/scheduler /sys/block/sdc/queue/scheduler; 
    do 
        echo "$s" > $d; 
    done 
    fio --rw=rw --name=$FILE --size=20G -bs=1M  --refill_buffers --ioengine=libaio --gtod_reduce=1 >> $OUT
    echo "#############" >> $OUT
done

When the script reaches scheduler bfq-sq the fio process hangs. It sits for about a minute before the following messages show up in the journal:

Nov 07 20:27:16 rakete kernel: INFO: task z_wr_iss:1445 blocked for more than 120 seconds.
Nov 07 20:27:16 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:27:16 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:27:16 rakete kernel: z_wr_iss        D    0  1445      2 0x00000000
Nov 07 20:27:16 rakete kernel: Call Trace:
Nov 07 20:27:16 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:27:16 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:27:16 rakete kernel:  io_schedule+0x16/0x40
Nov 07 20:27:16 rakete kernel:  wbt_wait+0x178/0x350
Nov 07 20:27:16 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:27:16 rakete kernel:  blk_queue_bio+0x101/0x410
Nov 07 20:27:16 rakete kernel:  generic_make_request+0x125/0x320
Nov 07 20:27:16 rakete kernel:  submit_bio+0x73/0x150
Nov 07 20:27:16 rakete kernel:  ? submit_bio+0x73/0x150
Nov 07 20:27:16 rakete kernel:  vdev_disk_io_start+0x4b2/0x6f0 [zfs]
Nov 07 20:27:16 rakete kernel:  zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:27:16 rakete kernel:  ? zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:27:16 rakete kernel:  zio_nowait+0xab/0x160 [zfs]
Nov 07 20:27:16 rakete kernel:  vdev_raidz_io_start+0x1de/0x300 [zfs]
Nov 07 20:27:16 rakete kernel:  zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:27:16 rakete kernel:  zio_nowait+0xab/0x160 [zfs]
Nov 07 20:27:16 rakete kernel:  ? vdev_config_sync+0x180/0x180 [zfs]
Nov 07 20:27:16 rakete kernel:  vdev_mirror_io_start+0x94/0x180 [zfs]
Nov 07 20:27:16 rakete kernel:  zio_vdev_io_start+0x1aa/0x340 [zfs]
Nov 07 20:27:16 rakete kernel:  ? taskq_member+0x18/0x30 [spl]
Nov 07 20:27:16 rakete kernel:  zio_execute+0x8a/0xe0 [zfs]
Nov 07 20:27:16 rakete kernel:  taskq_thread+0x254/0x480 [spl]
Nov 07 20:27:16 rakete kernel:  ? wake_up_q+0x80/0x80
Nov 07 20:27:16 rakete kernel:  kthread+0x125/0x140
Nov 07 20:27:16 rakete kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 07 20:27:16 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:27:16 rakete kernel:  ret_from_fork+0x25/0x30
Nov 07 20:27:16 rakete kernel: INFO: task txg_sync:1966 blocked for more than 120 seconds.
Nov 07 20:27:16 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:27:16 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:27:16 rakete kernel: txg_sync        D    0  1966      2 0x00000000
Nov 07 20:27:16 rakete kernel: Call Trace:
Nov 07 20:27:16 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:27:16 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:27:16 rakete kernel:  io_schedule+0x16/0x40
Nov 07 20:27:16 rakete kernel:  cv_wait_common+0xb0/0x130 [spl]
Nov 07 20:27:16 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:27:16 rakete kernel:  __cv_wait_io+0x18/0x20 [spl]
Nov 07 20:27:16 rakete kernel:  zio_wait+0xf2/0x1b0 [zfs]
Nov 07 20:27:16 rakete kernel:  dsl_pool_sync+0xb8/0x430 [zfs]
Nov 07 20:27:16 rakete kernel:  spa_sync+0x43f/0xd80 [zfs]
Nov 07 20:27:16 rakete kernel:  txg_sync_thread+0x2d2/0x4a0 [zfs]
Nov 07 20:27:16 rakete kernel:  ? finish_task_switch+0x75/0x200
Nov 07 20:27:16 rakete kernel:  ? txg_delay+0x1b0/0x1b0 [zfs]
Nov 07 20:27:16 rakete kernel:  thread_generic_wrapper+0x72/0x80 [spl]
Nov 07 20:27:16 rakete kernel:  kthread+0x125/0x140
Nov 07 20:27:16 rakete kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 07 20:27:16 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:27:16 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:27:16 rakete kernel:  ret_from_fork+0x25/0x30
Nov 07 20:27:16 rakete kernel: INFO: task fio:24704 blocked for more than 120 seconds.
Nov 07 20:27:16 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:27:16 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:27:16 rakete kernel: fio             D    0 24704  24684 0x00000000
Nov 07 20:27:16 rakete kernel: Call Trace:
Nov 07 20:27:16 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:27:16 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:27:16 rakete kernel:  cv_wait_common+0x11c/0x130 [spl]
Nov 07 20:27:16 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:27:16 rakete kernel:  __cv_wait+0x15/0x20 [spl]
Nov 07 20:27:16 rakete kernel:  txg_wait_open+0xb0/0x100 [zfs]
Nov 07 20:27:16 rakete kernel:  dmu_tx_wait+0x37c/0x390 [zfs]
Nov 07 20:27:16 rakete kernel:  dmu_tx_assign+0x8f/0x480 [zfs]
Nov 07 20:27:16 rakete kernel:  zfs_write+0x403/0xd30 [zfs]
Nov 07 20:27:16 rakete kernel:  ? zio_destroy+0xc3/0xd0 [zfs]
Nov 07 20:27:16 rakete kernel:  ? zio_wait+0x139/0x1b0 [zfs]
Nov 07 20:27:16 rakete kernel:  ? spl_kmem_free+0x33/0x40 [spl]
Nov 07 20:27:16 rakete kernel:  ? spl_kmem_free+0x33/0x40 [spl]
Nov 07 20:27:16 rakete kernel:  ? zfs_range_unlock+0x1ec/0x2c0 [zfs]
Nov 07 20:27:16 rakete kernel:  zpl_write_common_iovec+0x8c/0xe0 [zfs]
Nov 07 20:27:16 rakete kernel:  zpl_iter_write+0xae/0xe0 [zfs]
Nov 07 20:27:16 rakete kernel:  aio_write+0x113/0x170
Nov 07 20:27:16 rakete kernel:  ? __check_object_size+0xaf/0x1b0
Nov 07 20:27:16 rakete kernel:  ? _copy_to_user+0x2a/0x40
Nov 07 20:27:16 rakete kernel:  ? aio_read_events+0x245/0x390
Nov 07 20:27:16 rakete kernel:  do_io_submit+0x387/0x730
Nov 07 20:27:16 rakete kernel:  ? do_io_submit+0x387/0x730
Nov 07 20:27:16 rakete kernel:  SyS_io_submit+0x10/0x20
Nov 07 20:27:16 rakete kernel:  ? SyS_io_submit+0x10/0x20
Nov 07 20:27:16 rakete kernel:  entry_SYSCALL_64_fastpath+0x1a/0xa5
Nov 07 20:27:16 rakete kernel: RIP: 0033:0x7fd2010f46c7
Nov 07 20:27:16 rakete kernel: RSP: 002b:00007ffc6e157498 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Nov 07 20:27:16 rakete kernel: RAX: ffffffffffffffda RBX: 00007fd1e4855000 RCX: 00007fd2010f46c7
Nov 07 20:27:16 rakete kernel: RDX: 000055560c8417a0 RSI: 0000000000000001 RDI: 00007fd214bcd000
Nov 07 20:27:16 rakete kernel: RBP: 000055560c8415c0 R08: 0000000000000001 R09: 000055560c841660
Nov 07 20:27:16 rakete kernel: R10: 000000005e2e0000 R11: 0000000000000246 R12: 00007fd1fb939510
Nov 07 20:27:16 rakete kernel: R13: 0000000000000001 R14: 0000000000100000 R15: 0000000000000000
Nov 07 20:29:19 rakete kernel: INFO: task z_wr_iss:1445 blocked for more than 120 seconds.
Nov 07 20:29:19 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:29:19 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:29:19 rakete kernel: z_wr_iss        D    0  1445      2 0x00000000
Nov 07 20:29:19 rakete kernel: Call Trace:
Nov 07 20:29:19 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:29:19 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:29:19 rakete kernel:  io_schedule+0x16/0x40
Nov 07 20:29:19 rakete kernel:  wbt_wait+0x178/0x350
Nov 07 20:29:19 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:29:19 rakete kernel:  blk_queue_bio+0x101/0x410
Nov 07 20:29:19 rakete kernel:  generic_make_request+0x125/0x320
Nov 07 20:29:19 rakete kernel:  submit_bio+0x73/0x150
Nov 07 20:29:19 rakete kernel:  ? submit_bio+0x73/0x150
Nov 07 20:29:19 rakete kernel:  vdev_disk_io_start+0x4b2/0x6f0 [zfs]
Nov 07 20:29:19 rakete kernel:  zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:29:19 rakete kernel:  ? zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:29:19 rakete kernel:  zio_nowait+0xab/0x160 [zfs]
Nov 07 20:29:19 rakete kernel:  vdev_raidz_io_start+0x1de/0x300 [zfs]
Nov 07 20:29:19 rakete kernel:  zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:29:19 rakete kernel:  zio_nowait+0xab/0x160 [zfs]
Nov 07 20:29:19 rakete kernel:  ? vdev_config_sync+0x180/0x180 [zfs]
Nov 07 20:29:19 rakete kernel:  vdev_mirror_io_start+0x94/0x180 [zfs]
Nov 07 20:29:19 rakete kernel:  zio_vdev_io_start+0x1aa/0x340 [zfs]
Nov 07 20:29:19 rakete kernel:  ? taskq_member+0x18/0x30 [spl]
Nov 07 20:29:19 rakete kernel:  zio_execute+0x8a/0xe0 [zfs]
Nov 07 20:29:19 rakete kernel:  taskq_thread+0x254/0x480 [spl]
Nov 07 20:29:19 rakete kernel:  ? wake_up_q+0x80/0x80
Nov 07 20:29:19 rakete kernel:  kthread+0x125/0x140
Nov 07 20:29:19 rakete kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 07 20:29:19 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:29:19 rakete kernel:  ret_from_fork+0x25/0x30
Nov 07 20:29:19 rakete kernel: INFO: task txg_sync:1966 blocked for more than 120 seconds.
Nov 07 20:29:19 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:29:19 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:29:19 rakete kernel: txg_sync        D    0  1966      2 0x00000000
Nov 07 20:29:19 rakete kernel: Call Trace:
Nov 07 20:29:19 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:29:19 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:29:19 rakete kernel:  io_schedule+0x16/0x40
Nov 07 20:29:19 rakete kernel:  cv_wait_common+0xb0/0x130 [spl]
Nov 07 20:29:19 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:29:19 rakete kernel:  __cv_wait_io+0x18/0x20 [spl]
Nov 07 20:29:19 rakete kernel:  zio_wait+0xf2/0x1b0 [zfs]
Nov 07 20:29:19 rakete kernel:  dsl_pool_sync+0xb8/0x430 [zfs]
Nov 07 20:29:19 rakete kernel:  spa_sync+0x43f/0xd80 [zfs]
Nov 07 20:29:19 rakete kernel:  txg_sync_thread+0x2d2/0x4a0 [zfs]
Nov 07 20:29:19 rakete kernel:  ? finish_task_switch+0x75/0x200
Nov 07 20:29:19 rakete kernel:  ? txg_delay+0x1b0/0x1b0 [zfs]
Nov 07 20:29:19 rakete kernel:  thread_generic_wrapper+0x72/0x80 [spl]
Nov 07 20:29:19 rakete kernel:  kthread+0x125/0x140
Nov 07 20:29:19 rakete kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 07 20:29:19 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:29:19 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:29:19 rakete kernel:  ret_from_fork+0x25/0x30
Nov 07 20:29:19 rakete kernel: INFO: task fio:24704 blocked for more than 120 seconds.
Nov 07 20:29:19 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:29:19 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:29:19 rakete kernel: fio             D    0 24704  24684 0x00000000
Nov 07 20:29:19 rakete kernel: Call Trace:
Nov 07 20:29:19 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:29:19 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:29:19 rakete kernel:  cv_wait_common+0x11c/0x130 [spl]
Nov 07 20:29:19 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:29:19 rakete kernel:  __cv_wait+0x15/0x20 [spl]
Nov 07 20:29:19 rakete kernel:  txg_wait_open+0xb0/0x100 [zfs]
Nov 07 20:29:19 rakete kernel:  dmu_tx_wait+0x37c/0x390 [zfs]
Nov 07 20:29:19 rakete kernel:  dmu_tx_assign+0x8f/0x480 [zfs]
Nov 07 20:29:19 rakete kernel:  zfs_write+0x403/0xd30 [zfs]
Nov 07 20:29:19 rakete kernel:  ? zio_destroy+0xc3/0xd0 [zfs]
Nov 07 20:29:19 rakete kernel:  ? zio_wait+0x139/0x1b0 [zfs]
Nov 07 20:29:19 rakete kernel:  ? spl_kmem_free+0x33/0x40 [spl]
Nov 07 20:29:19 rakete kernel:  ? spl_kmem_free+0x33/0x40 [spl]
Nov 07 20:29:19 rakete kernel:  ? zfs_range_unlock+0x1ec/0x2c0 [zfs]
Nov 07 20:29:19 rakete kernel:  zpl_write_common_iovec+0x8c/0xe0 [zfs]
Nov 07 20:29:19 rakete kernel:  zpl_iter_write+0xae/0xe0 [zfs]
Nov 07 20:29:19 rakete kernel:  aio_write+0x113/0x170
Nov 07 20:29:19 rakete kernel:  ? __check_object_size+0xaf/0x1b0
Nov 07 20:29:19 rakete kernel:  ? _copy_to_user+0x2a/0x40
Nov 07 20:29:19 rakete kernel:  ? aio_read_events+0x245/0x390
Nov 07 20:29:19 rakete kernel:  do_io_submit+0x387/0x730
Nov 07 20:29:19 rakete kernel:  ? do_io_submit+0x387/0x730
Nov 07 20:29:19 rakete kernel:  SyS_io_submit+0x10/0x20
Nov 07 20:29:19 rakete kernel:  ? SyS_io_submit+0x10/0x20
Nov 07 20:29:19 rakete kernel:  entry_SYSCALL_64_fastpath+0x1a/0xa5
Nov 07 20:29:19 rakete kernel: RIP: 0033:0x7fd2010f46c7
Nov 07 20:29:19 rakete kernel: RSP: 002b:00007ffc6e157498 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Nov 07 20:29:19 rakete kernel: RAX: ffffffffffffffda RBX: 00007fd1e4855000 RCX: 00007fd2010f46c7
Nov 07 20:29:19 rakete kernel: RDX: 000055560c8417a0 RSI: 0000000000000001 RDI: 00007fd214bcd000
Nov 07 20:29:19 rakete kernel: RBP: 000055560c8415c0 R08: 0000000000000001 R09: 000055560c841660
Nov 07 20:29:19 rakete kernel: R10: 000000005e2e0000 R11: 0000000000000246 R12: 00007fd1fb939510
Nov 07 20:29:19 rakete kernel: R13: 0000000000000001 R14: 0000000000100000 R15: 0000000000000000
Nov 07 20:29:58 rakete gnome-terminal-[3506]: Allocating size to GtkScrollbar 0x563a70df4500 without calling gtk_widget_get_preferred_width/h
Nov 07 20:29:58 rakete gnome-terminal-[3506]: Allocating size to GtkScrollbar 0x563a70df4500 without calling gtk_widget_get_preferred_width/h
Nov 07 20:29:58 rakete gnome-terminal-[3506]: Allocating size to GtkScrollbar 0x563a70df4500 without calling gtk_widget_get_preferred_width/h
Nov 07 20:29:58 rakete gnome-terminal-[3506]: Allocating size to GtkScrollbar 0x563a70df4500 without calling gtk_widget_get_preferred_width/h
Nov 07 20:29:58 rakete gnome-terminal-[3506]: Allocating size to GtkScrollbar 0x563a70df4500 without calling gtk_widget_get_preferred_width/h
Nov 07 20:29:58 rakete gnome-terminal-[3506]: Allocating size to GtkScrollbar 0x563a70df4500 without calling gtk_widget_get_preferred_width/h
Nov 07 20:30:01 rakete crond[25044]: pam_unix(crond:session): session opened for user matthias by (uid=0)
Nov 07 20:30:01 rakete CROND[25045]: (matthias) CMD (getmail --rcfile kontent-rc --rcfile google-rc --rcfile nexgo-rc --rcfile telekom-rc --r
Nov 07 20:30:05 rakete CROND[25044]: pam_unix(crond:session): session closed for user matthias
Nov 07 20:31:22 rakete kernel: INFO: task z_wr_iss:1445 blocked for more than 120 seconds.
Nov 07 20:31:22 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:31:22 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:31:22 rakete kernel: z_wr_iss        D    0  1445      2 0x00000000
Nov 07 20:31:22 rakete kernel: Call Trace:
Nov 07 20:31:22 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:31:22 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:31:22 rakete kernel:  io_schedule+0x16/0x40
Nov 07 20:31:22 rakete kernel:  wbt_wait+0x178/0x350
Nov 07 20:31:22 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:31:22 rakete kernel:  blk_queue_bio+0x101/0x410
Nov 07 20:31:22 rakete kernel:  generic_make_request+0x125/0x320
Nov 07 20:31:22 rakete kernel:  submit_bio+0x73/0x150
Nov 07 20:31:22 rakete kernel:  ? submit_bio+0x73/0x150
Nov 07 20:31:22 rakete kernel:  vdev_disk_io_start+0x4b2/0x6f0 [zfs]
Nov 07 20:31:22 rakete kernel:  zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:31:22 rakete kernel:  ? zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:31:22 rakete kernel:  zio_nowait+0xab/0x160 [zfs]
Nov 07 20:31:22 rakete kernel:  vdev_raidz_io_start+0x1de/0x300 [zfs]
Nov 07 20:31:22 rakete kernel:  zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:31:22 rakete kernel:  zio_nowait+0xab/0x160 [zfs]
Nov 07 20:31:22 rakete kernel:  ? vdev_config_sync+0x180/0x180 [zfs]
Nov 07 20:31:22 rakete kernel:  vdev_mirror_io_start+0x94/0x180 [zfs]
Nov 07 20:31:22 rakete kernel:  zio_vdev_io_start+0x1aa/0x340 [zfs]
Nov 07 20:31:22 rakete kernel:  ? taskq_member+0x18/0x30 [spl]
Nov 07 20:31:22 rakete kernel:  zio_execute+0x8a/0xe0 [zfs]
Nov 07 20:31:22 rakete kernel:  taskq_thread+0x254/0x480 [spl]
Nov 07 20:31:22 rakete kernel:  ? wake_up_q+0x80/0x80
Nov 07 20:31:22 rakete kernel:  kthread+0x125/0x140
Nov 07 20:31:22 rakete kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 07 20:31:22 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:31:22 rakete kernel:  ret_from_fork+0x25/0x30
Nov 07 20:31:22 rakete kernel: INFO: task txg_sync:1966 blocked for more than 120 seconds.
Nov 07 20:31:22 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:31:22 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:31:22 rakete kernel: txg_sync        D    0  1966      2 0x00000000
Nov 07 20:31:22 rakete kernel: Call Trace:
Nov 07 20:31:22 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:31:22 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:31:22 rakete kernel:  io_schedule+0x16/0x40
Nov 07 20:31:22 rakete kernel:  cv_wait_common+0xb0/0x130 [spl]
Nov 07 20:31:22 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:31:22 rakete kernel:  __cv_wait_io+0x18/0x20 [spl]
Nov 07 20:31:22 rakete kernel:  zio_wait+0xf2/0x1b0 [zfs]
Nov 07 20:31:22 rakete kernel:  dsl_pool_sync+0xb8/0x430 [zfs]
Nov 07 20:31:22 rakete kernel:  spa_sync+0x43f/0xd80 [zfs]
Nov 07 20:31:22 rakete kernel:  txg_sync_thread+0x2d2/0x4a0 [zfs]
Nov 07 20:31:22 rakete kernel:  ? finish_task_switch+0x75/0x200
Nov 07 20:31:22 rakete kernel:  ? txg_delay+0x1b0/0x1b0 [zfs]
Nov 07 20:31:22 rakete kernel:  thread_generic_wrapper+0x72/0x80 [spl]
Nov 07 20:31:22 rakete kernel:  kthread+0x125/0x140
Nov 07 20:31:22 rakete kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 07 20:31:22 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:31:22 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:31:22 rakete kernel:  ret_from_fork+0x25/0x30
Nov 07 20:31:22 rakete kernel: INFO: task fio:24704 blocked for more than 120 seconds.
Nov 07 20:31:22 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:31:22 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:31:22 rakete kernel: fio             D    0 24704  24684 0x00000000
Nov 07 20:31:22 rakete kernel: Call Trace:
Nov 07 20:31:22 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:31:22 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:31:22 rakete kernel:  cv_wait_common+0x11c/0x130 [spl]
Nov 07 20:31:22 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:31:22 rakete kernel:  __cv_wait+0x15/0x20 [spl]
Nov 07 20:31:22 rakete kernel:  txg_wait_open+0xb0/0x100 [zfs]
Nov 07 20:31:22 rakete kernel:  dmu_tx_wait+0x37c/0x390 [zfs]
Nov 07 20:31:22 rakete kernel:  dmu_tx_assign+0x8f/0x480 [zfs]
Nov 07 20:31:22 rakete kernel:  zfs_write+0x403/0xd30 [zfs]
Nov 07 20:31:22 rakete kernel:  ? zio_destroy+0xc3/0xd0 [zfs]
Nov 07 20:31:22 rakete kernel:  ? zio_wait+0x139/0x1b0 [zfs]
Nov 07 20:31:22 rakete kernel:  ? spl_kmem_free+0x33/0x40 [spl]
Nov 07 20:31:22 rakete kernel:  ? spl_kmem_free+0x33/0x40 [spl]
Nov 07 20:31:22 rakete kernel:  ? zfs_range_unlock+0x1ec/0x2c0 [zfs]
Nov 07 20:31:22 rakete kernel:  zpl_write_common_iovec+0x8c/0xe0 [zfs]
Nov 07 20:31:22 rakete kernel:  zpl_iter_write+0xae/0xe0 [zfs]
Nov 07 20:31:22 rakete kernel:  aio_write+0x113/0x170
Nov 07 20:31:22 rakete kernel:  ? __check_object_size+0xaf/0x1b0
Nov 07 20:31:22 rakete kernel:  ? _copy_to_user+0x2a/0x40
Nov 07 20:31:22 rakete kernel:  ? aio_read_events+0x245/0x390
Nov 07 20:31:22 rakete kernel:  do_io_submit+0x387/0x730
Nov 07 20:31:22 rakete kernel:  ? do_io_submit+0x387/0x730
Nov 07 20:31:22 rakete kernel:  SyS_io_submit+0x10/0x20
Nov 07 20:31:22 rakete kernel:  ? SyS_io_submit+0x10/0x20
Nov 07 20:31:22 rakete kernel:  entry_SYSCALL_64_fastpath+0x1a/0xa5
Nov 07 20:31:22 rakete kernel: RIP: 0033:0x7fd2010f46c7
Nov 07 20:31:22 rakete kernel: RSP: 002b:00007ffc6e157498 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Nov 07 20:31:22 rakete kernel: RAX: ffffffffffffffda RBX: 00007fd1e4855000 RCX: 00007fd2010f46c7
Nov 07 20:31:22 rakete kernel: RDX: 000055560c8417a0 RSI: 0000000000000001 RDI: 00007fd214bcd000
Nov 07 20:31:22 rakete kernel: RBP: 000055560c8415c0 R08: 0000000000000001 R09: 000055560c841660
Nov 07 20:31:22 rakete kernel: R10: 000000005e2e0000 R11: 0000000000000246 R12: 00007fd1fb939510
Nov 07 20:31:22 rakete kernel: R13: 0000000000000001 R14: 0000000000100000 R15: 0000000000000000
Nov 07 20:33:25 rakete kernel: INFO: task z_wr_iss:1445 blocked for more than 120 seconds.
Nov 07 20:33:25 rakete kernel:       Tainted: P           O    4.13.11-1-MANJARO #1
Nov 07 20:33:25 rakete kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 07 20:33:25 rakete kernel: z_wr_iss        D    0  1445      2 0x00000000
Nov 07 20:33:25 rakete kernel: Call Trace:
Nov 07 20:33:25 rakete kernel:  __schedule+0x239/0x890
Nov 07 20:33:25 rakete kernel:  schedule+0x3d/0x90
Nov 07 20:33:25 rakete kernel:  io_schedule+0x16/0x40
Nov 07 20:33:25 rakete kernel:  wbt_wait+0x178/0x350
Nov 07 20:33:25 rakete kernel:  ? wait_woken+0x80/0x80
Nov 07 20:33:25 rakete kernel:  blk_queue_bio+0x101/0x410
Nov 07 20:33:25 rakete kernel:  generic_make_request+0x125/0x320
Nov 07 20:33:25 rakete kernel:  submit_bio+0x73/0x150
Nov 07 20:33:25 rakete kernel:  ? submit_bio+0x73/0x150
Nov 07 20:33:25 rakete kernel:  vdev_disk_io_start+0x4b2/0x6f0 [zfs]
Nov 07 20:33:25 rakete kernel:  zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:33:25 rakete kernel:  ? zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:33:25 rakete kernel:  zio_nowait+0xab/0x160 [zfs]
Nov 07 20:33:25 rakete kernel:  vdev_raidz_io_start+0x1de/0x300 [zfs]
Nov 07 20:33:25 rakete kernel:  zio_vdev_io_start+0xa2/0x340 [zfs]
Nov 07 20:33:25 rakete kernel:  zio_nowait+0xab/0x160 [zfs]
Nov 07 20:33:25 rakete kernel:  ? vdev_config_sync+0x180/0x180 [zfs]
Nov 07 20:33:25 rakete kernel:  vdev_mirror_io_start+0x94/0x180 [zfs]
Nov 07 20:33:25 rakete kernel:  zio_vdev_io_start+0x1aa/0x340 [zfs]
Nov 07 20:33:25 rakete kernel:  ? taskq_member+0x18/0x30 [spl]
Nov 07 20:33:25 rakete kernel:  zio_execute+0x8a/0xe0 [zfs]
Nov 07 20:33:25 rakete kernel:  taskq_thread+0x254/0x480 [spl]
Nov 07 20:33:25 rakete kernel:  ? wake_up_q+0x80/0x80
Nov 07 20:33:25 rakete kernel:  kthread+0x125/0x140
Nov 07 20:33:25 rakete kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 07 20:33:25 rakete kernel:  ? kthread_create_on_node+0x70/0x70
Nov 07 20:33:25 rakete kernel:  ret_from_fork+0x25/0x30

This only happens with kernel 4.13.11 and zfs_prefetch_disable=0. This is reproducible when the script switches to scheduler bfq-sq. No issue with the other schedulers.

With kernel 4.9.60 it does not happen regardless of zfs_prefetch_disable. By the way, for kernel 4.9.60 I have to use bfq instead of bfq-sq.

sempervictus commented 6 years ago

@mabod: pretty sure i've seen this, or a very close-to-this stack trace a few times on BFQ, without the bfq-sq changes now upstream (stuck on 4.9). Have you tried #6568 by chance? Curious if it crops up there too

mabod commented 6 years ago

I did not test #6568. But I could test it if (1) somebody provides me with zfs/spl module binaries for Manjaro and (2) this somebody promises me that this will not break by pool and the data.

sempervictus commented 6 years ago

You should REALLY not be installing binaries into your kernel from random sources :-). One of those infosec "no-nos." Arch (or your version of it) has a very simple package builder you can utilize to quickly rebuild ZFS for your kernel and patch it on the fly in the build process. Pkgbuilds are well doc'd. Also, your ZFS data may already be corrupt for all you know, and nobody will guarantee you anything - there's no financial exchange going on here to imply any level of support or liability (see the old spacemap issues for an example of how corruption can happen without you knowing or the recent crypto fixes).

mabod commented 6 years ago

I understand that. I was not 100% serious with my comment. I do not feel comfortable compiling zfs+patch by myself and test it. I strongly hope that this isssue can be debugged and solved by other means.

mabod commented 4 years ago

This issue is obsolete. Closing it.