freakout42 closed this issue 5 years ago
I hit something similar but maybe different last night. I believe it occurred while KVM was copying disk blocks from another server to this one. This ended up causing actual corruption on at least one of the zvols (as seen by the VM).
[381360.908047] INFO: task zvol/0:331 blocked for more than 120 seconds.
[381360.908147] Tainted: PF O 3.13.0-32-generic #57-Ubuntu
[381360.908240] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[381360.908352] zvol/0 D ffff88062fc54440 0 331 2 0x00000000
[381360.908357] ffff88060796dbe0 0000000000000002 ffff880609aa17f0 ffff88060796dfd8
[381360.908360] 0000000000014440 0000000000014440 ffff880609aa17f0 ffff880609a5eb30
[381360.908362] ffff880609a5e9f8 ffff880609a5eb38 0000000000000000 0000000000000002
[381360.908365] Call Trace:
[381360.908374] [<ffffffff817200d9>] schedule+0x29/0x70
[381360.908396] [<ffffffffa00c0485>] cv_wait_common+0x105/0x1a0 [spl]
[381360.908402] [<ffffffff810aaf00>] ? prepare_to_wait_event+0x100/0x100
[381360.908408] [<ffffffffa00c0535>] __cv_wait+0x15/0x20 [spl]
[381360.908459] [<ffffffffa0209abb>] txg_wait_open+0x8b/0x110 [zfs]
[381360.908476] [<ffffffffa01cd83b>] dmu_tx_wait+0x29b/0x2a0 [zfs]
[381360.908492] [<ffffffffa01cd8cc>] dmu_tx_assign+0x8c/0x460 [zfs]
[381360.908520] [<ffffffffa025a8c7>] zvol_write+0xa7/0x480 [zfs]
[381360.908527] [<ffffffffa00bab27>] taskq_thread+0x237/0x4b0 [spl]
[381360.908530] [<ffffffff81097508>] ? finish_task_switch+0x128/0x170
[381360.908534] [<ffffffff8109a800>] ? wake_up_state+0x20/0x20
[381360.908539] [<ffffffffa00ba8f0>] ? taskq_cancel_id+0x1f0/0x1f0 [spl]
[381360.908543] [<ffffffff8108b3d2>] kthread+0xd2/0xf0
[381360.908545] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
[381360.908548] [<ffffffff8172c5bc>] ret_from_fork+0x7c/0xb0
[381360.908550] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
This was repeated for zvol/0 through zvol/9. There are 31 zvols on this system.
[ 4.995804] SPL: Loaded module v0.6.3-1~precise
[ 5.130074] ZFS: Loaded module v0.6.3-2~precise, ZFS pool version 5000, ZFS filesystem version 5
I tried migrating a VM again today and all hell broke loose, though I did not get these errors. The system load was in the hundreds on an 8-core system, with major I/O wait time. I killed the migration but still ended up with at least 3 corrupt zvols.
I do not have any corruption or other problems. The system is stable and running, and the rsync tasks complete 100% correctly, but write performance is very slow (about 1 MB/s) and the whole system is sluggish while they run. Still: no errors and no corruption. These "blocked for more than 120 seconds" dmesg messages arrive in pairs, for rsync and txg_sync, once a day.
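When writes crawl like this, it can help to see how long each transaction group (txg) takes to sync and whether a single device is the bottleneck. A diagnostic sketch for ZFS on Linux — the pool name `tank` and the history depth of 50 are placeholders:

```shell
# Keep a history of recent transaction groups (default 0 = history off):
echo 50 > /sys/module/zfs/parameters/zfs_txg_history

# Then inspect per-txg timings for the pool:
cat /proc/spl/kstat/zfs/tank/txgs

# Watch per-device throughput while the slow rsync runs:
zpool iostat -v tank 5

# Alternatively, raise the hung-task warning threshold instead of
# silencing it entirely with the "echo 0" shown in the message:
echo 300 > /proc/sys/kernel/hung_task_timeout_secs
```

These commands require root and only report or tune; they do not change on-disk state.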
A related, though not identical, message:
Aug 29 05:37:06 morpheus kernel: [46185.239554] ata6.00: configured for UDMA/133
Aug 29 05:37:06 morpheus kernel: [46185.239562] ata6: EH complete
Aug 29 05:53:40 morpheus kernel: [47179.890587] INFO: task txg_sync:1462 blocked for more than 180 seconds.
Aug 29 05:53:40 morpheus kernel: [47179.890589] Tainted: P O 3.16.0_ck1-smtnice6_BFQ_integra_intel #1
Aug 29 05:53:40 morpheus kernel: [47179.890590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 29 05:53:40 morpheus kernel: [47179.890591] txg_sync D 0000000000000006 0 1462 2 0x00000000
Aug 29 05:53:40 morpheus kernel: [47179.890594] ffff8805fbda9e40 0000000000000046 ffff88020931a9b0 ffff88065b610000
Aug 29 05:53:40 morpheus kernel: [47179.890596] 0000000000000066 ffff8807fb2a0f20 000000000000b020 0000000000013100
Aug 29 05:53:40 morpheus kernel: [47179.890597] ffff88065b6103a0 00002992b81d47dd ffff880617373fd8 ffff88065b610000
Aug 29 05:53:40 morpheus kernel: [47179.890599] Call Trace:
Aug 29 05:53:40 morpheus kernel: [47179.890605] [
This seems to occur from time to time with a rather slow USB 3.0 powered 4 TB HDD (Touro Desk 3.0, HGST 5K4000) in an external case, during rsync transfers of large files (several GiB each).
Clarification for the above:
I had this happen with the drive mentioned above, and I still don't know what causes it, but I suspect it is related either to the chipset in the external enclosure the drive sits in, or to power-saving features of the XHCI driver and hardware, which I have had issues with in the past.
Another drive (a Seagate ST3000DM001) showed the same behavior; it likely suffered a head crash and had reallocated several sectors (<10). It had been sitting in an external Fantec enclosure (DB-F8U3e, whose chipset is incompatible with smartctl) that had long shown a life and mind of its own: it would occasionally power off during transfers and cause trouble with other filesystems, though on ZFS the files so far seemed fine. The day before yesterday I moved the drive into another external enclosure, and it worked well during backups (only a few hundred MiB transferred per incremental rsync job) until I decided to run a scrub and check everything. After several hours the drive again made audible head-crash and/or sector-reallocation noises (it had done so in the past) and became inaccessible.
That is where the message posted above appeared again, so at least in this second case (the ST3000DM001) ZFS indirectly showed that something was wrong with the hardware/hard drive.
So when you encounter this message, double- or triple-check whether it is a hardware issue rather than a "software" (ZFS-related) problem.
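A quick checklist along those lines, assuming smartmontools is installed and `/dev/sdX` stands in for the suspect drive:

```shell
# Overall SMART health plus attributes; pending/reallocated sector counts
# climbing points at the drive, not ZFS:
smartctl -H -A /dev/sdX

# ZFS-level read/write/checksum error counters, per device:
zpool status -v

# Kernel-side ATA/USB resets and I/O errors around the time of the hang:
dmesg | grep -iE 'ata[0-9]|usb|i/o error'
```

Note that some USB enclosure chipsets (as with the Fantec case above) do not pass SMART commands through, so a clean smartctl failure can itself be an enclosure symptom.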
@wankdanker I think that your issue is separate. It might have been caused by the zvol processing occurring inside an interrupt context. Pull request #2484 might resolve it.
@freakout42 Would you tell us more about your pool configuration? Also, do you have data deduplication enabled on this pool?
[root@blood ~]# zpool status
pool: tank
state: ONLINE
scan: scrub repaired 0 in 0h53m with 0 errors on Tue Sep 9 12:45:13 2014
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda4 ONLINE 0 0 0
sdb4 ONLINE 0 0 0
errors: No known data errors
[root@blood ~]# zpool get all
NAME PROPERTY VALUE SOURCE
tank size 824G -
tank capacity 34% -
tank altroot - default
tank health ONLINE -
tank guid 3198719639486948540 default
tank version - default
tank bootfs - default
tank delegation on default
tank autoreplace off default
tank cachefile - default
tank failmode wait default
tank listsnapshots off default
tank autoexpand off default
tank dedupditto 0 default
tank dedupratio 1.00x -
tank free 538G -
tank allocated 286G -
tank readonly off -
tank ashift 0 default
tank comment - default
tank expandsize 0 -
tank freeing 0 default
tank feature@async_destroy enabled local
tank feature@empty_bpobj active local
tank feature@lz4_compress enabled local
I had similar failures, which occurred during heavy rsync pulls from a remote machine.
I was able to reproduce it quickly by starting the remote pull; I did this three times in a row and triggered the fault every time. The symptom was that any userland zfs/zpool command would hang, although the machine remained responsive otherwise.
I set the module parameter spl_kmem_cache_slab_limit=0 (it had been 16384), and the problem seems to be gone, or at least is no longer easily triggered.
Part of the process that triggers this includes snapshot renaming. No zvols are involved in the process itself, although the pool contains some.
The pool is a raidz1 pool, and there are no hardware issues on the server.
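For reference, the workaround above can be applied like this (a sketch: whether the parameter is writable at runtime depends on the SPL build, and a modprobe.d option is needed to make it survive a reboot):

```shell
# Check the current value:
cat /sys/module/spl/parameters/spl_kmem_cache_slab_limit

# Set it to 0 at runtime (affects SPL kmem caches created afterwards):
echo 0 > /sys/module/spl/parameters/spl_kmem_cache_slab_limit

# Persistent variant, e.g. in /etc/modprobe.d/spl.conf:
#   options spl spl_kmem_cache_slab_limit=0
```

Setting the limit to 0 disables SPL's use of the Linux slab for small (<= 16 KiB) objects, falling back to SPL's own allocator.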
Sep 23 16:17:51 zephyr kernel: [ 27.497382] SPL: Loaded module v0.6.3-1~precise
Sep 23 16:17:51 zephyr kernel: [ 27.515752] ZFS: Loaded module v0.6.3-2~precise, ZFS pool version 5000, ZFS filesystem version 5
Sep 26 00:11:23 zephyr kernel: [200583.071397] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 26 00:11:23 zephyr kernel: [200583.093284] txg_sync D ffffffff8180cbe0 0 4178 2 0x00000000
Sep 26 00:11:23 zephyr kernel: [200583.093291] ffff8807d9fd1aa0 0000000000000046 00000000000000ff ffffffffa026c280
Sep 26 00:11:23 zephyr kernel: [200583.093297] ffff8807d9fd1fd8 ffff8807d9fd1fd8 ffff8807d9fd1fd8 00000000000139c0
Sep 26 00:11:23 zephyr kernel: [200583.093303] ffff8807f3d6ae20 ffff8807e0189710 00000000ffffffff ffffffffa026c280
Sep 26 00:11:23 zephyr kernel: [200583.093309] Call Trace:
Sep 26 00:11:23 zephyr kernel: [200583.093333] [<ffffffff8169f099>] schedule+0x29/0x70
Sep 26 00:11:23 zephyr kernel: [200583.093338] [<ffffffff8169f35e>] schedule_preempt_disabled+0xe/0x10
Sep 26 00:11:23 zephyr kernel: [200583.093343] [<ffffffff8169df77>] __mutex_lock_slowpath+0xd7/0x150
Sep 26 00:11:23 zephyr kernel: [200583.093350] [<ffffffff8169db8a>] mutex_lock+0x2a/0x50
Sep 26 00:11:23 zephyr kernel: [200583.093395] [<ffffffffa0230f19>] zvol_rename_minors+0x79/0x180 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093432] [<ffffffffa01ad349>] dsl_dataset_rename_snapshot_sync_impl+0x189/0x2c0 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093460] [<ffffffffa01ad52f>] dsl_dataset_rename_snapshot_sync+0xaf/0x190 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093491] [<ffffffffa01bde52>] dsl_sync_task_sync+0xf2/0x100 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093521] [<ffffffffa01b63b3>] dsl_pool_sync+0x2f3/0x420 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093554] [<ffffffffa01cc4f4>] spa_sync+0x414/0xb20 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093590] [<ffffffffa01db021>] ? spa_txg_history_set+0x21/0xd0 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093625] [<ffffffffa01de335>] txg_sync_thread+0x385/0x5d0 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093659] [<ffffffffa01ddfb0>] ? txg_init+0x260/0x260 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.093672] [<ffffffffa00d3fc8>] thread_generic_wrapper+0x78/0x90 [spl]
Sep 26 00:11:23 zephyr kernel: [200583.093682] [<ffffffffa00d3f50>] ? __thread_create+0x300/0x300 [spl]
Sep 26 00:11:23 zephyr kernel: [200583.093688] [<ffffffff81077ee3>] kthread+0x93/0xa0
Sep 26 00:11:23 zephyr kernel: [200583.093694] [<ffffffff816a9b24>] kernel_thread_helper+0x4/0x10
Sep 26 00:11:23 zephyr kernel: [200583.093700] [<ffffffff81077e50>] ? flush_kthread_worker+0xb0/0xb0
Sep 26 00:11:23 zephyr kernel: [200583.093704] [<ffffffff816a9b20>] ? gs_change+0x13/0x13
Sep 26 00:11:23 zephyr kernel: [200583.093737] INFO: task zfs:22581 blocked for more than 120 seconds.
Sep 26 00:11:23 zephyr kernel: [200583.104867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 26 00:11:23 zephyr kernel: [200583.126712] zfs D ffffffff8180cbe0 0 22581 22253 0x00000000
Sep 26 00:11:23 zephyr kernel: [200583.126717] ffff880182f2f958 0000000000000082 ffff8807eee9c530 ffff8807deda2550
Sep 26 00:11:23 zephyr kernel: [200583.126722] ffff880182f2ffd8 ffff880182f2ffd8 ffff880182f2ffd8 00000000000139c0
Sep 26 00:11:23 zephyr kernel: [200583.126728] ffff8807f3f04530 ffff8807eee9c530 0000000000000292 ffff8807deda2550
Sep 26 00:11:23 zephyr kernel: [200583.126733] Call Trace:
Sep 26 00:11:23 zephyr kernel: [200583.126740] [<ffffffff8169f099>] schedule+0x29/0x70
Sep 26 00:11:23 zephyr kernel: [200583.126753] [<ffffffffa00dbd8d>] cv_wait_common+0xfd/0x1b0 [spl]
Sep 26 00:11:23 zephyr kernel: [200583.126768] [<ffffffff810787c0>] ? add_wait_queue+0x60/0x60
Sep 26 00:11:23 zephyr kernel: [200583.126779] [<ffffffffa00dbe95>] __cv_wait+0x15/0x20 [spl]
Sep 26 00:11:23 zephyr kernel: [200583.126813] [<ffffffffa01c4aec>] rrw_enter_read+0x3c/0x130 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126849] [<ffffffffa01c4c70>] rrw_enter+0x20/0x30 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126878] [<ffffffffa01b51cd>] dsl_pool_config_enter+0x1d/0x20 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126915] [<ffffffffa01b709a>] dsl_pool_hold+0x4a/0x60 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126927] [<ffffffffa019957b>] dmu_objset_hold+0x2b/0xb0 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126929] [<ffffffff816a85d6>] ? ftrace_call+0x5/0x2b
Sep 26 00:11:23 zephyr kernel: [200583.126942] [<ffffffffa01b8a6f>] dsl_prop_get+0x3f/0x90 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126944] [<ffffffff816a85d6>] ? ftrace_call+0x5/0x2b
Sep 26 00:11:23 zephyr kernel: [200583.126956] [<ffffffffa01b8ade>] dsl_prop_get_integer+0x1e/0x20 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126970] [<ffffffffa022ef6f>] __zvol_create_minor+0xbf/0x630 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126985] [<ffffffffa0230bc7>] zvol_create_minor+0x27/0x40 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.126998] [<ffffffffa0230bee>] zvol_create_minors_cb+0xe/0x20 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127010] [<ffffffffa0197e6e>] dmu_objset_find_impl+0x37e/0x3f0 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127023] [<ffffffffa0230be0>] ? zvol_create_minor+0x40/0x40 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127034] [<ffffffffa0197cae>] dmu_objset_find_impl+0x1be/0x3f0 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127048] [<ffffffffa0230be0>] ? zvol_create_minor+0x40/0x40 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127061] [<ffffffffa0230be0>] ? zvol_create_minor+0x40/0x40 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127072] [<ffffffffa0197f32>] dmu_objset_find+0x52/0x80 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127074] [<ffffffff816a85d6>] ? ftrace_call+0x5/0x2b
Sep 26 00:11:23 zephyr kernel: [200583.127088] [<ffffffffa0230d53>] zvol_create_minors+0x33/0x40 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127102] [<ffffffffa0202a69>] zfs_ioc_snapshot+0x259/0x2a0 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127116] [<ffffffffa0206530>] zfsdev_ioctl+0x180/0x500 [zfs]
Sep 26 00:11:23 zephyr kernel: [200583.127118] [<ffffffff816a85d6>] ? ftrace_call+0x5/0x2b
Sep 26 00:11:23 zephyr kernel: [200583.127131] [<ffffffff8119ae9a>] do_vfs_ioctl+0x8a/0x340
Sep 26 00:11:23 zephyr kernel: [200583.127135] [<ffffffff8119b1e1>] sys_ioctl+0x91/0xa0
Sep 26 00:11:23 zephyr kernel: [200583.127141] [<ffffffff816a8829>] system_call_fastpath+0x16/0x1b
@ColdCanuck Your comments regarding spl_kmem_cache_slab_limit piqued my interest in this issue, which until now I've not had time to follow. In fact, a quick skim of this whole issue makes me wonder how related each of the problem reports really is and whether there's too much issue creep.
Back to the point at hand: I'm posting this followup because there has been a disturbing number of otherwise seemingly unrelated problems sporadically caused by using the Linux slab. Although I've not been able to spend the time on it that I've wanted to, being knee-deep in the series of issues related to POSIX ACLs and SA xattrs, I have seen at least one report (#2701) and, more interestingly, #2725, which make me think there may be a tie-in to our use of the Linux slab for <= 16KiB objects. I don't have any other brilliant observations to offer at the moment, other than to raise the concern that there may be problems related to using the Linux slab, and to ask @behlendorf, @ryao et al. what your thoughts are (particularly given the last few comments in #2725).
Just posting what comes to mind: could scheduling a regular cron job that compacts memory via echo 1 > /proc/sys/vm/compact_memory
change things (provided the slab issues and timeouts are related to memory fragmentation)?
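As a sketch, such a job could look like the following (the file name is hypothetical; it assumes a kernel built with CONFIG_COMPACTION, and writing the file is a one-shot trigger that produces no output):

```shell
# /etc/cron.d/compact-memory (hypothetical): run hourly as root.
# Writing 1 asks the kernel to compact all zones, coalescing free
# pages into larger contiguous blocks.
# m  h  dom mon dow  user  command
  0  *  *   *   *    root  echo 1 > /proc/sys/vm/compact_memory
```

Whether this helps depends entirely on whether the stalls really stem from fragmentation; it is cheap to try but not a fix for the underlying issue.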
I systematically hit this issue when running rsync with the latest ZFS from Arch: zfs-git 0.6.3_r170_gd958324f_3.18.2_2-1
[46181.967521] perf interrupt took too long (2506 > 2495), lowering kernel.perf_event_max_sample_rate to 50100
[79468.027144] INFO: task txg_sync:583 blocked for more than 120 seconds.
[79468.027287] Tainted: P O 3.18.2-2-ARCH #1
[79468.027363] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[79468.027468] txg_sync D 0000000000000000 0 583 2 0x00000000
[79468.027476] ffff88021ce03b18 0000000000000046 ffff88021cd5eeb0 0000000000013640
[79468.027483] ffff88021ce03fd8 0000000000013640 ffff88021f601420 ffff88021cd5eeb0
[79468.027489] ffff8801591955b8 ffffffff00000000 ffff880159195570 00000000025e92cf
[79468.027494] Call Trace:
[79468.027509] [<ffffffff810b2978>] ? __wake_up+0x48/0x60
[79468.027518] [<ffffffff81550b59>] schedule+0x29/0x70
[79468.027524] [<ffffffff81550e38>] io_schedule+0x98/0x100
[79468.027547] [<ffffffffa01cbb45>] __cv_broadcast+0xe5/0x160 [spl]
[79468.027554] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[79468.027565] [<ffffffffa01cbc18>] __cv_wait_io+0x18/0x180 [spl]
[79468.027588] [<ffffffffa030367b>] zio_wait+0x11b/0x200 [zfs]
[79468.027616] [<ffffffffa028ac71>] dsl_pool_sync+0xc1/0x480 [zfs]
[79468.027625] [<ffffffffa01c60af>] ? spl_kmem_cache_free+0x10f/0x4a0 [spl]
[79468.027656] [<ffffffffa02a5b60>] spa_sync+0x480/0xbd0 [zfs]
[79468.027663] [<ffffffff810b2e46>] ? autoremove_wake_function+0x16/0x40
[79468.027694] [<ffffffffa02b730c>] txg_delay+0x4ec/0xa60 [zfs]
[79468.027724] [<ffffffffa02b6f90>] ? txg_delay+0x170/0xa60 [zfs]
[79468.027734] [<ffffffffa01c707a>] __thread_exit+0x9a/0xb0 [spl]
[79468.027742] [<ffffffffa01c7000>] ? __thread_exit+0x20/0xb0 [spl]
[79468.027749] [<ffffffff81090e0a>] kthread+0xea/0x100
[79468.027755] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[79468.027761] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[79468.027767] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
Essentially the same txg_sync trace was logged again at [82589.120097] and [89311.468460].
hinoki% free -h
total used free shared buff/cache available
Mem: 7.8G 4.3G 533M 1.1M 3.0G 702M
Swap: 4.0G 0B 4.0G
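With only ~700 MiB shown as available, it may be worth checking how much of that RAM the ARC is holding and, if needed, capping it. A sketch for ZFS on Linux (the 2 GiB cap is an arbitrary example value):

```shell
# Current ARC size and its ceiling, in bytes
# (arcstats lines are "name  type  value"):
awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats

# Cap the ARC at 2 GiB at runtime:
echo $((2*1024*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_max
```

A persistent cap would go in modprobe.d as `options zfs zfs_arc_max=...`; shrinking the ARC trades cache hit rate for headroom under memory pressure.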
Still trying to run rsync:
The identical txg_sync trace was logged four more times, at [89431.510429], [89551.552404], [89671.594371], and [89791.636364].
[90105.367103] systemd[1]: systemd-journald.service stop-sigabrt timed out. Terminating.
[90105.490377] systemd[1]: Listening on Journal Audit Socket.
[90105.490436] systemd[1]: Starting Journal Service...
[90151.762415] INFO: task kswapd0:31 blocked for more than 120 seconds.
[90151.762569] Tainted: P O 3.18.2-2-ARCH #1
[90151.762663] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[90151.762795] kswapd0 D 0000000000000000 0 31 2 0x00000000
[90151.762805] ffff8802235a3908 0000000000000046 ffff880224225a90 0000000000013640
[90151.762814] ffff8802235a3fd8 0000000000013640 ffffffff81818540 ffff880224225a90
[90151.762821] ffff88022fc93640 ffff88021cd5e4a0 ffff88021cd5ec02 0000000000000000
[90151.762828] Call Trace:
[90151.762845] [<ffffffff8109e6e7>] ? try_to_wake_up+0x1e7/0x380
[90151.762856] [<ffffffff8109e8e2>] ? default_wake_function+0x12/0x20
[90151.762865] [<ffffffff81550b59>] schedule+0x29/0x70
[90151.762893] [<ffffffffa01cbb8d>] __cv_broadcast+0x12d/0x160 [spl]
[90151.762902] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[90151.762915] [<ffffffffa01cbbd5>] __cv_wait+0x15/0x20 [spl]
[90151.762956] [<ffffffffa02b6d03>] txg_wait_open+0x73/0xb0 [zfs]
[90151.762984] [<ffffffffa027514a>] dmu_tx_wait+0x33a/0x350 [zfs]
[90151.763011] [<ffffffffa02751f1>] dmu_tx_assign+0x91/0x4b0 [zfs]
[90151.763040] [<ffffffffa02f2a73>] zfs_inactive+0x163/0x200 [zfs]
[90151.763049] [<ffffffff810b2e70>] ? autoremove_wake_function+0x40/0x40
[90151.763075] [<ffffffffa030af18>] zpl_vap_init+0x838/0xa10 [zfs]
[90151.763083] [<ffffffff811eab68>] evict+0xb8/0x1b0
[90151.763090] [<ffffffff811eaca1>] dispose_list+0x41/0x50
[90151.763097] [<ffffffff811ebce6>] prune_icache_sb+0x56/0x80
[90151.763106] [<ffffffff811d2b25>] super_cache_scan+0x115/0x180
[90151.763115] [<ffffffff81169e89>] shrink_slab_node+0x129/0x2f0
[90151.763123] [<ffffffff8116abcb>] shrink_slab+0x8b/0x160
[90151.763131] [<ffffffff8116e0e9>] kswapd_shrink_zone+0x129/0x1d0
[90151.763138] [<ffffffff8116eb7a>] kswapd+0x54a/0x8f0
[90151.763147] [<ffffffff8116e630>] ? mem_cgroup_shrink_node_zone+0x1c0/0x1c0
[90151.763155] [<ffffffff81090e0a>] kthread+0xea/0x100
[90151.763162] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[90151.763171] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[90151.763178] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[90151.763190] INFO: task systemd-journal:138 blocked for more than 120 seconds.
[90151.763340] Tainted: P O 3.18.2-2-ARCH #1
[90151.763433] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[90151.763563] systemd-journal D 0000000000000000 0 138 1 0x00000004
[90151.763571] ffff88022310b128 0000000000000082 ffff880222c25080 0000000000013640
[90151.763577] ffff88022310bfd8 0000000000013640 ffff88021cd5e4a0 ffff880222c25080
[90151.763583] ffff88022fc93640 ffff88021cd5e4a0 ffff88021cd5ec02 0000000000000000
[90151.763590] Call Trace:
[90151.763599] [<ffffffff8109e6e7>] ? try_to_wake_up+0x1e7/0x380
[90151.763607] [<ffffffff8109e8e2>] ? default_wake_function+0x12/0x20
[90151.763614] [<ffffffff81550b59>] schedule+0x29/0x70
[90151.763627] [<ffffffffa01cbb8d>] __cv_broadcast+0x12d/0x160 [spl]
[90151.763635] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[90151.763647] [<ffffffffa01cbbd5>] __cv_wait+0x15/0x20 [spl]
[90151.763683] [<ffffffffa02b6d03>] txg_wait_open+0x73/0xb0 [zfs]
[90151.763710] [<ffffffffa027514a>] dmu_tx_wait+0x33a/0x350 [zfs]
[90151.763737] [<ffffffffa02751f1>] dmu_tx_assign+0x91/0x4b0 [zfs]
[90151.763765] [<ffffffffa02f2a73>] zfs_inactive+0x163/0x200 [zfs]
[90151.763774] [<ffffffff810b2e70>] ? autoremove_wake_function+0x40/0x40
[90151.763799] [<ffffffffa030af18>] zpl_vap_init+0x838/0xa10 [zfs]
[90151.763805] [<ffffffff811eab68>] evict+0xb8/0x1b0
[90151.763812] [<ffffffff811eaca1>] dispose_list+0x41/0x50
[90151.763819] [<ffffffff811ebce6>] prune_icache_sb+0x56/0x80
[90151.763827] [<ffffffff811d2b25>] super_cache_scan+0x115/0x180
[90151.763834] [<ffffffff81169e89>] shrink_slab_node+0x129/0x2f0
[90151.763842] [<ffffffff811c1523>] ? mem_cgroup_iter+0x2f3/0x4d0
[90151.763850] [<ffffffff8116abcb>] shrink_slab+0x8b/0x160
[90151.763858] [<ffffffff8116da45>] do_try_to_free_pages+0x365/0x4e0
[90151.763866] [<ffffffff8116dc71>] try_to_free_pages+0xb1/0x1a0
[90151.763873] [<ffffffff81160ca7>] __alloc_pages_nodemask+0x697/0xb50
[90151.763884] [<ffffffff811a6e9c>] alloc_pages_current+0x9c/0x120
[90151.763891] [<ffffffff811afba5>] new_slab+0x305/0x370
[90151.763899] [<ffffffff811b2175>] __slab_alloc.isra.51+0x545/0x650
[90151.763907] [<ffffffff81445af9>] ? __alloc_skb+0x89/0x210
[90151.763914] [<ffffffff814380f4>] ? raw_pci_write+0x24/0x50
[90151.763923] [<ffffffff812e8be6>] ? pci_bus_write_config_word+0x66/0x80
[90151.763949] [<ffffffffa004ed7f>] ? ata_bmdma_start+0x2f/0x40 [libata]
[90151.763960] [<ffffffffa01121ad>] ? atiixp_bmdma_start+0x9d/0xe0 [pata_atiixp]
[90151.763969] [<ffffffff811b5445>] __kmalloc_node_track_caller+0xa5/0x240
[90151.763976] [<ffffffff81445af9>] ? __alloc_skb+0x89/0x210
[90151.763984] [<ffffffff81445a11>] __kmalloc_reserve.isra.38+0x31/0x90
[90151.763990] [<ffffffff81445acb>] ? __alloc_skb+0x5b/0x210
[90151.763997] [<ffffffff81445af9>] __alloc_skb+0x89/0x210
[90151.764004] [<ffffffff81445de4>] alloc_skb_with_frags+0x64/0x1e0
[90151.764011] [<ffffffff8143efb9>] sock_alloc_send_pskb+0x219/0x290
[90151.764020] [<ffffffff810136fb>] ? __switch_to+0x1fb/0x600
[90151.764029] [<ffffffff814f886d>] unix_dgram_sendmsg+0x18d/0x690
[90151.764037] [<ffffffff8143ac39>] sock_sendmsg+0x79/0xb0
[90151.764045] [<ffffffff810986da>] ? finish_task_switch+0x4a/0xf0
[90151.764051] [<ffffffff815504c8>] ? __schedule+0x3e8/0xa50
[90151.764059] [<ffffffff8143c76c>] ? move_addr_to_kernel+0x2c/0x50
[90151.764066] [<ffffffff8144a3c7>] ? verify_iovec+0x47/0xd0
[90151.764074] [<ffffffff8143b928>] ___sys_sendmsg+0x408/0x420
[90151.764083] [<ffffffff812149d0>] ? ep_read_events_proc+0xe0/0xe0
[90151.764089] [<ffffffff81440030>] ? sk_prot_alloc.isra.33+0x30/0x130
[90151.764097] [<ffffffff811b291a>] ? kmem_cache_alloc+0x16a/0x170
[90151.764104] [<ffffffff811d100c>] ? get_empty_filp+0x5c/0x1c0
[90151.764112] [<ffffffff8126e496>] ? security_file_alloc+0x16/0x20
[90151.764118] [<ffffffff811d1084>] ? get_empty_filp+0xd4/0x1c0
[90151.764126] [<ffffffff811d118f>] ? alloc_file+0x1f/0xb0
[90151.764134] [<ffffffff8143d651>] __sys_sendmsg+0x51/0x90
[90151.764142] [<ffffffff8143d6a2>] SyS_sendmsg+0x12/0x20
[90151.764149] [<ffffffff81554ca9>] system_call_fastpath+0x12/0x17
[90151.764186] INFO: task txg_sync:583 blocked for more than 120 seconds.
[90151.764325] Tainted: P O 3.18.2-2-ARCH #1
[90151.764418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[90151.764547] txg_sync D 0000000000000000 0 583 2 0x00000000
[90151.764554] ffff88021ce03b18 0000000000000046 ffff88021cd5eeb0 0000000000013640
[90151.764560] ffff88021ce03fd8 0000000000013640 ffff88021f601420 ffff88021cd5eeb0
[90151.764566] ffff88018de384b8 ffffffff00000000 ffff88018de38490 00000000025e92cf
[90151.764572] Call Trace:
[90151.764581] [<ffffffff810b2978>] ? __wake_up+0x48/0x60
[90151.764588] [<ffffffff81550b59>] schedule+0x29/0x70
[90151.764594] [<ffffffff81550e38>] io_schedule+0x98/0x100
[90151.764607] [<ffffffffa01cbb45>] __cv_broadcast+0xe5/0x160 [spl]
[90151.764615] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[90151.764627] [<ffffffffa01cbc18>] __cv_wait_io+0x18/0x180 [spl]
[90151.764651] [<ffffffffa030367b>] zio_wait+0x11b/0x200 [zfs]
[90151.764686] [<ffffffffa028ac71>] dsl_pool_sync+0xc1/0x480 [zfs]
[90151.764698] [<ffffffffa01c60af>] ? spl_kmem_cache_free+0x10f/0x4a0 [spl]
[90151.764735] [<ffffffffa02a5b60>] spa_sync+0x480/0xbd0 [zfs]
[90151.764744] [<ffffffff810b2e46>] ? autoremove_wake_function+0x16/0x40
[90151.764781] [<ffffffffa02b730c>] txg_delay+0x4ec/0xa60 [zfs]
[90151.764818] [<ffffffffa02b6f90>] ? txg_delay+0x170/0xa60 [zfs]
[90151.764830] [<ffffffffa01c707a>] __thread_exit+0x9a/0xb0 [spl]
[90151.764841] [<ffffffffa01c7000>] ? __thread_exit+0x20/0xb0 [spl]
[90151.764847] [<ffffffff81090e0a>] kthread+0xea/0x100
[90151.764855] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[90151.764862] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[90151.764869] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[90195.648689] systemd[1]: systemd-journald.service stop-sigterm timed out. Killing.
[90195.649942] systemd[1]: Starting Journal Service...
This might be caused by #2523. Can you verify that you have the SPL fix applied: zfsonlinux/spl@a3c1eb77721a0d511b4fe7111bb2314686570c4b
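One way to check which SPL build is actually running is to compare the loaded module's version string against the installed package. This is a generic sketch, not from the thread: the sysfs path exists whenever the module is loaded, and `modinfo` covers the installed-but-unloaded case.

```shell
# Report which SPL/ZFS builds are loaded (sysfs) or merely installed
# (modinfo), so they can be matched against the packaged version.
summary=""
for mod in spl zfs; do
  if [ -r "/sys/module/$mod/version" ]; then
    ver=$(cat "/sys/module/$mod/version"); state="loaded"
  elif ver=$(modinfo -F version "$mod" 2>/dev/null) && [ -n "$ver" ]; then
    state="installed"
  else
    ver="-"; state="absent"
  fi
  printf '%s: %s (%s)\n' "$mod" "$state" "$ver"
  summary="$summary $mod=$state"
done
```

Note that a plain release string like `0.6.3-1~precise` cannot confirm one specific commit; for git builds (e.g. `0.6.3_r54_g03a7835_...`) the embedded hash can be checked against the fix commit in the repository history.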
More of the same:
[ 1320.575152] INFO: task txg_sync:583 blocked for more than 120 seconds.
[ 1320.575263] Tainted: P O 3.18.2-2-ARCH #1
[ 1320.575314] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1320.575384] txg_sync D 0000000000000000 0 583 2 0x00000000
[ 1320.575390] ffff88021ba43b18 0000000000000046 ffff8800cbfc4670 0000000000013640
[ 1320.575395] ffff88021ba43fd8 0000000000013640 ffff88021dee2840 ffff8800cbfc4670
[ 1320.575399] ffff880131dcc138 ffffffff00000000 ffff880131dcc130 00000000b3f1aac7
[ 1320.575402] Call Trace:
[ 1320.575414] [<ffffffff810b2978>] ? __wake_up+0x48/0x60
[ 1320.575420] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1320.575424] [<ffffffff81550e38>] io_schedule+0x98/0x100
[ 1320.575442] [<ffffffffa01cfb45>] __cv_broadcast+0xe5/0x160 [spl]
[ 1320.575447] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1320.575453] [<ffffffffa01cfc18>] __cv_wait_io+0x18/0x180 [spl]
[ 1320.575470] [<ffffffffa030767b>] zio_wait+0x11b/0x200 [zfs]
[ 1320.575490] [<ffffffffa028ec71>] dsl_pool_sync+0xc1/0x480 [zfs]
[ 1320.575496] [<ffffffffa01ca0af>] ? spl_kmem_cache_free+0x10f/0x4a0 [spl]
[ 1320.575517] [<ffffffffa02a9b60>] spa_sync+0x480/0xbd0 [zfs]
[ 1320.575522] [<ffffffff810b2e46>] ? autoremove_wake_function+0x16/0x40
[ 1320.575543] [<ffffffffa02bb30c>] txg_delay+0x4ec/0xa60 [zfs]
[ 1320.575563] [<ffffffffa02baf90>] ? txg_delay+0x170/0xa60 [zfs]
[ 1320.575569] [<ffffffffa01cb07a>] __thread_exit+0x9a/0xb0 [spl]
[ 1320.575575] [<ffffffffa01cb000>] ? __thread_exit+0x20/0xb0 [spl]
[ 1320.575579] [<ffffffff81090e0a>] kthread+0xea/0x100
[ 1320.575583] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1320.575588] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[ 1320.575592] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1560.661841] INFO: task txg_sync:583 blocked for more than 120 seconds.
[ 1560.661995] Tainted: P O 3.18.2-2-ARCH #1
[ 1560.662090] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1560.662221] txg_sync D 0000000000000000 0 583 2 0x00000000
[ 1560.662232] ffff88021ba43b18 0000000000000046 ffff8800cbfc4670 0000000000013640
[ 1560.662240] ffff88021ba43fd8 0000000000013640 ffffffff81818540 ffff8800cbfc4670
[ 1560.662247] ffff880223335eb0 ffff880223335e98 0000000000000000 0000000000000003
[ 1560.662255] Call Trace:
[ 1560.662272] [<ffffffff8109e8e2>] ? default_wake_function+0x12/0x20
[ 1560.662283] [<ffffffff810b2715>] ? __wake_up_common+0x55/0x90
[ 1560.662292] [<ffffffff810b2978>] ? __wake_up+0x48/0x60
[ 1560.662301] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1560.662307] [<ffffffff81550e38>] io_schedule+0x98/0x100
[ 1560.662335] [<ffffffffa01cfb45>] __cv_broadcast+0xe5/0x160 [spl]
[ 1560.662343] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1560.662355] [<ffffffffa01cfc18>] __cv_wait_io+0x18/0x180 [spl]
[ 1560.662383] [<ffffffffa030767b>] zio_wait+0x11b/0x200 [zfs]
[ 1560.662418] [<ffffffffa028ec71>] dsl_pool_sync+0xc1/0x480 [zfs]
[ 1560.662430] [<ffffffffa01ca0af>] ? spl_kmem_cache_free+0x10f/0x4a0 [spl]
[ 1560.662468] [<ffffffffa02a9b60>] spa_sync+0x480/0xbd0 [zfs]
[ 1560.662477] [<ffffffff810b2e46>] ? autoremove_wake_function+0x16/0x40
[ 1560.662516] [<ffffffffa02bb30c>] txg_delay+0x4ec/0xa60 [zfs]
[ 1560.662552] [<ffffffffa02baf90>] ? txg_delay+0x170/0xa60 [zfs]
[ 1560.662564] [<ffffffffa01cb07a>] __thread_exit+0x9a/0xb0 [spl]
[ 1560.662575] [<ffffffffa01cb000>] ? __thread_exit+0x20/0xb0 [spl]
[ 1560.662583] [<ffffffff81090e0a>] kthread+0xea/0x100
[ 1560.662590] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1560.662598] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[ 1560.662605] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1560.662617] INFO: task java:1218 blocked for more than 120 seconds.
[ 1560.662753] Tainted: P O 3.18.2-2-ARCH #1
[ 1560.662846] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1560.662975] java D 0000000000000001 0 1218 655 0x00000000
[ 1560.662983] ffff8800c17f7ac8 0000000000000086 ffff8800c33e1420 0000000000013640
[ 1560.662990] ffff8800c17f7fd8 0000000000013640 ffff8802240ceeb0 ffff8800c33e1420
[ 1560.662997] 0000008000000000 ffff88002041b7a0 0000000000000000 ffff88002041b778
[ 1560.663003] Call Trace:
[ 1560.663027] [<ffffffffa0260398>] ? dbuf_rele_and_unlock+0x2c8/0x4d0 [zfs]
[ 1560.663049] [<ffffffffa0261eca>] ? dbuf_read+0x8da/0xf20 [zfs]
[ 1560.663061] [<ffffffffa01c8901>] ? kmem_asprintf+0x51/0x80 [spl]
[ 1560.663068] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1560.663080] [<ffffffffa01cfb8d>] __cv_broadcast+0x12d/0x160 [spl]
[ 1560.663088] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1560.663099] [<ffffffffa01cfbd5>] __cv_wait+0x15/0x20 [spl]
[ 1560.663126] [<ffffffffa0278eab>] dmu_tx_wait+0x9b/0x350 [zfs]
[ 1560.663153] [<ffffffffa02791f1>] dmu_tx_assign+0x91/0x4b0 [zfs]
[ 1560.663161] [<ffffffff810e1006>] ? getrawmonotonic+0x36/0xd0
[ 1560.663186] [<ffffffffa0268e1c>] dmu_free_long_range+0x1ac/0x280 [zfs]
[ 1560.663217] [<ffffffffa02daf1c>] zfs_rmnode+0x6c/0x340 [zfs]
[ 1560.663244] [<ffffffffa02fe141>] zfs_zinactive+0xc1/0x1d0 [zfs]
[ 1560.663273] [<ffffffffa02f6974>] zfs_inactive+0x64/0x200 [zfs]
[ 1560.663281] [<ffffffff810b2e70>] ? autoremove_wake_function+0x40/0x40
[ 1560.663307] [<ffffffffa030ef18>] zpl_vap_init+0x838/0xa10 [zfs]
[ 1560.663315] [<ffffffff811eab68>] evict+0xb8/0x1b0
[ 1560.663322] [<ffffffff811eb405>] iput+0xf5/0x1a0
[ 1560.663330] [<ffffffff811df7f2>] do_unlinkat+0x1e2/0x350
[ 1560.663337] [<ffffffff811d4839>] ? SyS_newstat+0x39/0x60
[ 1560.663345] [<ffffffff811e02c6>] SyS_unlink+0x16/0x20
[ 1560.663353] [<ffffffff81554ca9>] system_call_fastpath+0x12/0x17
[ 1560.663371] INFO: task sshd:1381 blocked for more than 120 seconds.
[ 1560.663508] Tainted: P O 3.18.2-2-ARCH #1
[ 1560.663601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1560.663791] sshd D 0000000000000001 0 1381 764 0x00000004
[ 1560.663798] ffff880113ad7b38 0000000000000086 ffff8800a7356eb0 0000000000013640
[ 1560.663804] ffff880113ad7fd8 0000000000013640 ffff8802240ceeb0 ffff8800a7356eb0
[ 1560.663810] ffff8800c9ccabb8 ffff8800c9ccabb8 0000000000000280 0000000000000001
[ 1560.663816] Call Trace:
[ 1560.663825] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1560.663839] [<ffffffffa01cfb8d>] __cv_broadcast+0x12d/0x160 [spl]
[ 1560.663847] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1560.663858] [<ffffffffa01cfbd5>] __cv_wait+0x15/0x20 [spl]
[ 1560.663885] [<ffffffffa0278eab>] dmu_tx_wait+0x9b/0x350 [zfs]
[ 1560.663912] [<ffffffffa02791f1>] dmu_tx_assign+0x91/0x4b0 [zfs]
[ 1560.663940] [<ffffffffa02f66d7>] zfs_dirty_inode+0xf7/0x330 [zfs]
[ 1560.663951] [<ffffffffa01c8956>] ? kmem_free_debug+0x16/0x20 [spl]
[ 1560.663962] [<ffffffffa01d0b10>] ? crfree+0x170/0x180 [spl]
[ 1560.663972] [<ffffffffa01d123d>] ? tsd_exit+0x19d/0x1b0 [spl]
[ 1560.663998] [<ffffffffa02fe288>] ? zfs_tstamp_update_setup+0x38/0x1c0 [zfs]
[ 1560.664026] [<ffffffffa02f0cfe>] ? zfs_read+0x39e/0x460 [zfs]
[ 1560.664049] [<ffffffffa030ef2e>] zpl_vap_init+0x84e/0xa10 [zfs]
[ 1560.664056] [<ffffffff811fb0f8>] __mark_inode_dirty+0x38/0x2d0
[ 1560.664082] [<ffffffffa02fb3cd>] zfs_mark_inode_dirty+0x4d/0x60 [zfs]
[ 1560.664106] [<ffffffffa030d626>] zpl_putpage+0x576/0xd50 [zfs]
[ 1560.664114] [<ffffffff811d0d3c>] __fput+0x9c/0x200
[ 1560.664122] [<ffffffff811d0eee>] ____fput+0xe/0x10
[ 1560.664128] [<ffffffff8108f33f>] task_work_run+0x9f/0xe0
[ 1560.664137] [<ffffffff81014e75>] do_notify_resume+0x95/0xa0
[ 1560.664145] [<ffffffff81554f20>] int_signal+0x12/0x17
[ 1680.702704] INFO: task txg_sync:583 blocked for more than 120 seconds.
[ 1680.702864] Tainted: P O 3.18.2-2-ARCH #1
[ 1680.702958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1680.703089] txg_sync D 0000000000000000 0 583 2 0x00000000
[ 1680.703100] ffff88021ba43b18 0000000000000046 ffff8800cbfc4670 0000000000013640
[ 1680.703108] ffff88021ba43fd8 0000000000013640 ffffffff81818540 ffff8800cbfc4670
[ 1680.703116] ffff880223335eb0 ffff880223335e98 0000000000000000 0000000000000003
[ 1680.703123] Call Trace:
[ 1680.703141] [<ffffffff8109e8e2>] ? default_wake_function+0x12/0x20
[ 1680.703152] [<ffffffff810b2715>] ? __wake_up_common+0x55/0x90
[ 1680.703161] [<ffffffff810b2978>] ? __wake_up+0x48/0x60
[ 1680.703169] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1680.703176] [<ffffffff81550e38>] io_schedule+0x98/0x100
[ 1680.703203] [<ffffffffa01cfb45>] __cv_broadcast+0xe5/0x160 [spl]
[ 1680.703211] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1680.703224] [<ffffffffa01cfc18>] __cv_wait_io+0x18/0x180 [spl]
[ 1680.703252] [<ffffffffa030767b>] zio_wait+0x11b/0x200 [zfs]
[ 1680.703287] [<ffffffffa028ec71>] dsl_pool_sync+0xc1/0x480 [zfs]
[ 1680.703299] [<ffffffffa01ca0af>] ? spl_kmem_cache_free+0x10f/0x4a0 [spl]
[ 1680.703338] [<ffffffffa02a9b60>] spa_sync+0x480/0xbd0 [zfs]
[ 1680.703346] [<ffffffff810b2e46>] ? autoremove_wake_function+0x16/0x40
[ 1680.703385] [<ffffffffa02bb30c>] txg_delay+0x4ec/0xa60 [zfs]
[ 1680.703422] [<ffffffffa02baf90>] ? txg_delay+0x170/0xa60 [zfs]
[ 1680.703433] [<ffffffffa01cb07a>] __thread_exit+0x9a/0xb0 [spl]
[ 1680.703444] [<ffffffffa01cb000>] ? __thread_exit+0x20/0xb0 [spl]
[ 1680.703452] [<ffffffff81090e0a>] kthread+0xea/0x100
[ 1680.703459] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1680.703467] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[ 1680.703474] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1800.742712] INFO: task txg_sync:583 blocked for more than 120 seconds.
[ 1800.742869] Tainted: P O 3.18.2-2-ARCH #1
[ 1800.742964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1800.743095] txg_sync D 0000000000000000 0 583 2 0x00000000
[ 1800.743106] ffff88021ba43b18 0000000000000046 ffff8800cbfc4670 0000000000013640
[ 1800.743114] ffff88021ba43fd8 0000000000013640 ffffffff81818540 ffff8800cbfc4670
[ 1800.743121] ffff880223335eb0 ffff880223335e98 0000000000000000 0000000000000003
[ 1800.743129] Call Trace:
[ 1800.743146] [<ffffffff8109e8e2>] ? default_wake_function+0x12/0x20
[ 1800.743157] [<ffffffff810b2715>] ? __wake_up_common+0x55/0x90
[ 1800.743165] [<ffffffff810b2978>] ? __wake_up+0x48/0x60
[ 1800.743174] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1800.743180] [<ffffffff81550e38>] io_schedule+0x98/0x100
[ 1800.743206] [<ffffffffa01cfb45>] __cv_broadcast+0xe5/0x160 [spl]
[ 1800.743214] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1800.743227] [<ffffffffa01cfc18>] __cv_wait_io+0x18/0x180 [spl]
[ 1800.743256] [<ffffffffa030767b>] zio_wait+0x11b/0x200 [zfs]
[ 1800.743290] [<ffffffffa028ec71>] dsl_pool_sync+0xc1/0x480 [zfs]
[ 1800.743302] [<ffffffffa01ca0af>] ? spl_kmem_cache_free+0x10f/0x4a0 [spl]
[ 1800.743341] [<ffffffffa02a9b60>] spa_sync+0x480/0xbd0 [zfs]
[ 1800.743349] [<ffffffff810b2e46>] ? autoremove_wake_function+0x16/0x40
[ 1800.743388] [<ffffffffa02bb30c>] txg_delay+0x4ec/0xa60 [zfs]
[ 1800.743425] [<ffffffffa02baf90>] ? txg_delay+0x170/0xa60 [zfs]
[ 1800.743437] [<ffffffffa01cb07a>] __thread_exit+0x9a/0xb0 [spl]
[ 1800.743448] [<ffffffffa01cb000>] ? __thread_exit+0x20/0xb0 [spl]
[ 1800.743455] [<ffffffff81090e0a>] kthread+0xea/0x100
[ 1800.743462] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1800.743471] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[ 1800.743477] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1920.782195] INFO: task txg_sync:583 blocked for more than 120 seconds.
[ 1920.782354] Tainted: P O 3.18.2-2-ARCH #1
[ 1920.782449] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1920.782580] txg_sync D 0000000000000000 0 583 2 0x00000000
[ 1920.782590] ffff88021ba43b18 0000000000000046 ffff8800cbfc4670 0000000000013640
[ 1920.782599] ffff88021ba43fd8 0000000000013640 ffffffff81818540 ffff8800cbfc4670
[ 1920.782606] ffff880223335eb0 ffff880223335e98 0000000000000000 0000000000000003
[ 1920.782614] Call Trace:
[ 1920.782631] [<ffffffff8109e8e2>] ? default_wake_function+0x12/0x20
[ 1920.782641] [<ffffffff810b2715>] ? __wake_up_common+0x55/0x90
[ 1920.782650] [<ffffffff810b2978>] ? __wake_up+0x48/0x60
[ 1920.782658] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1920.782665] [<ffffffff81550e38>] io_schedule+0x98/0x100
[ 1920.782693] [<ffffffffa01cfb45>] __cv_broadcast+0xe5/0x160 [spl]
[ 1920.782701] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1920.782714] [<ffffffffa01cfc18>] __cv_wait_io+0x18/0x180 [spl]
[ 1920.782742] [<ffffffffa030767b>] zio_wait+0x11b/0x200 [zfs]
[ 1920.782777] [<ffffffffa028ec71>] dsl_pool_sync+0xc1/0x480 [zfs]
[ 1920.782789] [<ffffffffa01ca0af>] ? spl_kmem_cache_free+0x10f/0x4a0 [spl]
[ 1920.782827] [<ffffffffa02a9b60>] spa_sync+0x480/0xbd0 [zfs]
[ 1920.782836] [<ffffffff810b2e46>] ? autoremove_wake_function+0x16/0x40
[ 1920.782874] [<ffffffffa02bb30c>] txg_delay+0x4ec/0xa60 [zfs]
[ 1920.782911] [<ffffffffa02baf90>] ? txg_delay+0x170/0xa60 [zfs]
[ 1920.782923] [<ffffffffa01cb07a>] __thread_exit+0x9a/0xb0 [spl]
[ 1920.782934] [<ffffffffa01cb000>] ? __thread_exit+0x20/0xb0 [spl]
[ 1920.782942] [<ffffffff81090e0a>] kthread+0xea/0x100
[ 1920.782949] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1920.782957] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[ 1920.782964] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 1920.782976] INFO: task java:1218 blocked for more than 120 seconds.
[ 1920.783136] Tainted: P O 3.18.2-2-ARCH #1
[ 1920.783236] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1920.783366] java D 0000000000000000 0 1218 655 0x00000000
[ 1920.783373] ffff8800c17f7ab8 0000000000000086 ffff8800c33e1420 0000000000013640
[ 1920.783381] ffff8800c17f7fd8 0000000000013640 ffffffff81818540 ffff8800c33e1420
[ 1920.783387] ffff8800c17f7a08 ffffffffa025a872 0000000000000000 ffff8800235be4a0
[ 1920.783393] Call Trace:
[ 1920.783416] [<ffffffffa025a872>] ? arc_buf_eviction_needed+0x82/0xc0 [zfs]
[ 1920.783439] [<ffffffffa0260398>] ? dbuf_rele_and_unlock+0x2c8/0x4d0 [zfs]
[ 1920.783476] [<ffffffffa02b634d>] ? bp_get_dsize+0xad/0xf0 [zfs]
[ 1920.783503] [<ffffffffa02770e4>] ? dmu_tx_callback_register+0x324/0xab0 [zfs]
[ 1920.783512] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1920.783524] [<ffffffffa01cfb8d>] __cv_broadcast+0x12d/0x160 [spl]
[ 1920.783532] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1920.783544] [<ffffffffa01cfbd5>] __cv_wait+0x15/0x20 [spl]
[ 1920.783570] [<ffffffffa0278eab>] dmu_tx_wait+0x9b/0x350 [zfs]
[ 1920.783597] [<ffffffffa02791f1>] dmu_tx_assign+0x91/0x4b0 [zfs]
[ 1920.783627] [<ffffffffa02810e5>] ? dsl_dataset_block_freeable+0x45/0x1d0 [zfs]
[ 1920.783653] [<ffffffffa0276fb9>] ? dmu_tx_callback_register+0x1f9/0xab0 [zfs]
[ 1920.783681] [<ffffffffa02f74a0>] zfs_write+0x3c0/0xbf0 [zfs]
[ 1920.783688] [<ffffffff810a9dee>] ? enqueue_entity+0x24e/0xaa0
[ 1920.783696] [<ffffffff8109a1f0>] ? resched_curr+0xd0/0xe0
[ 1920.783705] [<ffffffff810ec497>] ? wake_futex+0x67/0x90
[ 1920.783711] [<ffffffff810ef856>] ? do_futex+0x8f6/0xae0
[ 1920.783736] [<ffffffffa030d2cb>] zpl_putpage+0x21b/0xd50 [zfs]
[ 1920.783744] [<ffffffff811cf1d7>] vfs_write+0xb7/0x200
[ 1920.783752] [<ffffffff811cfd29>] SyS_write+0x59/0xd0
[ 1920.783760] [<ffffffff81554ca9>] system_call_fastpath+0x12/0x17
[ 1920.783778] INFO: task imap:1389 blocked for more than 120 seconds.
[ 1920.783916] Tainted: P O 3.18.2-2-ARCH #1
[ 1920.784009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1920.784138] imap D 0000000000000000 0 1389 770 0x00000000
[ 1920.784145] ffff88000f97fb38 0000000000000082 ffff880222d3bc60 0000000000013640
[ 1920.784151] ffff88000f97ffd8 0000000000013640 ffff88009d290a10 ffff880222d3bc60
[ 1920.784161] ffff8800c9ccabb8 ffff8800c9ccabb8 0000000000000fd8 0000000000000001
[ 1920.784168] Call Trace:
[ 1920.784176] [<ffffffff81550b59>] schedule+0x29/0x70
[ 1920.784190] [<ffffffffa01cfb8d>] __cv_broadcast+0x12d/0x160 [spl]
[ 1920.784197] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 1920.784209] [<ffffffffa01cfbd5>] __cv_wait+0x15/0x20 [spl]
[ 1920.784236] [<ffffffffa0278eab>] dmu_tx_wait+0x9b/0x350 [zfs]
[ 1920.784263] [<ffffffffa02791f1>] dmu_tx_assign+0x91/0x4b0 [zfs]
[ 1920.784292] [<ffffffffa02f66d7>] zfs_dirty_inode+0xf7/0x330 [zfs]
[ 1920.784303] [<ffffffffa01d0b10>] ? crfree+0x170/0x180 [spl]
[ 1920.784314] [<ffffffffa01d123d>] ? tsd_exit+0x19d/0x1b0 [spl]
[ 1920.784338] [<ffffffffa030ef2e>] zpl_vap_init+0x84e/0xa10 [zfs]
[ 1920.784345] [<ffffffff811fb0f8>] __mark_inode_dirty+0x38/0x2d0
[ 1920.784372] [<ffffffffa02fb3cd>] zfs_mark_inode_dirty+0x4d/0x60 [zfs]
[ 1920.784395] [<ffffffffa030d626>] zpl_putpage+0x576/0xd50 [zfs]
[ 1920.784403] [<ffffffff811d0d3c>] __fput+0x9c/0x200
[ 1920.784411] [<ffffffff811d0eee>] ____fput+0xe/0x10
[ 1920.784417] [<ffffffff8108f33f>] task_work_run+0x9f/0xe0
[ 1920.784426] [<ffffffff81014e75>] do_notify_resume+0x95/0xa0
[ 1920.784434] [<ffffffff81554f20>] int_signal+0x12/0x17
[ 2280.901736] INFO: task txg_sync:583 blocked for more than 120 seconds.
[ 2280.901897] Tainted: P O 3.18.2-2-ARCH #1
[ 2280.901992] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2280.902124] txg_sync D 0000000000000000 0 583 2 0x00000000
[ 2280.902134] ffff88021ba43b18 0000000000000046 ffff8800cbfc4670 0000000000013640
[ 2280.902143] ffff88021ba43fd8 0000000000013640 ffffffff81818540 ffff8800cbfc4670
[ 2280.902150] ffff880223335eb0 ffff880223335e98 0000000000000000 0000000000000003
[ 2280.902157] Call Trace:
[ 2280.902175] [<ffffffff8109e8e2>] ? default_wake_function+0x12/0x20
[ 2280.902186] [<ffffffff810b2715>] ? __wake_up_common+0x55/0x90
[ 2280.902194] [<ffffffff810b2978>] ? __wake_up+0x48/0x60
[ 2280.902203] [<ffffffff81550b59>] schedule+0x29/0x70
[ 2280.902210] [<ffffffff81550e38>] io_schedule+0x98/0x100
[ 2280.902237] [<ffffffffa01cfb45>] __cv_broadcast+0xe5/0x160 [spl]
[ 2280.902245] [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20
[ 2280.902258] [<ffffffffa01cfc18>] __cv_wait_io+0x18/0x180 [spl]
[ 2280.902286] [<ffffffffa030767b>] zio_wait+0x11b/0x200 [zfs]
[ 2280.902321] [<ffffffffa028ec71>] dsl_pool_sync+0xc1/0x480 [zfs]
[ 2280.902333] [<ffffffffa01ca0af>] ? spl_kmem_cache_free+0x10f/0x4a0 [spl]
[ 2280.902371] [<ffffffffa02a9b60>] spa_sync+0x480/0xbd0 [zfs]
[ 2280.902380] [<ffffffff810b2e46>] ? autoremove_wake_function+0x16/0x40
[ 2280.902419] [<ffffffffa02bb30c>] txg_delay+0x4ec/0xa60 [zfs]
[ 2280.902455] [<ffffffffa02baf90>] ? txg_delay+0x170/0xa60 [zfs]
[ 2280.902467] [<ffffffffa01cb07a>] __thread_exit+0x9a/0xb0 [spl]
[ 2280.902478] [<ffffffffa01cb000>] ? __thread_exit+0x20/0xb0 [spl]
[ 2280.902486] [<ffffffff81090e0a>] kthread+0xea/0x100
[ 2280.902493] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[ 2280.902502] [<ffffffff81554bfc>] ret_from_fork+0x7c/0xb0
[ 2280.902508] [<ffffffff81090d20>] ? kthread_create_on_node+0x1c0/0x1c0
[38275.738573] perf interrupt took too long (2506 > 2495), lowering kernel.perf_event_max_sample_rate to 50100
Running the SPL packages from Arch:
hinoki% pacman -Q | grep spl
spl-git 0.6.3_r54_g03a7835_3.18.2_2-1
spl-utils-git 0.6.3_r54_g03a7835_3.18.2_2-1
Free memory:
hinoki% free -h
total used free shared buff/cache available
Mem: 7.8G 3.6G 691M 1.0M 3.6G 872M
Swap: 4.0G 0B 4.0G
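For anyone gathering data on these txg_sync stalls, the SPL kstats can show recent transaction-group history per pool. This is a general diagnostic sketch, not from the thread; the path matches ZoL 0.6.x, and entries only appear when the `zfs_txg_history` module parameter is nonzero.

```shell
# Dump recent txg history for each imported pool (ZoL 0.6.x kstat path).
# Requires the zfs_txg_history module parameter to be set nonzero.
n=0
for f in /proc/spl/kstat/zfs/*/txgs; do
  [ -r "$f" ] || continue   # glob stays literal when no pool is imported
  echo "== $f"
  cat "$f"
  n=$((n+1))
done
echo "pools with txg history: $n"
```

Long open or sync times in that history help distinguish a pool that is genuinely I/O-bound from one whose sync thread is deadlocked.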
I'm getting this every couple of days now since upgrading to the latest Debian build (which, according to the linked bug, has the fix in it). The oops messages are more or less the same as those already posted. The zvols lock up and the load average climbs into the hundreds. I never had an issue prior to this.
I'm seeing similar symptoms after running a zfs rollback on an SMB-shared dataset. After seeing the load quickly rise, I've stopped the SMB service and will give the rollback command some time to see if it recovers.
From dmesg, on Ubuntu Server 14.04.2 LTS, ZoL 0.6.3-5 from the Ubuntu PPA:
[3764100.795021] smbd D ffff88010b433480 0 34303 60686 0x00000000
[3764100.795025] ffff8800ae491d68 0000000000000082 ffff8800cf951800 ffff8800ae491fd8
[3764100.795029] 0000000000013480 0000000000013480 ffff8800cf951800 ffff8800e48dc3d0
[3764100.795032] ffff8800e48dc3a0 ffff8800e48dc3d8 ffff8800e48dc3c8 0000000000000000
[3764100.795037] Call Trace:
[3764100.795052] [<ffffffff817251a9>] schedule+0x29/0x70
[3764100.795067] [<ffffffffa006b7b5>] cv_wait_common+0x125/0x1c0 [spl]
[3764100.795073] [<ffffffff810ab0b0>] ? prepare_to_wait_event+0x100/0x100
[3764100.795091] [<ffffffffa006b865>] __cv_wait+0x15/0x20 [spl]
[3764100.795130] [<ffffffffa017019b>] rrw_enter_read+0x3b/0x150 [zfs]
[3764100.795181] [<ffffffffa01bf65d>] zfs_getattr_fast+0x3d/0x180 [zfs]
[3764100.795230] [<ffffffffa01d81fd>] zpl_getattr+0x2d/0x50 [zfs]
[3764100.795234] [<ffffffff811c2829>] vfs_getattr_nosec+0x29/0x40
[3764100.795237] [<ffffffff811c28fd>] vfs_getattr+0x2d/0x40
[3764100.795240] [<ffffffff811c29d2>] vfs_fstatat+0x62/0xa0
[3764100.795244] [<ffffffff811c2e5f>] SYSC_newstat+0x1f/0x40
[3764100.795248] [<ffffffff811cdc99>] ? putname+0x29/0x40
[3764100.795252] [<ffffffff811bcfe8>] ? do_sys_open+0x1b8/0x280
[3764100.795256] [<ffffffff811c30ae>] SyS_newstat+0xe/0x10
[3764100.795260] [<ffffffff8173186d>] system_call_fastpath+0x1a/0x1f
[3764100.795263] INFO: task smbd:34313 blocked for more than 120 seconds.
I can trigger the same error by using rsync between two different pools. The rsync process hangs and cannot be killed.
If I don't stop the rsync from the CLI, the following shows up in the kernel log:
May 07 22:43:27 <my host> kernel: INFO: task txg_sync:1408 blocked for more than 120 seconds.
May 07 22:43:27 <my host> kernel: Tainted: P O 4.0.1-1-ARCH #1
May 07 22:43:27 <my host> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 07 22:43:27 <my host> kernel: txg_sync D ffff880028b63a28 0 1408 2 0x00000000
May 07 22:43:27 <my host> kernel: ffff880028b63a28 ffff88018c5f3cc0 ffff880070a16540 0000000000000000
May 07 22:43:27 <my host> kernel: ffff880028b63fd8 ffff88021fc93e00 7fffffffffffffff ffff8801b1ccae08
May 07 22:43:27 <my host> kernel: 0000000000000001 ffff880028b63a48 ffffffff8156fa87 ffff88008e7bdcb0
May 07 22:43:27 <my host> kernel: Call Trace:
May 07 22:43:27 <my host> kernel: [<ffffffff8156fa87>] schedule+0x37/0x90
May 07 22:43:27 <my host> kernel: [<ffffffff8157246c>] schedule_timeout+0x1bc/0x250
May 07 22:43:27 <my host> kernel: [<ffffffff8101f599>] ? read_tsc+0x9/0x10
May 07 22:43:27 <my host> kernel: [<ffffffff810e6757>] ? ktime_get+0x37/0xb0
May 07 22:43:27 <my host> kernel: [<ffffffff8156ef9a>] io_schedule_timeout+0xaa/0x130
May 07 22:43:27 <my host> kernel: [<ffffffffa034daa0>] ? zio_taskq_member.isra.6+0x80/0x80 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa0192248>] cv_wait_common+0xb8/0x140 [spl]
May 07 22:43:27 <my host> kernel: [<ffffffff810b6b20>] ? wake_atomic_t_function+0x60/0x60
May 07 22:43:27 <my host> kernel: [<ffffffffa0192328>] __cv_wait_io+0x18/0x20 [spl]
May 07 22:43:27 <my host> kernel: [<ffffffffa034f943>] zio_wait+0x123/0x210 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa02d6be1>] dsl_pool_sync+0xc1/0x480 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa02f1f80>] spa_sync+0x480/0xbf0 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffff810b6b36>] ? autoremove_wake_function+0x16/0x40
May 07 22:43:27 <my host> kernel: [<ffffffffa0303e06>] txg_sync_thread+0x386/0x630 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffff8156fd62>] ? preempt_schedule_common+0x22/0x40
May 07 22:43:27 <my host> kernel: [<ffffffffa0303a80>] ? txg_quiesce_thread+0x3a0/0x3a0 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa018d561>] thread_generic_wrapper+0x71/0x80 [spl]
May 07 22:43:27 <my host> kernel: [<ffffffffa018d4f0>] ? __thread_exit+0x20/0x20 [spl]
May 07 22:43:27 <my host> kernel: [<ffffffff81093338>] kthread+0xd8/0xf0
May 07 22:43:27 <my host> kernel: [<ffffffff81093260>] ? kthread_worker_fn+0x170/0x170
May 07 22:43:27 <my host> kernel: [<ffffffff81573718>] ret_from_fork+0x58/0x90
May 07 22:43:27 <my host> kernel: [<ffffffff81093260>] ? kthread_worker_fn+0x170/0x170
May 07 22:43:27 <my host> kernel: INFO: task rsync:10064 blocked for more than 120 seconds.
May 07 22:43:27 <my host> kernel: Tainted: P O 4.0.1-1-ARCH #1
May 07 22:43:27 <my host> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 07 22:43:27 <my host> kernel: rsync D ffff8801a4633a58 0 10064 1 0x00000004
May 07 22:43:27 <my host> kernel: ffff8801a4633a58 ffff880216186f60 ffff880070a56540 ffff8801a4633a68
May 07 22:43:27 <my host> kernel: ffff8801a4633fd8 ffff8800da2a4a20 ffff8800da2a4ae0 ffff8800da2a4a48
May 07 22:43:27 <my host> kernel: 0000000000000000 ffff8801a4633a78 ffffffff8156fa87 ffff8800da2a4a20
May 07 22:43:27 <my host> kernel: Call Trace:
May 07 22:43:27 <my host> kernel: [<ffffffff8156fa87>] schedule+0x37/0x90
May 07 22:43:27 <my host> kernel: [<ffffffffa019229d>] cv_wait_common+0x10d/0x140 [spl]
May 07 22:43:27 <my host> kernel: [<ffffffff810b6b20>] ? wake_atomic_t_function+0x60/0x60
May 07 22:43:27 <my host> kernel: [<ffffffffa01922e5>] __cv_wait+0x15/0x20 [spl]
May 07 22:43:27 <my host> kernel: [<ffffffffa030327b>] txg_wait_synced+0x8b/0xd0 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa02c078c>] dmu_tx_wait+0x25c/0x3a0 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa02c096e>] dmu_tx_assign+0x9e/0x520 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa02c8e10>] ? dsl_dataset_block_freeable+0x20/0x70 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa02be639>] ? dmu_tx_count_dnode+0x59/0xb0 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffffa033f5de>] zfs_write+0x3ce/0xc50 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffff81515821>] ? unix_stream_recvmsg+0x701/0x7e0
May 07 22:43:27 <my host> kernel: [<ffffffffa03558cd>] zpl_write+0xbd/0x130 [zfs]
May 07 22:43:27 <my host> kernel: [<ffffffff811d56b3>] vfs_write+0xb3/0x200
May 07 22:43:27 <my host> kernel: [<ffffffff811d624c>] ? vfs_read+0x11c/0x140
May 07 22:43:27 <my host> kernel: [<ffffffff811d6399>] SyS_write+0x59/0xd0
May 07 22:43:27 <my host> kernel: [<ffffffff815737c9>] system_call_fastpath+0x12/0x17
May 07 22:44:27 <my host> kernel: WARNING: Pool '<my target pool>' has encountered an uncorrectable I/O failure and has been suspended.
May 07 22:44:27 <my host> zed[12566]: eid=2302 class=data pool=<my target pool>
May 07 22:44:27 <my host> zed[12569]: eid=2303 class=io_failure pool=<my target pool>
# zpool status -v
pool: <my target pool>
state: ONLINE
scan: scrub repaired 0 in 9h15m with 0 errors on Thu May 7 09:01:37 2015
config:
NAME STATE READ WRITE CKSUM
<my target pool> ONLINE 0 0 0
<my target pool>-crypt ONLINE 0 0 0
errors: No known data errors
pool: <my source pool>
state: ONLINE
scan: scrub repaired 0 in 6h3m with 0 errors on Thu May 7 05:49:33 2015
config:
NAME STATE READ WRITE CKSUM
<my source pool> ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
crypt-<my source pool>-00 ONLINE 0 0 0
crypt-<my source pool>-01 ONLINE 0 0 0
errors: No known data errors
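As a side note (not part of the original report): a pool that was suspended after an "uncorrectable I/O failure", as in the log above, can usually be resumed with `zpool clear` once the underlying device problem is resolved. The pool name below is a placeholder.

```shell
# POOL is a placeholder; substitute the suspended pool's real name.
POOL="tank"
if command -v zpool >/dev/null 2>&1; then
  zpool clear "$POOL"        # retries the I/O that suspended the pool
  zpool status -v "$POOL"    # should report ONLINE again if the retry worked
else
  echo "zpool not installed on this system"
fi
```

Whether the pool resumes or suspends again depends on the `failmode` property and on whether the device errors actually cleared.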
# modinfo zfs | head
filename: /lib/modules/4.0.1-1-ARCH/extra/zfs/zfs.ko.gz
version: 0.6.4.1-1
license: CDDL
author: OpenZFS on Linux
description: ZFS
srcversion: 8324F6AEA2A06B2B6F0A0F5
depends: spl,znvpair,zunicode,zcommon,zavl
vermagic: 4.0.1-1-ARCH SMP preempt mod_unload modversions
# modinfo spl | head
filename: /lib/modules/4.0.1-1-ARCH/extra/spl/spl.ko.gz
version: 0.6.4.1-1
license: GPL
author: OpenZFS on Linux
description: Solaris Porting Layer
srcversion: 8907748310B8940C9D0DCD2
depends:
vermagic: 4.0.1-1-ARCH SMP preempt mod_unload modversions
#uname -a
Linux <my host> 4.0.1-1-ARCH #1 SMP PREEMPT Wed Apr 29 12:00:26 CEST 2015 x86_64 GNU/Linux
Fingers crossed I've provided good information. I'm running Arch Linux with the demz repo.
I got the same issue with high-loaded mongodb on ZFS.
free -m
total used free shared buffers cached
Mem: 7928 7422 506 76 192 1024
-/+ buffers/cache: 6205 1723
Swap: 0 0 0
# modinfo spl|head
filename: /lib/modules/4.0.0-1-amd64/updates/dkms/spl.ko
version: 0.6.4-2-62e2eb
license: GPL
author: OpenZFS on Linux
description: Solaris Porting Layer
srcversion: 3F1EAF06925B312A0B3F767
depends:
vermagic: 4.0.0-1-amd64 SMP mod_unload modversions
parm: spl_hostid:The system hostid. (ulong)
parm: spl_hostid_path:The system hostid file (/etc/hostid) (charp)
# modinfo zfs|head
filename: /lib/modules/4.0.0-1-amd64/updates/dkms/zfs.ko
version: 0.6.4-16-544f71
license: CDDL
author: OpenZFS on Linux
description: ZFS
srcversion: 29BA21B62706579B75D5974
depends: spl,znvpair,zunicode,zcommon,zavl
vermagic: 4.0.0-1-amd64 SMP mod_unload modversions
parm: zvol_inhibit_dev:Do not create zvol device nodes (uint)
parm: zvol_major:Major number for zvol device (uint)
# uname -a
Linux dan-desktop 4.0.0-1-amd64 #1 SMP Debian 4.0.2-1 (2015-05-11) x86_64 GNU/Linux
# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux unstable (sid)
Release: unstable
Codename: sid
Same for me on Debian Jessie with Linux 3.16.0-4-amd64 and zfs 0.6.5.2-2. Happened on a postgresql server, no load at that time.
Jan 22 07:04:42 db04 kernel: [5056080.684110] INFO: task txg_sync:378 blocked for more than 120 seconds.
Jan 22 07:04:43 db04 kernel: [5056080.684128] Tainted: P O 3.16.0-4-amd64 #1
Jan 22 07:04:43 db04 kernel: [5056080.684134] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 22 07:04:43 db04 kernel: [5056080.684142] txg_sync D ffff8800f96bc7e8 0 378 2 0x00000000
Jan 22 07:04:43 db04 kernel: [5056080.684153] ffff8800f96bc390 0000000000000246 0000000000012f00 ffff8800f707ffd8
Jan 22 07:04:43 db04 kernel: [5056080.684165] 0000000000012f00 ffff8800f96bc390 ffff8800ff3137b0 ffff880048db7570
Jan 22 07:04:43 db04 kernel: [5056080.684176] ffff880048db75b0 0000000000000001 ffff8800f8eb9000 0000000000000000
Jan 22 07:04:43 db04 kernel: [5056080.684187] Call Trace:
Jan 22 07:04:43 db04 kernel: [5056080.684201] [
@andreas-p please update to 0.6.5.4* if available.
alternatively: you can build your own latest zfsonlinux packages:
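For reference, a rough sketch of what "building your own packages" looks like on Debian. The release version shown and the `make deb` target are assumptions based on the ZoL build system of that era (it uses alien/fakeroot to produce .deb files); check the project's build documentation for your exact release. SPL must be built and installed before ZFS:

```shell
# Build dependencies (names per Debian 8; adjust for your release)
apt-get install build-essential autoconf libtool gawk alien fakeroot \
    zlib1g-dev uuid-dev libblkid-dev linux-headers-$(uname -r)

# SPL first - the ZFS configure step needs SPL already installed
tar xf spl-0.6.5.4.tar.gz && cd spl-0.6.5.4
./configure && make -j"$(nproc)" deb
dpkg -i ./*.deb && cd ..

# Then ZFS itself
tar xf zfs-0.6.5.4.tar.gz && cd zfs-0.6.5.4
./configure && make -j"$(nproc)" deb
dpkg -i ./*.deb
```

The upside, as noted above, is that you know exactly which source went into the modules instead of trusting a third-party repository.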
Unfortunately, the Debian 8 DKMS package is still on 0.6.5.2, with no update in the last three months. Any clue when this will be resolved?
@andreas-p sorry, no idea, but there's always the option of building the packages on your own - it takes some effort, but you'll know you can trust them instead of having to rely on third-party repositories, etc.
Got the very same problem with 0.6.5.4 on a different machine with Debian 8, zfs built from source. The stack trace shows exactly the same positions as the trace from Jan 18. Virtual machine: 6GB RAM, 1.5GB swap, 4 AMD CPU cores.
12:54 Started an rsync from a 2.5TB xfs to a 4TB zfs partition; memory rising from 2GB to 5GB within 5 minutes.
13:09 CPU load/1m steps up from 1 to 10, CPU utilization around 50%, mostly system.
13:11 txg_sync hung_timeout first appearance; CPU load/1m steps to 16, committed memory drops to 0, CPU utilization 85% system.
13:15 CPU utilization 95% system, load 16, reboot needed.
Same problem here with Debian 8, ZoL 0.6.5.2 from the official package repository.
Exactly the same block with 0.6.5.5:
Mar 24 11:13:06 db04 kernel: [4746960.372343] [
I got two consecutive "txg_sync blocked for more than 120 seconds", then it went back to normal.
I'm suffering from a similar problem. rsync reads/writes are extremely slow (~3MB/s). After struggling like this for ~24h (it's a multi-million-file dataset), the machine gets bricked.
Please advise.
# uname -a
Linux ip-172-30-0-118 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 8.5 (jessie)
Release: 8.5
Codename: jessie
# dpkg -l '*zfs*' | grep ii
ii libzfs2linux 0.6.5.7-1 amd64 OpenZFS filesystem library for Linux
ii zfs-dkms 0.6.5.7-1 all OpenZFS filesystem kernel modules for Linux
ii zfs-zed 0.6.5.7-1 amd64 OpenZFS Event Daemon
ii zfsutils-linux 0.6.5.7-1 amd64 command-line tools to manage OpenZFS filesystems
# dpkg -l '*spl*' | grep ii
ii spl 0.6.5.7-1 amd64 Solaris Porting Layer user-space utilities for Linux
ii spl-dkms 0.6.5.7-1 all Solaris Porting Layer kernel modules for Linux
# zpool status
pool: zfs-backup
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
zfs-backup ONLINE 0 0 0
xvdf ONLINE 0 0 0
logs
xvdg ONLINE 0 0 0
errors: No known data errors
# zpool iostat -v
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
zfs-backup 564G 380G 538 0 1.48M 2.42K
xvdf 564G 380G 538 0 1.48M 2.11K
logs - - - - - -
xvdg 0 1.98G 0 0 471 312
---------- ----- ----- ----- ----- ----- -----
# zpool get all
NAME PROPERTY VALUE SOURCE
zfs-backup size 944G -
zfs-backup capacity 59% -
zfs-backup altroot - default
zfs-backup health ONLINE -
zfs-backup guid 2876612074418704500 default
zfs-backup version - default
zfs-backup bootfs - default
zfs-backup delegation on default
zfs-backup autoreplace off default
zfs-backup cachefile - default
zfs-backup failmode wait default
zfs-backup listsnapshots off default
zfs-backup autoexpand off default
zfs-backup dedupditto 0 default
zfs-backup dedupratio 1.00x -
zfs-backup free 380G -
zfs-backup allocated 564G -
zfs-backup readonly off -
zfs-backup ashift 0 default
zfs-backup comment - default
zfs-backup expandsize - -
zfs-backup freeing 0 default
zfs-backup fragmentation 44% -
zfs-backup leaked 0 default
zfs-backup feature@async_destroy enabled local
zfs-backup feature@empty_bpobj active local
zfs-backup feature@lz4_compress active local
zfs-backup feature@spacemap_histogram active local
zfs-backup feature@enabled_txg active local
zfs-backup feature@hole_birth active local
zfs-backup feature@extensible_dataset enabled local
zfs-backup feature@embedded_data active local
zfs-backup feature@bookmarks enabled local
zfs-backup feature@filesystem_limits enabled local
zfs-backup feature@large_blocks enabled local
Jun 15 16:55:40 ip-172-30-0-118 kernel: [113640.618167] txg_sync D ffff880079c259c8 0 1361 2 0x00000000
Jun 15 16:55:41 ip-172-30-0-118 kernel: [113640.621686] ffff880079c25570 0000000000000046 0000000000012f00 ffff88007aaf3fd8
Jun 15 16:55:43 ip-172-30-0-118 kernel: [113640.625572] 0000000000012f00 ffff880079c25570 ffff88007fc137b0 ffff8800070cf050
Jun 15 16:55:43 ip-172-30-0-118 kernel: [113640.629691] ffff8800070cf090 0000000000000001 ffff8800790a1000 0000000000000000
Jun 15 16:55:44 ip-172-30-0-118 kernel: [113640.633524] Call Trace:
Jun 15 16:55:45 ip-172-30-0-118 kernel: [113640.634740] [<ffffffff815114a9>] ? io_schedule+0x99/0x120
Jun 15 16:55:47 ip-172-30-0-118 kernel: [113640.637492] [<ffffffffa0152572>] ? cv_wait_common+0x92/0x110 [spl]
Jun 15 16:55:48 ip-172-30-0-118 kernel: [113640.640409] [<ffffffff810a7e60>] ? prepare_to_wait_event+0xf0/0xf0
Jun 15 16:55:49 ip-172-30-0-118 kernel: [113640.643375] [<ffffffffa029d12b>] ? zio_wait+0x10b/0x1e0 [zfs]
Jun 15 16:55:49 ip-172-30-0-118 kernel: [113640.646118] [<ffffffffa022a77a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
Jun 15 16:55:50 ip-172-30-0-118 kernel: [113640.649130] [<ffffffffa0244766>] ? spa_sync+0x366/0xb30 [zfs]
Jun 15 16:55:51 ip-172-30-0-118 kernel: [113640.651910] [<ffffffffa0256231>] ? txg_sync_thread+0x3d1/0x680 [zfs]
Jun 15 16:55:53 ip-172-30-0-118 kernel: [113640.655111] [<ffffffffa0255e60>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
Jun 15 16:55:53 ip-172-30-0-118 kernel: [113640.658303] [<ffffffffa014dcab>] ? thread_generic_wrapper+0x6b/0x80 [spl]
Jun 15 16:55:54 ip-172-30-0-118 kernel: [113640.661609] [<ffffffffa014dc40>] ? __thread_exit+0x20/0x20 [spl]
Jun 15 16:55:55 ip-172-30-0-118 kernel: [113640.664528] [<ffffffff8108809d>] ? kthread+0xbd/0xe0
Jun 15 16:55:56 ip-172-30-0-118 kernel: [113640.667024] [<ffffffff81087fe0>] ? kthread_create_on_node+0x180/0x180
Jun 15 16:55:58 ip-172-30-0-118 kernel: [113640.670190] [<ffffffff81514958>] ? ret_from_fork+0x58/0x90
Jun 15 16:55:58 ip-172-30-0-118 kernel: [113640.672778] [<ffffffff81087fe0>] ? kthread_create_on_node+0x180/0x180
Jun 15 16:57:39 ip-172-30-0-118 kernel: [113760.681975] txg_sync D ffff880079c259c8 0 1361 2 0x00000000
Jun 15 16:57:40 ip-172-30-0-118 kernel: [113760.685521] ffff880079c25570 0000000000000046 0000000000012f00 ffff88007aaf3fd8
Jun 15 16:57:41 ip-172-30-0-118 kernel: [113760.689295] 0000000000012f00 ffff880079c25570 ffff88007fc137b0 ffff8800070cf050
Jun 15 16:57:42 ip-172-30-0-118 kernel: [113760.693211] ffff8800070cf090 0000000000000001 ffff8800790a1000 0000000000000000
Jun 15 16:57:43 ip-172-30-0-118 kernel: [113760.697110] Call Trace:
Jun 15 16:57:44 ip-172-30-0-118 kernel: [113760.698425] [<ffffffff815114a9>] ? io_schedule+0x99/0x120
Jun 15 16:57:45 ip-172-30-0-118 kernel: [113760.701106] [<ffffffffa0152572>] ? cv_wait_common+0x92/0x110 [spl]
Jun 15 16:57:47 ip-172-30-0-118 kernel: [113760.704131] [<ffffffff810a7e60>] ? prepare_to_wait_event+0xf0/0xf0
Jun 15 16:57:48 ip-172-30-0-118 kernel: [113760.707157] [<ffffffffa029d12b>] ? zio_wait+0x10b/0x1e0 [zfs]
Jun 15 16:57:49 ip-172-30-0-118 kernel: [113760.709944] [<ffffffffa022a77a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
Jun 15 16:57:51 ip-172-30-0-118 kernel: [113760.713084] [<ffffffffa0244766>] ? spa_sync+0x366/0xb30 [zfs]
Jun 15 16:57:52 ip-172-30-0-118 kernel: [113760.715920] [<ffffffffa0256231>] ? txg_sync_thread+0x3d1/0x680 [zfs]
Jun 15 16:57:54 ip-172-30-0-118 kernel: [113760.719014] [<ffffffffa0255e60>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
Jun 15 16:57:55 ip-172-30-0-118 kernel: [113760.722283] [<ffffffffa014dcab>] ? thread_generic_wrapper+0x6b/0x80 [spl]
Jun 15 16:57:57 ip-172-30-0-118 kernel: [113760.725571] [<ffffffffa014dc40>] ? __thread_exit+0x20/0x20 [spl]
Jun 15 16:57:57 ip-172-30-0-118 kernel: [113760.728485] [<ffffffff8108809d>] ? kthread+0xbd/0xe0
Jun 15 16:57:58 ip-172-30-0-118 kernel: [113760.730974] [<ffffffff81087fe0>] ? kthread_create_on_node+0x180/0x180
Jun 15 16:58:00 ip-172-30-0-118 kernel: [113760.734102] [<ffffffff81514958>] ? ret_from_fork+0x58/0x90
Jun 15 16:58:02 ip-172-30-0-118 kernel: [113760.736819] [<ffffffff81087fe0>] ? kthread_create_on_node+0x180/0x180
Jun 15 16:59:38 ip-172-30-0-118 kernel: [113880.747394] txg_sync D ffff880079c259c8 0 1361 2 0x00000000
Jun 15 16:59:39 ip-172-30-0-118 kernel: [113880.750796] ffff880079c25570 0000000000000046 0000000000012f00 ffff88007aaf3fd8
Jun 15 16:59:39 ip-172-30-0-118 kernel: [113880.754658] 0000000000012f00 ffff880079c25570 ffff88007fc137b0 ffff8800070cf050
Jun 15 16:59:41 ip-172-30-0-118 kernel: [113880.758412] ffff8800070cf090 0000000000000001 ffff8800790a1000 0000000000000000
Jun 15 16:59:42 ip-172-30-0-118 kernel: [113880.762234] Call Trace:
Jun 15 16:59:43 ip-172-30-0-118 kernel: [113880.763454] [<ffffffff815114a9>] ? io_schedule+0x99/0x120
Jun 15 16:59:44 ip-172-30-0-118 kernel: [113880.766117] [<ffffffffa0152572>] ? cv_wait_common+0x92/0x110 [spl]
Jun 15 16:59:45 ip-172-30-0-118 kernel: [113880.769024] [<ffffffff810a7e60>] ? prepare_to_wait_event+0xf0/0xf0
Jun 15 16:59:45 ip-172-30-0-118 kernel: [113880.772106] [<ffffffffa029d12b>] ? zio_wait+0x10b/0x1e0 [zfs]
Jun 15 16:59:46 ip-172-30-0-118 kernel: [113880.775310] [<ffffffffa022a77a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
Jun 15 16:59:48 ip-172-30-0-118 kernel: [113880.778331] [<ffffffffa0244766>] ? spa_sync+0x366/0xb30 [zfs]
Jun 15 16:59:50 ip-172-30-0-118 kernel: [113880.781198] [<ffffffffa0256231>] ? txg_sync_thread+0x3d1/0x680 [zfs]
Jun 15 16:59:51 ip-172-30-0-118 kernel: [113880.784324] [<ffffffffa0255e60>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
Jun 15 16:59:52 ip-172-30-0-118 kernel: [113880.787500] [<ffffffffa014dcab>] ? thread_generic_wrapper+0x6b/0x80 [spl]
Jun 15 16:59:54 ip-172-30-0-118 kernel: [113880.790757] [<ffffffffa014dc40>] ? __thread_exit+0x20/0x20 [spl]
Jun 15 16:59:55 ip-172-30-0-118 kernel: [113880.793707] [<ffffffff8108809d>] ? kthread+0xbd/0xe0
Jun 15 16:59:56 ip-172-30-0-118 kernel: [113880.796081] [<ffffffff81087fe0>] ? kthread_create_on_node+0x180/0x180
Jun 15 16:59:57 ip-172-30-0-118 kernel: [113880.799104] [<ffffffff81514958>] ? ret_from_fork+0x58/0x90
Jun 15 16:59:58 ip-172-30-0-118 kernel: [113880.801724] [<ffffffff81087fe0>] ? kthread_create_on_node+0x180/0x180
Jun 15 17:01:41 ip-172-30-0-118 kernel: [114000.817906] txg_sync D ffff880079c259c8 0 1361 2 0x00000000
Jun 15 17:01:42 ip-172-30-0-118 kernel: [114000.822256] ffff880079c25570 0000000000000046 0000000000012f00 ffff88007aaf3fd8
Jun 15 17:01:43 ip-172-30-0-118 kernel: [114000.827952] 0000000000012f00 ffff880079c25570 ffff88007fc137b0 ffff8800070cf050
Jun 15 17:01:45 ip-172-30-0-118 kernel: [114000.833092] ffff8800070cf090 0000000000000001 ffff8800790a1000 0000000000000000
Jun 15 17:01:45 ip-172-30-0-118 kernel: [114000.837533] Call Trace:
Jun 15 17:01:47 ip-172-30-0-118 kernel: [114000.838994] [<ffffffff815114a9>] ? io_schedule+0x99/0x120
Jun 15 17:01:48 ip-172-30-0-118 kernel: [114000.842069] [<ffffffffa0152572>] ? cv_wait_common+0x92/0x110 [spl]
Jun 15 17:01:49 ip-172-30-0-118 kernel: [114000.845833] [<ffffffff810a7e60>] ? prepare_to_wait_event+0xf0/0xf0
Jun 15 17:01:50 ip-172-30-0-118 kernel: [114000.849362] [<ffffffffa029d12b>] ? zio_wait+0x10b/0x1e0 [zfs]
Jun 15 17:01:51 ip-172-30-0-118 kernel: [114000.852630] [<ffffffffa022a77a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
Jun 15 17:01:52 ip-172-30-0-118 kernel: [114000.856037] [<ffffffffa0244766>] ? spa_sync+0x366/0xb30 [zfs]
Jun 15 17:01:53 ip-172-30-0-118 kernel: [114000.859185] [<ffffffffa0256231>] ? txg_sync_thread+0x3d1/0x680 [zfs]
Jun 15 17:01:54 ip-172-30-0-118 kernel: [114000.862633] [<ffffffffa0255e60>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
Jun 15 17:01:55 ip-172-30-0-118 kernel: [114000.866301] [<ffffffffa014dcab>] ? thread_generic_wrapper+0x6b/0x80 [spl]
Jun 15 17:01:56 ip-172-30-0-118 kernel: [114000.870041] [<ffffffffa014dc40>] ? __thread_exit+0x20/0x20 [spl]
Jun 15 17:01:57 ip-172-30-0-118 kernel: [114000.873530] [<ffffffff8108809d>] ? kthread+0xbd/0xe0
Jun 15 17:01:57 ip-172-30-0-118 kernel: [114000.876425] [<ffffffff81087fe0>] ? kthread_create_on_node+0x180/0x180
Jun 15 17:01:59 ip-172-30-0-118 kernel: [114000.879915] [<ffffffff81514958>] ? ret_from_fork+0x58/0x90
Jun 15 17:02:00 ip-172-30-0-118 kernel: [114000.882644] [<ffffffff81087fe0>] ? kthread_create_on_node+0x180/0x180
@narunask could you please post the output of
cat /proc/spl/kstat/zfs/arcstats
or has access to the box been lost in the meantime ("bricked")?
Also please post some hardware and configuration specifics (RAM, processor, mainboard, hard drive type, lvm/cryptsetup, etc.)
thanks
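The requested diagnostics can be gathered in one pass. A minimal sketch (the helper name and output file are hypothetical; the /proc paths are the standard ZoL kstat locations, and anything missing on a given box is simply skipped):

```shell
#!/bin/sh
# Collect the ZFS/hardware diagnostics requested above into one file.
OUT="${1:-zfs-diag.txt}"
: > "$OUT"

grab() {
    # grab <title> <file>: append the file under a section header if readable
    echo "==== $1 ====" >> "$OUT"
    if [ -r "$2" ]; then
        cat "$2" >> "$OUT"
    fi
}

grab arcstats /proc/spl/kstat/zfs/arcstats
grab cpuinfo  /proc/cpuinfo
grab meminfo  /proc/meminfo

echo "==== modules ====" >> "$OUT"
if command -v modinfo >/dev/null 2>&1; then
    modinfo zfs spl >> "$OUT" 2>&1 || true
fi

echo "diagnostics written to $OUT"
```

Running it right after the hang (or from a second SSH session while the box is still responsive) keeps the arcstats snapshot close in time to the stack traces.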
The server is on EC2, currently 1 core, 2GB RAM (t2.small), HDD - the magnetic kind; I believe it is documented here.
The disk I'm copying from is LVM-based, consisting of 3 PVs. The disks are not encrypted.
Also FYI, rsync is currently calculating deltas and should start moving data again very soon, if that adds anything to the stats.
# cat /proc/spl/kstat/zfs/arcstats
6 1 0x01 91 4368 35041278654 31678952268315
name type data
hits 4 42347862
misses 4 10285010
demand_data_hits 4 0
demand_data_misses 4 68
demand_metadata_hits 4 34887103
demand_metadata_misses 4 6345542
prefetch_data_hits 4 0
prefetch_data_misses 4 0
prefetch_metadata_hits 4 7460759
prefetch_metadata_misses 4 3939400
mru_hits 4 20830447
mru_ghost_hits 4 2991105
mfu_hits 4 14056656
mfu_ghost_hits 4 1782606
deleted 4 2167955
mutex_miss 4 90369
evict_skip 4 1781974161
evict_not_enough 4 15390660
evict_l2_cached 4 0
evict_l2_eligible 4 35359622656
evict_l2_ineligible 4 55754657792
evict_l2_skip 4 0
hash_elements 4 5452
hash_elements_max 4 48255
hash_collisions 4 57676
hash_chains 4 69
hash_chain_max 4 4
p 4 7007439
c 4 34933248
c_min 4 33554432
c_max 4 1053282304
size 4 105425184
hdr_size 4 1969640
data_size 4 0
metadata_size 4 30842880
other_size 4 72612664
anon_size 4 753664
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 27006976
mru_evictable_data 4 0
mru_evictable_metadata 4 3653632
mru_ghost_size 4 6850560
mru_ghost_evictable_data 4 0
mru_ghost_evictable_metadata 4 6850560
mfu_size 4 3082240
mfu_evictable_data 4 0
mfu_evictable_metadata 4 32768
mfu_ghost_size 4 25972224
mfu_ghost_evictable_data 4 0
mfu_ghost_evictable_metadata 4 25972224
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_lock_retry 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_evict_l1cached 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
l2_compress_successes 4 0
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 947
memory_indirect_count 4 259028
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 3960
arc_meta_used 4 105425184
arc_meta_limit 4 789961728
arc_meta_max 4 816526072
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 32911360
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
stepping : 2
microcode : 0x25
cpu MHz : 2394.530
cache size : 30720 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm xsaveopt fsgsbase bmi1 avx2 smep bmi2 erms invpcid
bogomips : 4789.06
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
# dmidecode --type memory
# dmidecode 3.0
Scanning /dev/mem for entry point.
SMBIOS 2.4 present.
Handle 0x1000, DMI type 16, 15 bytes
Physical Memory Array
Location: Other
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 2 GB
Error Information Handle: Not Provided
Number Of Devices: 1
Handle 0x1100, DMI type 17, 21 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: 0x0000
Total Width: 64 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: None
Locator: DIMM 0
Bank Locator: Not Specified
Type: RAM
Type Detail: None
Thanks
I haven't seen the problem for quite a while now (it seemed to have gone since 0.6.5.6), but this morning I had the very same hung task with 0.6.5.7 (a single occurrence) come up again. Sigh...
I can confirm that I have not seen this issue for ages on the same hardware; it has not recurred in probably the last 3-6 months.
Still happening with heavy rsync backups.
[88234.807775] INFO: task rsync:18699 blocked for more than 120 seconds.
[88234.807929] Tainted: P O 4.8.13-1-ARCH #1
[88234.808026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88234.808159] rsync D ffff880124bcb9d0 0 18699 18698 0x00000000
[88234.808171] ffff880124bcb9d0 00ffffff81003016 ffff88021cd00d00 ffff880127fa6800
[88234.808179] ffff88022fc17ff0 ffff880124bcc000 ffff88021d8eaa10 ffff88021d8eab58
[88234.808185] ffff88021d8eaa38 0000000000000000 ffff880124bcb9e8 ffffffff815f40ec
[88234.808190] Call Trace:
[88234.808203] [<ffffffff815f40ec>] schedule+0x3c/0x90
[88234.808217] [<ffffffffa0171e1f>] cv_wait_common+0x10f/0x130 [spl]
[88234.808225] [<ffffffff810c0450>] ? wake_atomic_t_function+0x60/0x60
[88234.808235] [<ffffffffa0171e55>] __cv_wait+0x15/0x20 [spl]
[88234.808294] [<ffffffffa03095f8>] txg_wait_open+0xa8/0xe0 [zfs]
[88234.808338] [<ffffffffa02c73db>] dmu_tx_wait+0x32b/0x340 [zfs]
[88234.808380] [<ffffffffa02c747b>] dmu_tx_assign+0x8b/0x490 [zfs]
[88234.808422] [<ffffffffa0342d09>] zfs_write+0x3f9/0xc80 [zfs]
[88234.808461] [<ffffffffa034f567>] ? zio_destroy+0xb7/0xc0 [zfs]
[88234.808500] [<ffffffffa0352c78>] ? zio_wait+0x138/0x1d0 [zfs]
[88234.808507] [<ffffffffa016ab9a>] ? spl_kmem_free+0x2a/0x40 [spl]
[88234.808551] [<ffffffffa0337510>] ? zfs_range_unlock+0x1a0/0x2c0 [zfs]
[88234.808590] [<ffffffffa035826c>] zpl_write_common_iovec+0x8c/0xe0 [zfs]
[88234.808627] [<ffffffffa0358497>] zpl_write+0x87/0xc0 [zfs]
[88234.808635] [<ffffffff81208797>] __vfs_write+0x37/0x140
[88234.808642] [<ffffffff810c7be7>] ? percpu_down_read+0x17/0x50
[88234.808648] [<ffffffff81209566>] vfs_write+0xb6/0x1a0
[88234.808652] [<ffffffff8120a9e5>] SyS_write+0x55/0xc0
[88234.808658] [<ffffffff815f8032>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[88234.808664] INFO: task imap:19376 blocked for more than 120 seconds.
[88234.808803] Tainted: P O 4.8.13-1-ARCH #1
[88234.808898] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88234.809029] imap D ffff8801b875fad8 0 19376 1488 0x00000000
[88234.809036] ffff8801b875fad8 00ff88021d8eab00 ffffffff81a0d500 ffff8801a9f95b00
[88234.809042] ffff8801b875fab8 ffff8801b8760000 ffff88021d8eaa10 ffff88021d8eab58
[88234.809048] ffff88021d8eaa38 0000000000000000 ffff8801b875faf0 ffffffff815f40ec
[88234.809053] Call Trace:
[88234.809059] [<ffffffff815f40ec>] schedule+0x3c/0x90
[88234.809067] [<ffffffffa0171e1f>] cv_wait_common+0x10f/0x130 [spl]
[88234.809073] [<ffffffff810c0450>] ? wake_atomic_t_function+0x60/0x60
[88234.809081] [<ffffffffa0171e55>] __cv_wait+0x15/0x20 [spl]
[88234.809131] [<ffffffffa03095f8>] txg_wait_open+0xa8/0xe0 [zfs]
[88234.809175] [<ffffffffa02c73db>] dmu_tx_wait+0x32b/0x340 [zfs]
[88234.809217] [<ffffffffa02c747b>] dmu_tx_assign+0x8b/0x490 [zfs]
[88234.809259] [<ffffffffa0342299>] zfs_dirty_inode+0xe9/0x300 [zfs]
[88234.809265] [<ffffffff8122a3b4>] ? mntput+0x24/0x40
[88234.809272] [<ffffffff81189adb>] ? release_pages+0x2cb/0x380
[88234.809278] [<ffffffff811c763e>] ? free_pages_and_swap_cache+0x8e/0xa0
[88234.809315] [<ffffffffa035a2cc>] zpl_dirty_inode+0x2c/0x40 [zfs]
[88234.809322] [<ffffffff81237785>] __mark_inode_dirty+0x45/0x400
[88234.809362] [<ffffffffa0346fe8>] zfs_mark_inode_dirty+0x48/0x50 [zfs]
[88234.809399] [<ffffffffa0358826>] zpl_release+0x46/0x90 [zfs]
[88234.809404] [<ffffffff8120b42f>] __fput+0x9f/0x1e0
[88234.809408] [<ffffffff8120b5ae>] ____fput+0xe/0x10
[88234.809414] [<ffffffff8109a0d0>] task_work_run+0x80/0xa0
[88234.809420] [<ffffffff8100366a>] exit_to_usermode_loop+0xba/0xc0
[88234.809425] [<ffffffff81003b2e>] syscall_return_slowpath+0x4e/0x60
[88234.809430] [<ffffffff815f80ba>] entry_SYSCALL_64_fastpath+0xa2/0xa4
Most of the time performance is fine.
So, I replaced the drive that might have been causing issues with a new Samsung 840 PRO SSD, partitioned as 50GB OS, 4GB swap, and the rest available as L2ARC.
I've set the arc max size to 4GB on a system with only 8GB memory.
Tonight, the same issue occurred, rsync and then everything ground to a halt. So, it seems unlikely to be a disk issue. Nominal read speeds are 100MB/s+ so things are humming along quite nicely.
I can reproduce this issue fairly easily so just let me know what information you'd like me to collect and I'll try to do it.
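For anyone wanting to apply the ARC cap described above (4GB on an 8GB box): zfs_arc_max is a standard ZoL module parameter taking a byte value. A minimal sketch of both ways to set it (config fragment, requires root; the file name under /etc/modprobe.d is a common convention, not mandated):

```shell
# Persist across reboots: module option read when zfs.ko loads
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf

# Or change at runtime, without reloading the module
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
```

The runtime value can be confirmed by checking the c_max field in /proc/spl/kstat/zfs/arcstats.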
[78889.098553] INFO: task txg_sync:1372 blocked for more than 120 seconds.
[78889.098667] Tainted: P O 4.8.13-1-ARCH #1
[78889.098719] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[78889.098790] txg_sync D ffff88021c893ab8 0 1372 2 0x00000000
[78889.098798] ffff88021c893ab8 00ffffff810a5d2f ffff8802212b5b00 ffff88021d3ece00
[78889.098802] 0000000000000046 ffff88021c894000 ffff88022fc17f80 7fffffffffffffff
[78889.098806] ffff880105123b60 0000000000000001 ffff88021c893ad0 ffffffff815f40ec
[78889.098809] Call Trace:
[78889.098818] [<ffffffff815f40ec>] schedule+0x3c/0x90
[78889.098821] [<ffffffff815f6fa3>] schedule_timeout+0x243/0x3d0
[78889.098825] [<ffffffff810a6c12>] ? default_wake_function+0x12/0x20
[78889.098828] [<ffffffff810bfd9d>] ? __wake_up_common+0x4d/0x80
[78889.098832] [<ffffffff810f2c51>] ? ktime_get+0x41/0xb0
[78889.098835] [<ffffffff815f38c4>] io_schedule_timeout+0xa4/0x110
[78889.098843] [<ffffffffa0203dc1>] cv_wait_common+0xb1/0x130 [spl]
[78889.098846] [<ffffffff810c0450>] ? wake_atomic_t_function+0x60/0x60
[78889.098851] [<ffffffffa0203e98>] __cv_wait_io+0x18/0x20 [spl]
[78889.098886] [<ffffffffa0343c3d>] zio_wait+0xfd/0x1d0 [zfs]
[78889.098912] [<ffffffffa02cf258>] dsl_pool_sync+0xb8/0x480 [zfs]
[78889.098939] [<ffffffffa02ea38f>] spa_sync+0x37f/0xb30 [zfs]
[78889.098942] [<ffffffff810a6c12>] ? default_wake_function+0x12/0x20
[78889.098969] [<ffffffffa02fac1a>] txg_sync_thread+0x3ba/0x620 [zfs]
[78889.098997] [<ffffffffa02fa860>] ? txg_delay+0x160/0x160 [zfs]
[78889.099002] [<ffffffffa01fef61>] thread_generic_wrapper+0x71/0x80 [spl]
[78889.099006] [<ffffffffa01feef0>] ? __thread_exit+0x20/0x20 [spl]
[78889.099010] [<ffffffff8109be38>] kthread+0xd8/0xf0
[78889.099013] [<ffffffff8102c782>] ? __switch_to+0x2d2/0x630
[78889.099016] [<ffffffff815f823f>] ret_from_fork+0x1f/0x40
[78889.099019] [<ffffffff8109bd60>] ? kthread_worker_fn+0x170/0x170
[79380.619090] INFO: task txg_sync:1372 blocked for more than 120 seconds.
[79380.619245] Tainted: P O 4.8.13-1-ARCH #1
[79380.619342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[79380.619475] txg_sync D ffff88021c893ab8 0 1372 2 0x00000000
[79380.619487] ffff88021c893ab8 00ffffff810a5d2f ffffffff81a0d500 ffff88021d3ece00
[79380.619494] 0000000000000046 ffff88021c894000 ffff88022fc17f80 7fffffffffffffff
[79380.619500] ffff8801e9c0c850 0000000000000001 ffff88021c893ad0 ffffffff815f40ec
[79380.619506] Call Trace:
[79380.619519] [<ffffffff815f40ec>] schedule+0x3c/0x90
[79380.619525] [<ffffffff815f6fa3>] schedule_timeout+0x243/0x3d0
[79380.619531] [<ffffffff810a6c12>] ? default_wake_function+0x12/0x20
[79380.619537] [<ffffffff810bfd9d>] ? __wake_up_common+0x4d/0x80
[79380.619543] [<ffffffff810f2c51>] ? ktime_get+0x41/0xb0
[79380.619547] [<ffffffff815f38c4>] io_schedule_timeout+0xa4/0x110
[79380.619560] [<ffffffffa0203dc1>] cv_wait_common+0xb1/0x130 [spl]
[79380.619565] [<ffffffff810c0450>] ? wake_atomic_t_function+0x60/0x60
[79380.619574] [<ffffffffa0203e98>] __cv_wait_io+0x18/0x20 [spl]
[79380.619623] [<ffffffffa0343c3d>] zio_wait+0xfd/0x1d0 [zfs]
[79380.619672] [<ffffffffa02cf258>] dsl_pool_sync+0xb8/0x480 [zfs]
[79380.619724] [<ffffffffa02ea38f>] spa_sync+0x37f/0xb30 [zfs]
[79380.619729] [<ffffffff810a6c12>] ? default_wake_function+0x12/0x20
[79380.619780] [<ffffffffa02fac1a>] txg_sync_thread+0x3ba/0x620 [zfs]
[79380.619830] [<ffffffffa02fa860>] ? txg_delay+0x160/0x160 [zfs]
[79380.619839] [<ffffffffa01fef61>] thread_generic_wrapper+0x71/0x80 [spl]
[79380.619847] [<ffffffffa01feef0>] ? __thread_exit+0x20/0x20 [spl]
[79380.619854] [<ffffffff8109be38>] kthread+0xd8/0xf0
[79380.619860] [<ffffffff8102c782>] ? __switch_to+0x2d2/0x630
[79380.619866] [<ffffffff815f823f>] ret_from_fork+0x1f/0x40
[79380.619872] [<ffffffff8109bd60>] ? kthread_worker_fn+0x170/0x170
[81223.826929] INFO: task txg_sync:1372 blocked for more than 120 seconds.
[81223.827091] Tainted: P O 4.8.13-1-ARCH #1
[81223.827226] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[81223.827381] txg_sync D ffff88021c893ab8 0 1372 2 0x00000000
[81223.827393] ffff88021c893ab8 00ffffff810a5d2f ffff8802157c3400 ffff88021d3ece00
[81223.827400] 0000000000000046 ffff88021c894000 ffff88022fc17f80 7fffffffffffffff
[81223.827407] ffff88015f37d0f0 0000000000000001 ffff88021c893ad0 ffffffff815f40ec
[81223.827413] Call Trace:
[81223.827426] [<ffffffff815f40ec>] schedule+0x3c/0x90
[81223.827432] [<ffffffff815f6fa3>] schedule_timeout+0x243/0x3d0
[81223.827439] [<ffffffff810a6c12>] ? default_wake_function+0x12/0x20
[81223.827444] [<ffffffff810bfd9d>] ? __wake_up_common+0x4d/0x80
[81223.827451] [<ffffffff810f2c51>] ? ktime_get+0x41/0xb0
[81223.827455] [<ffffffff815f38c4>] io_schedule_timeout+0xa4/0x110
[81223.827468] [<ffffffffa0203dc1>] cv_wait_common+0xb1/0x130 [spl]
[81223.827473] [<ffffffff810c0450>] ? wake_atomic_t_function+0x60/0x60
[81223.827482] [<ffffffffa0203e98>] __cv_wait_io+0x18/0x20 [spl]
[81223.827531] [<ffffffffa0343c3d>] zio_wait+0xfd/0x1d0 [zfs]
[81223.827580] [<ffffffffa02cf258>] dsl_pool_sync+0xb8/0x480 [zfs]
[81223.827632] [<ffffffffa02ea38f>] spa_sync+0x37f/0xb30 [zfs]
[81223.827636] [<ffffffff810a6c12>] ? default_wake_function+0x12/0x20
[81223.827687] [<ffffffffa02fac1a>] txg_sync_thread+0x3ba/0x620 [zfs]
[81223.827738] [<ffffffffa02fa860>] ? txg_delay+0x160/0x160 [zfs]
[81223.827747] [<ffffffffa01fef61>] thread_generic_wrapper+0x71/0x80 [spl]
[81223.827755] [<ffffffffa01feef0>] ? __thread_exit+0x20/0x20 [spl]
[81223.827762] [<ffffffff8109be38>] kthread+0xd8/0xf0
[81223.827767] [<ffffffff8102c782>] ? __switch_to+0x2d2/0x630
[81223.827773] [<ffffffff815f823f>] ret_from_fork+0x1f/0x40
[81223.827779] [<ffffffff8109bd60>] ? kthread_worker_fn+0x170/0x170
[94923.528386] CE: hpet increased min_delta_ns to 20115 nsec
[102727.918958] INFO: task rsync:5577 blocked for more than 120 seconds.
[102727.919092] Tainted: P O 4.8.13-1-ARCH #1
[102727.919170] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102727.919277] rsync D ffff8801dfe9ba10 0 5577 5576 0x00000000
[102727.919287] ffff8801dfe9ba10 00000000ffffffff ffffffff81a0d500 ffff8802157c3400
[102727.919293] ffffffffa0340567 ffff8801dfe9c000 ffff88021ed41148 ffff88021ed41180
[102727.919298] ffff88021ed41170 0000000000000000 ffff8801dfe9ba28 ffffffff815f40ec
[102727.919303] Call Trace:
[102727.919352] [<ffffffffa0340567>] ? zio_destroy+0xb7/0xc0 [zfs]
[102727.919358] [<ffffffff815f40ec>] schedule+0x3c/0x90
[102727.919367] [<ffffffffa0203e1f>] cv_wait_common+0x10f/0x130 [spl]
[102727.919373] [<ffffffff810c0450>] ? wake_atomic_t_function+0x60/0x60
[102727.919380] [<ffffffffa0203e55>] __cv_wait+0x15/0x20 [spl]
[102727.919414] [<ffffffffa02b814b>] dmu_tx_wait+0x9b/0x340 [zfs]
[102727.919448] [<ffffffffa02b847b>] dmu_tx_assign+0x8b/0x490 [zfs]
[102727.919482] [<ffffffffa0333d09>] zfs_write+0x3f9/0xc80 [zfs]
[102727.919514] [<ffffffffa0340567>] ? zio_destroy+0xb7/0xc0 [zfs]
[102727.919544] [<ffffffffa0343c78>] ? zio_wait+0x138/0x1d0 [zfs]
[102727.919579] [<ffffffffa0328510>] ? zfs_range_unlock+0x1a0/0x2c0 [zfs]
[102727.919610] [<ffffffffa034926c>] zpl_write_common_iovec+0x8c/0xe0 [zfs]
[102727.919640] [<ffffffffa0349497>] zpl_write+0x87/0xc0 [zfs]
[102727.919646] [<ffffffff81208797>] __vfs_write+0x37/0x140
[102727.919652] [<ffffffff810c7be7>] ? percpu_down_read+0x17/0x50
[102727.919656] [<ffffffff81209566>] vfs_write+0xb6/0x1a0
[102727.919660] [<ffffffff8120a9e5>] SyS_write+0x55/0xc0
[102727.919665] [<ffffffff815f8032>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Happened last night during backups.
Okay, so I've replaced the drive that had the high await time and also increased the memory to 16GB; still having issues:
[12785.566973] CE: hpet increased min_delta_ns to 20115 nsec
[25560.317294] INFO: task txg_sync:1392 blocked for more than 120 seconds.
[25560.317451] Tainted: P O 4.9.6-1-ARCH #1
[25560.317542] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25560.317668] txg_sync D 0 1392 2 0x00000000
[25560.317678] ffff88041b855c00 0000000000000000 ffff8804104b2700 ffff88042fc180c0
[25560.317686] ffffffff81a0e500 ffffc9000828fad8 ffffffff81605cdf ffff880411a4c080
[25560.317692] 00ffffffa0341360 ffff88042fc180c0 0000000000000000 ffff8804104b2700
[25560.317698] Call Trace:
[25560.317713] [<ffffffff81605cdf>] ? __schedule+0x22f/0x6e0
[25560.317719] [<ffffffff816061cd>] schedule+0x3d/0x90
[25560.317726] [<ffffffff81608fd3>] schedule_timeout+0x243/0x3d0
[25560.317781] [<ffffffffa033fbb1>] ? zio_taskq_dispatch+0x91/0xa0 [zfs]
[25560.317821] [<ffffffffa033fbd2>] ? zio_issue_async+0x12/0x20 [zfs]
[25560.317859] [<ffffffffa0343569>] ? zio_nowait+0x79/0x110 [zfs]
[25560.317867] [<ffffffff810f7b81>] ? ktime_get+0x41/0xb0
[25560.317873] [<ffffffff81605a44>] io_schedule_timeout+0xa4/0x110
[25560.317884] [<ffffffffa01eccd1>] cv_wait_common+0xb1/0x130 [spl]
[25560.317891] [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60
[25560.317900] [<ffffffffa01ecda8>] __cv_wait_io+0x18/0x20 [spl]
[25560.317938] [<ffffffffa034337c>] zio_wait+0xac/0x130 [zfs]
[25560.317984] [<ffffffffa02cf408>] dsl_pool_sync+0xb8/0x480 [zfs]
[25560.318035] [<ffffffffa02e9e1f>] spa_sync+0x37f/0xb30 [zfs]
[25560.318041] [<ffffffff810aa4a2>] ? default_wake_function+0x12/0x20
[25560.318091] [<ffffffffa02fa6aa>] txg_sync_thread+0x3ba/0x620 [zfs]
[25560.318096] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[25560.318145] [<ffffffffa02fa2f0>] ? txg_delay+0x160/0x160 [zfs]
[25560.318154] [<ffffffffa01e7f22>] thread_generic_wrapper+0x72/0x80 [spl]
[25560.318161] [<ffffffffa01e7eb0>] ? __thread_exit+0x20/0x20 [spl]
[25560.318167] [<ffffffff8109e8f9>] kthread+0xd9/0xf0
[25560.318171] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[25560.318176] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
[25560.318180] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
[25560.318184] [<ffffffff8160a995>] ret_from_fork+0x25/0x30
[25683.204571] INFO: task txg_sync:1392 blocked for more than 120 seconds.
[25683.204722] Tainted: P O 4.9.6-1-ARCH #1
[25683.204814] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25683.204940] txg_sync D 0 1392 2 0x00000000
[25683.204950] ffff88041b855c00 0000000000000000 ffff8804104b2700 ffff88042fc180c0
[25683.204958] ffffffff81a0e500 ffffc9000828fad8 ffffffff81605cdf ffff880411a4c080
[25683.204965] 00ffffffa0341360 ffff88042fc180c0 0000000000000000 ffff8804104b2700
[25683.204970] Call Trace:
[25683.204985] [<ffffffff81605cdf>] ? __schedule+0x22f/0x6e0
[25683.204992] [<ffffffff816061cd>] schedule+0x3d/0x90
[25683.204999] [<ffffffff81608fd3>] schedule_timeout+0x243/0x3d0
[25683.205055] [<ffffffffa033fbb1>] ? zio_taskq_dispatch+0x91/0xa0 [zfs]
[25683.205095] [<ffffffffa033fbd2>] ? zio_issue_async+0x12/0x20 [zfs]
[25683.205134] [<ffffffffa0343569>] ? zio_nowait+0x79/0x110 [zfs]
[25683.205142] [<ffffffff810f7b81>] ? ktime_get+0x41/0xb0
[25683.205148] [<ffffffff81605a44>] io_schedule_timeout+0xa4/0x110
[25683.205159] [<ffffffffa01eccd1>] cv_wait_common+0xb1/0x130 [spl]
[25683.205166] [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60
[25683.205175] [<ffffffffa01ecda8>] __cv_wait_io+0x18/0x20 [spl]
[25683.205213] [<ffffffffa034337c>] zio_wait+0xac/0x130 [zfs]
[25683.205259] [<ffffffffa02cf408>] dsl_pool_sync+0xb8/0x480 [zfs]
[25683.205310] [<ffffffffa02e9e1f>] spa_sync+0x37f/0xb30 [zfs]
[25683.205315] [<ffffffff810aa4a2>] ? default_wake_function+0x12/0x20
[25683.205365] [<ffffffffa02fa6aa>] txg_sync_thread+0x3ba/0x620 [zfs]
[25683.205371] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[25683.205419] [<ffffffffa02fa2f0>] ? txg_delay+0x160/0x160 [zfs]
[25683.205428] [<ffffffffa01e7f22>] thread_generic_wrapper+0x72/0x80 [spl]
[25683.205436] [<ffffffffa01e7eb0>] ? __thread_exit+0x20/0x20 [spl]
[25683.205441] [<ffffffff8109e8f9>] kthread+0xd9/0xf0
[25683.205445] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[25683.205450] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
[25683.205454] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
[25683.205458] [<ffffffff8160a995>] ret_from_fork+0x25/0x30
[26051.866268] INFO: task txg_sync:1392 blocked for more than 120 seconds.
[26051.866424] Tainted: P O 4.9.6-1-ARCH #1
[26051.866516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[26051.866642] txg_sync D 0 1392 2 0x00000000
[26051.866652] ffff8803e6f2c000 0000000000000000 ffff8804104b2700 ffff88042fc180c0
[26051.866660] ffff88041b50b400 ffffc9000828fad8 ffffffff81605cdf ffff880411a4c080
[26051.866667] 00ffffffa0341360 ffff88042fc180c0 0000000000000000 ffff8804104b2700
[26051.866673] Call Trace:
[26051.866687] [<ffffffff81605cdf>] ? __schedule+0x22f/0x6e0
[26051.866694] [<ffffffff816061cd>] schedule+0x3d/0x90
[26051.866700] [<ffffffff81608fd3>] schedule_timeout+0x243/0x3d0
[26051.866756] [<ffffffffa033fbb1>] ? zio_taskq_dispatch+0x91/0xa0 [zfs]
[26051.866796] [<ffffffffa033fbd2>] ? zio_issue_async+0x12/0x20 [zfs]
[26051.866834] [<ffffffffa0343569>] ? zio_nowait+0x79/0x110 [zfs]
[26051.866842] [<ffffffff810f7b81>] ? ktime_get+0x41/0xb0
[26051.866847] [<ffffffff81605a44>] io_schedule_timeout+0xa4/0x110
[26051.866859] [<ffffffffa01eccd1>] cv_wait_common+0xb1/0x130 [spl]
[26051.866866] [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60
[26051.866874] [<ffffffffa01ecda8>] __cv_wait_io+0x18/0x20 [spl]
[26051.866913] [<ffffffffa034337c>] zio_wait+0xac/0x130 [zfs]
[26051.866959] [<ffffffffa02cf408>] dsl_pool_sync+0xb8/0x480 [zfs]
[26051.867009] [<ffffffffa02e9e1f>] spa_sync+0x37f/0xb30 [zfs]
[26051.867015] [<ffffffff810aa4a2>] ? default_wake_function+0x12/0x20
[26051.867065] [<ffffffffa02fa6aa>] txg_sync_thread+0x3ba/0x620 [zfs]
[26051.867070] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[26051.867119] [<ffffffffa02fa2f0>] ? txg_delay+0x160/0x160 [zfs]
[26051.867128] [<ffffffffa01e7f22>] thread_generic_wrapper+0x72/0x80 [spl]
[26051.867135] [<ffffffffa01e7eb0>] ? __thread_exit+0x20/0x20 [spl]
[26051.867140] [<ffffffff8109e8f9>] kthread+0xd9/0xf0
[26051.867145] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[26051.867149] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
[26051.867153] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
[26051.867157] [<ffffffff8160a995>] ret_from_fork+0x25/0x30
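As an aside, the warning threshold named in the log above is tunable at runtime. A minimal sketch (note this only controls the warning itself; it does nothing about the stalled txg_sync):

```shell
# Show the current hung-task threshold (120 seconds in the logs above).
sysctl kernel.hung_task_timeout_secs

# Setting it to 0 disables the warning entirely, as the kernel message says.
# Requires root; equivalent to: echo 0 > /proc/sys/kernel/hung_task_timeout_secs
sysctl -w kernel.hung_task_timeout_secs=0
```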
Running
% pacman -Q | egrep "zfs|spl"
spl-linux 0.6.5.9_4.9.6_1-2
spl-utils-linux 0.6.5.9_4.9.6_1-2
zfs-linux 0.6.5.9_4.9.6_1-2
zfs-utils-linux 0.6.5.9_4.9.6_1-2
Memory available is okay:
% free -h
total used free shared buff/cache available
Mem: 15G 12G 2.7G 692K 630M 3.0G
Swap: 4.0G 0B 4.0G
iostat looks okay (sde is a Samsung 840 Pro):
% iostat -mx 10
Linux 4.9.6-1-ARCH (hinoki) 06/02/17 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.69 0.00 2.26 3.18 0.00 93.88
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 4.39 13.96 0.19 0.23 47.25 0.06 3.16 11.42 0.57 2.28 4.18
sdb 0.00 0.00 4.39 13.69 0.19 0.22 46.63 0.06 3.13 11.19 0.55 2.26 4.09
sdc 0.00 0.00 4.29 13.58 0.19 0.23 47.97 0.06 3.61 12.91 0.66 2.58 4.62
sdd 0.00 0.00 4.32 13.80 0.18 0.23 46.40 0.06 3.18 11.60 0.55 2.29 4.15
sde 0.00 0.30 1.46 11.25 0.02 0.36 61.82 0.01 0.41 0.78 0.36 0.22 0.28
avg-cpu: %user %nice %system %iowait %steal %idle
10.43 0.00 47.68 28.80 0.00 13.08
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 54.20 0.80 3.27 0.01 122.01 0.31 5.70 5.77 0.88 3.10 17.06
sdb 0.00 0.00 54.60 0.80 3.09 0.01 114.60 0.43 7.70 7.80 0.75 3.62 20.07
sdc 0.00 0.00 54.20 1.00 3.15 0.01 117.16 0.39 6.98 7.11 0.00 3.80 21.00
sdd 0.00 0.00 56.30 1.00 3.08 0.01 110.27 0.43 7.57 7.70 0.30 3.87 22.17
sde 0.00 5.60 27.30 159.50 0.08 7.38 81.70 0.06 0.34 0.21 0.36 0.22 4.20
avg-cpu: %user %nice %system %iowait %steal %idle
5.69 0.00 43.65 30.28 0.00 20.38
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 43.30 0.90 2.45 0.01 113.81 0.35 7.86 8.03 0.00 4.38 19.34
sdb 0.00 0.00 40.30 0.90 2.31 0.01 114.99 0.36 8.68 8.87 0.44 4.47 18.40
sdc 0.00 0.00 41.80 0.80 2.39 0.01 115.08 0.34 8.08 8.22 0.38 4.68 19.93
sdd 0.00 0.00 41.20 0.80 2.30 0.01 112.34 0.44 10.39 10.58 0.88 4.56 19.16
sde 0.00 3.10 25.80 136.70 0.07 5.80 74.00 0.05 0.28 0.48 0.25 0.21 3.47
avg-cpu: %user %nice %system %iowait %steal %idle
3.42 0.00 38.83 28.72 0.00 29.02
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 26.70 0.90 1.58 0.01 117.86 0.26 9.37 9.63 1.78 5.51 15.20
sdb 0.00 0.00 28.50 0.90 1.53 0.01 107.54 0.25 8.59 8.86 0.00 5.04 14.83
sdc 0.00 0.00 27.40 0.90 1.41 0.01 103.12 0.27 9.34 9.65 0.00 5.49 15.53
sdd 0.00 0.00 28.60 0.90 1.43 0.01 100.18 0.24 8.21 8.47 0.00 4.84 14.27
sde 0.00 0.80 27.30 84.50 0.08 2.67 50.38 0.02 0.21 0.27 0.19 0.19 2.07
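For what it's worth, iostat can be restricted to individual devices to watch a suspect disk more closely; the device names below are taken from the output above:

```shell
# Extended stats in MB, for the four pool disks plus the SSD, every 10 seconds.
# Devices are listed positionally before the interval.
iostat -mx sda sdb sdc sdd sde 10
```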
Just wanted to give a short update on my post in this issue from last year: in the meantime I have upgraded to ZFS 0.6.5.8 from Debian's backports, still on Debian 8. Unfortunately, I still get the exact same timeout in the kernel logs.
@kpande if you're going to assert that, can you please describe what's big enough?
I have resources 4X the average described in this thread and I also see the same issues, in this case with the combination of zfs send and nfsd load.
severely undersized hardware
You may be right.
I have 4x 4TB drives and one 256GB SSD for OS/cache, 16GB memory, and a 2-core CPU (1.5GHz Atom). This system is used almost exclusively for rsync backups. I do feel that this is pretty reasonable for my needs.
The ARC is 12GB, leaving 4GB for the OS. The L2ARC, if enabled, is about 170GB.
I'll have to check how many files it is, but I'm not sure whether it's in the multi-millions or not.
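Counting the files is cheap to check; a minimal sketch, assuming the backups live under a mountpoint like /tank/backups (the path is a placeholder):

```shell
#!/bin/sh
# Count regular files under the backup dataset without descending into
# other mounted filesystems (-xdev). BACKUP_DIR is hypothetical.
BACKUP_DIR="${1:-/tank/backups}"
find "$BACKUP_DIR" -xdev -type f | wc -l
```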
@kpande you are right, I forgot to give any relevant info about my underlying hardware. Sorry about that; here is hopefully all the relevant info:
My server is a virtualization server running Xen 4.4 with currently 6 virtual machines, all of which have their logical volumes (LVM) stored on a RAIDZ1 volume with 3x 2TB Seagate SATA enterprise disks (ST32000645NS). The Debian 8 OS is independent, located on two internal 16 GB SATA SSDs mirrored in RAID1 using Linux MD. The CPU is an Intel E5-2620 v3 @ 2.40GHz with 6 cores/12 threads. Of these 6 cores, 4 vCPUs have been pinned to the host/hypervisor/dom0 using the dom0_max_vcpus=4 dom0_vcpus_pin Linux kernel options. The server has 64 GB of TruDDR4 ECC memory, of which 6 GB has been reserved for the host/hypervisor/dom0 using the respective Linux kernel option dom0_mem=6G,max:6G. Finally, I have reserved 2 GB of those 6 GB for the ARC using the zfs_arc_max=2147483648 zfs module option. I have also disabled ZFS prefetch, if that is of any relevance (zfs_prefetch_disable=1).
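For reference, module options like the ones above are typically made persistent via a modprobe config; a minimal sketch matching the values described (run as root, takes effect when the zfs module is next loaded):

```shell
# Persist the zfs module options described above in /etc/modprobe.d/zfs.conf.
cat > /etc/modprobe.d/zfs.conf <<'EOF'
options zfs zfs_arc_max=2147483648
options zfs zfs_prefetch_disable=1
EOF
```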
Below is the output of an actual ARC summary (the server was rebooted 5 days ago):
ZFS Subsystem Report Tue Feb 07 15:11:32 2017
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 3.05m
Mutex Misses: 7
Evict Skips: 7
ARC Size: 77.69% 1.55 GiB
Target Size: (Adaptive) 100.00% 2.00 GiB
Min Size (Hard Limit): 1.56% 32.00 MiB
Max Size (High Water): 64:1 2.00 GiB
ARC Size Breakdown:
Recently Used Cache Size: 93.75% 1.88 GiB
Frequently Used Cache Size: 6.25% 128.00 MiB
ARC Hash Breakdown:
Elements Max: 171.83k
Elements Current: 97.97% 168.34k
Collisions: 68.60m
Chain Max: 6
Chains: 12.12k
ARC Total accesses: 102.96m
Cache Hit Ratio: 48.88% 50.32m
Cache Miss Ratio: 51.12% 52.63m
Actual Hit Ratio: 48.88% 50.32m
Data Demand Efficiency: 0.00% 49.34m
CACHE HITS BY CACHE LIST:
Most Recently Used: 63.44% 31.92m
Most Frequently Used: 36.56% 18.40m
Most Recently Used Ghost: 0.08% 42.37k
Most Frequently Used Ghost: 0.00% 0
CACHE HITS BY DATA TYPE:
Demand Data: 0.00% 167
Prefetch Data: 0.00% 0
Demand Metadata: 100.00% 50.32m
Prefetch Metadata: 0.00% 0
CACHE MISSES BY DATA TYPE:
Demand Data: 93.74% 49.34m
Prefetch Data: 0.00% 0
Demand Metadata: 6.26% 3.29m
Prefetch Metadata: 0.00% 145
ZFS Tunable:
metaslab_debug_load 0
zfs_arc_min_prefetch_lifespan 0
zfetch_max_streams 8
zfs_nopwrite_enabled 1
zfetch_min_sec_reap 2
zfs_dbgmsg_enable 0
zfs_dirty_data_max_max_percent 25
zfs_arc_p_aggressive_disable 1
spa_load_verify_data 1
zfs_zevent_cols 80
zfs_dirty_data_max_percent 10
zfs_sync_pass_dont_compress 5
l2arc_write_max 8388608
zfs_vdev_scrub_max_active 2
zfs_vdev_sync_write_min_active 10
zvol_prefetch_bytes 131072
metaslab_aliquot 524288
zfs_no_scrub_prefetch 0
zfs_arc_shrink_shift 0
zfetch_block_cap 256
zfs_txg_history 0
zfs_delay_scale 500000
zfs_vdev_async_write_active_min_dirty_percent 30
metaslab_debug_unload 0
zfs_read_history 0
zvol_max_discard_blocks 16384
zfs_recover 0
l2arc_headroom 2
zfs_deadman_synctime_ms 1000000
zfs_scan_idle 50
zfs_free_min_time_ms 1000
zfs_dirty_data_max 624856268
zfs_vdev_async_read_min_active 1
zfs_mg_noalloc_threshold 0
zfs_dedup_prefetch 0
zfs_vdev_max_active 1000
l2arc_write_boost 8388608
zfs_resilver_min_time_ms 3000
zfs_vdev_async_write_max_active 10
zil_slog_limit 1048576
zfs_prefetch_disable 1
zfs_resilver_delay 2
metaslab_lba_weighting_enabled 1
zfs_mg_fragmentation_threshold 85
l2arc_feed_again 1
zfs_zevent_console 0
zfs_immediate_write_sz 32768
zfs_dbgmsg_maxsize 4194304
zfs_free_leak_on_eio 0
zfs_deadman_enabled 1
metaslab_bias_enabled 1
zfs_arc_p_dampener_disable 1
zfs_object_mutex_size 64
zfs_metaslab_fragmentation_threshold 70
zfs_no_scrub_io 0
metaslabs_per_vdev 200
zfs_dbuf_state_index 0
zfs_vdev_sync_read_min_active 10
metaslab_fragmentation_factor_enabled 1
zvol_inhibit_dev 0
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_vdev_cache_size 0
zfs_vdev_mirror_switch_us 10000
zfs_dirty_data_sync 67108864
spa_config_path /etc/zfs/zpool.cache
zfs_dirty_data_max_max 1562140672
zfs_arc_lotsfree_percent 10
zfs_zevent_len_max 64
zfs_scan_min_time_ms 1000
zfs_arc_sys_free 0
zfs_arc_meta_strategy 1
zfs_vdev_cache_bshift 16
zfs_arc_meta_adjust_restarts 4096
zfs_max_recordsize 1048576
zfs_vdev_scrub_min_active 1
zfs_vdev_read_gap_limit 32768
zfs_arc_meta_limit 0
zfs_vdev_sync_write_max_active 10
l2arc_norw 0
zfs_arc_meta_prune 10000
metaslab_preload_enabled 1
l2arc_nocompress 0
zvol_major 230
zfs_vdev_aggregation_limit 131072
zfs_flags 0
spa_asize_inflation 24
zfs_admin_snapshot 0
l2arc_feed_secs 1
zio_taskq_batch_pct 75
zfs_sync_pass_deferred_free 2
zfs_disable_dup_eviction 0
zfs_arc_grow_retry 0
zfs_read_history_hits 0
zfs_vdev_async_write_min_active 1
zfs_vdev_async_read_max_active 3
zfs_scrub_delay 4
zfs_delay_min_dirty_percent 60
zfs_free_max_blocks 100000
zfs_vdev_cache_max 16384
zio_delay_max 30000
zfs_top_maxinflight 32
ignore_hole_birth 1
spa_slop_shift 5
zfs_vdev_write_gap_limit 4096
spa_load_verify_metadata 1
spa_load_verify_maxinflight 10000
l2arc_noprefetch 1
zfs_vdev_scheduler noop
zfs_expire_snapshot 300
zfs_sync_pass_rewrite 2
zil_replay_disable 0
zfs_nocacheflush 0
zfs_arc_max 2147483648
zfs_arc_min 0
zfs_read_chunk_size 1048576
zfs_txg_timeout 5
zfs_pd_bytes_max 52428800
l2arc_headroom_boost 200
zfs_send_corrupt_data 0
l2arc_feed_min_ms 200
zfs_arc_meta_min 0
zfs_arc_average_blocksize 8192
zfetch_array_rd_sz 1048576
zfs_autoimport_disable 1
zfs_arc_p_min_shift 0
zio_requeue_io_start_cut_in_line 1
zfs_vdev_sync_read_max_active 10
zfs_mdcomp_disable 0
zfs_arc_num_sublists_per_state 4
Do you need any more information? And what do you think about this setup? Is my hardware undersized?
I checked and the majority of my backups are < 200k files and < 20GBytes.
I hope it helps:
user@server ~ $ cat /var/log/syslog | grep zfs
Mar 21 04:01:11 server kernel: [204939.534145] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:01:11 server kernel: [204939.534234] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:01:11 server kernel: [204939.534339] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:01:11 server kernel: [204939.534439] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:01:11 server kernel: [204939.534539] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:01:11 server kernel: [204939.534648] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:01:11 server kernel: [204939.534752] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:03:12 server kernel: [205060.371676] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:03:12 server kernel: [205060.371764] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:03:12 server kernel: [205060.371869] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:03:12 server kernel: [205060.371969] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:03:12 server kernel: [205060.372069] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:03:12 server kernel: [205060.372177] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:03:12 server kernel: [205060.372282] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:05:13 server kernel: [205181.209034] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:05:13 server kernel: [205181.209123] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:05:13 server kernel: [205181.209228] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:05:13 server kernel: [205181.209329] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:05:13 server kernel: [205181.209429] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:05:13 server kernel: [205181.209537] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:05:13 server kernel: [205181.209641] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:07:13 server kernel: [205302.045999] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:07:13 server kernel: [205302.046088] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:07:13 server kernel: [205302.046192] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:07:13 server kernel: [205302.046293] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:07:13 server kernel: [205302.046392] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:07:13 server kernel: [205302.046501] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:07:13 server kernel: [205302.046605] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:09:14 server kernel: [205422.882950] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:09:14 server kernel: [205422.883039] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:09:14 server kernel: [205422.883144] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:09:14 server kernel: [205422.883244] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:09:14 server kernel: [205422.883343] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:09:14 server kernel: [205422.883452] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:09:14 server kernel: [205422.883556] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:11:15 server kernel: [205543.719843] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:11:15 server kernel: [205543.719933] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:11:15 server kernel: [205543.720037] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:11:15 server kernel: [205543.720137] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:11:15 server kernel: [205543.720237] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:11:15 server kernel: [205543.720345] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:11:15 server kernel: [205543.720449] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:13:16 server kernel: [205664.556609] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:13:16 server kernel: [205664.556698] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:13:16 server kernel: [205664.556802] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:13:16 server kernel: [205664.556903] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:13:16 server kernel: [205664.557003] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:13:16 server kernel: [205664.557112] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:13:16 server kernel: [205664.557216] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:15:17 server kernel: [205785.393394] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:15:17 server kernel: [205785.393483] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:15:17 server kernel: [205785.393587] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:15:17 server kernel: [205785.393687] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:15:17 server kernel: [205785.393787] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:15:17 server kernel: [205785.393896] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:15:17 server kernel: [205785.394000] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:17:18 server kernel: [205906.229973] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:17:18 server kernel: [205906.230062] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:17:18 server kernel: [205906.230166] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:17:18 server kernel: [205906.230289] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:17:18 server kernel: [205906.230389] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:17:18 server kernel: [205906.230498] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:17:18 server kernel: [205906.230602] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 04:19:18 server kernel: [206027.066643] dmu_tx_wait+0x36f/0x390 [zfs]
Mar 21 04:19:18 server kernel: [206027.066732] dmu_tx_assign+0x83/0x470 [zfs]
Mar 21 04:19:18 server kernel: [206027.066837] zfs_write+0x3f6/0xd80 [zfs]
Mar 21 04:19:18 server kernel: [206027.066937] ? rrw_exit+0x5a/0x150 [zfs]
Mar 21 04:19:18 server kernel: [206027.067037] ? rrm_exit+0x46/0x80 [zfs]
Mar 21 04:19:18 server kernel: [206027.067146] zpl_write_common_iovec+0x8c/0xe0 [zfs]
Mar 21 04:19:18 server kernel: [206027.067250] zpl_iter_write+0xae/0xe0 [zfs]
Mar 21 07:54:33 server systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Mar 21 07:54:33 server systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
user@server ~ $ cat /etc/issue
Ubuntu Bionic Beaver (development branch) \n \l
user@server ~ $ uname -a
Linux server 4.15.0-12-generic #13-Ubuntu SMP Thu Mar 8 06:24:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
user@server ~ $ dpkg -l zfs*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
un  zfs                                      (no description available)
un  zfs-dkms                                 (no description available)
un  zfs-dracut                               (no description available)
un  zfs-fuse                                 (no description available)
un  zfs-initramfs                            (no description available)
un  zfs-modules                              (no description available)
ii  zfs-zed        0.7.5-1ubunt amd64        OpenZFS Event Daemon
un  zfsutils                                 (no description available)
ii  zfsutils-linux 0.7.5-1ubunt amd64        command-line tools to manage Open
user@server ~ $ df -h /data
Filesystem      Size  Used Avail Use% Mounted on
data             14T   11T  3.1T  78% /data
It was an mv process from one ZFS dataset to another in the same pool (it actually behaves like cp, no faster). The process hangs forever (at least a few hours) and cannot be terminated with a signal. zpool reports no errors. I have now started a zpool scrub; let's see what it reports.
UPDATE:
user@server ~ $ zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 18h22m with 0 errors on Thu Mar 22 02:32:52 2018
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
            sdl     ONLINE       0     0     0
            sdm     ONLINE       0     0     0
errors: No known data errors
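While a scrub runs, its progress can be polled by filtering the relevant lines of zpool status; shown here against a captured sample so it is self-contained (the pool name "data" is taken from the output above):

```shell
# Keep only the scan and errors lines from zpool status output.
# zpool_sample stands in for `zpool status data` on a live system.
zpool_sample='  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 18h22m with 0 errors on Thu Mar 22 02:32:52 2018
errors: No known data errors'
printf '%s\n' "$zpool_sample" | grep -E '^ *(scan|errors):'
```

On a live system the same filter can be wrapped in e.g. `watch -n 60` to follow the scrub.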
Let me know if you need any other info. With best regards, Ag
Doing nothing but some basic rsyncs of moderate sizes (4-10G) always results in roughly 1MB/sec throughput (very slow) on an up-to-date 16G RAM HP server - CentOS 6 with OpenVZ and self-compiled ZFS modules - dmesg: