Closed fwaggle closed 6 years ago
Now we are two; it is in fact exactly as you described. Doing the same on xenial+HWE, by contrast, leads to extreme slowdown and from there to an unusable system. SSD vs. HDD makes no difference either. On bionic it is a nearly immediate lockup. The situation on a customized Proxmox-based and a VSIDO-based setup is comparable.
Once I am able to narrow down the specific circumstances, I will add a detailed comment.
@fwaggle, did you by chance experiment with different swap settings/types (zram/zcache/zswap) and/or with limiting memory reservation, caching metadata only, and similar tunables?
@claudiusraphaelpaeth I'm afraid I didn't do any checking on that; I didn't think to look at swap, since once I got it to die quickly, the servers weren't even touching swap. I'm not sure where to start with tweaking ZFS tunables either. I didn't save the top output, but as I recall, when I got it to reliably hang instantly the CPU and memory load were extremely low; nothing stood out as a potential problem.
I'd definitely welcome any suggestions for things I can try, though; since I can reproduce this outside of a production environment, I can test just about anything. Unfortunately, so far the only useful guidance has been "turn off dedup", which is certainly good advice (I can't reproduce the problem at all unless dedup is on, so I'm convinced it goes away entirely), but it's not applicable for this exact use case. :(
I got the same issue with Ubuntu 18.04. It is not clear whether it is an issue with the ZFS/SPL versions in the Ubuntu Linux kernel or whether it is an upstream issue.
See also https://www.mail-archive.com/kernel-packages@lists.launchpad.net/msg276630.html
It suggests setting `spl_kmem_cache_slab_limit` to 0.
Update: this workaround works.
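For reference, a hedged sketch of how that module parameter can be changed (the sysfs path follows standard SPL packaging; adjust for your distribution):

```shell
# Set the SPL tunable at runtime (affects newly created kmem caches):
echo 0 | sudo tee /sys/module/spl/parameters/spl_kmem_cache_slab_limit

# Persist it across reboots via modprobe configuration:
echo "options spl spl_kmem_cache_slab_limit=0" | sudo tee /etc/modprobe.d/spl.conf
```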
You may find #7693 interesting.
I've opened #7693 with a proposed fix for this issue. @simos @claudiusraphaelpaeth @fwaggle if at all possible, it would be very helpful if you could cherry-pick the fix and verify it resolves the deadlock you're seeing. @fwaggle thanks for the detailed bug report; it had everything needed to get to the root cause.
@behlendorf Thanks!
I built ZFS 0.7.9 on Ubuntu Bionic (4.15.0-24-generic), verified I could reproduce the issue with the installed module (it hung after one container when doing 5 in parallel), then applied the patch in #7693, reinstalled the module, and created 5x20 and 6x20 Alpine containers in parallel without any issues, a couple of times. Not super-rigorous testing, but I'm convinced.
I can confirm this bug is fixed by #7693 in my situation too: previously I used an old kernel (4.13) to avoid this bug. But I want to perform a clean install with the new version of zfsutils (will you backport this fix to 0.7.5 as the bionic stable version?) and check it again before trying this on my servers :)
@DeadlyMercury we'll get it in our next point release. Canonical will have to decide if they want to apply it to the version they're shipping.
I have installed zfs (master: 2e5dc44) and lxd (master: 140c4a6). LXD has some recent performance fixes.
I used lxd-benchmark and created 384 containers in batches of 24. I repeated this several times.
I did not manage to get ZFS to crash/deadlock.
ubuntu@myserver:~$ time lxd-benchmark launch --count 384 --parallel 24
Test environment:
Server backend: lxd
Server version: 3.2
Kernel: Linux
Kernel architecture: x86_64
Kernel version: 4.15.0-20-generic
Storage backend: zfs
Storage version: 0.7.0-1472_g2e5dc449c
Container backend: lxc
Container version: 3.0.1
Test variables:
Container count: 384
Container mode: unprivileged
Startup mode: normal startup
Image: ubuntu:
Batches: 16
Batch size: 24
Remainder: 0
[Jul 12 19:44:41.326] Found image in local store: f2228450779fee27020d6024af587379b8f51062c32a335327f2b028c924bfa1
[Jul 12 19:44:41.327] Batch processing start
[Jul 12 19:44:54.448] Processed 24 containers in 13.121s (1.829/s)
[Jul 12 19:45:11.758] Processed 48 containers in 30.432s (1.577/s)
[Jul 12 19:45:57.044] Processed 96 containers in 75.718s (1.268/s)
[Jul 12 19:48:14.077] Processed 192 containers in 212.750s (0.902/s)
[Jul 12 19:55:58.700] Processed 384 containers in 677.373s (0.567/s)
[Jul 12 19:55:58.700] Batch processing completed in 677.373s
real 11m17,866s
user 0m6,908s
sys 0m1,208s
ubuntu@myserver:~$
I continued testing and I managed to get a crash with ZFS (master: commit 2e5dc44).
zfs-0.7.0-1472_g2e5dc449c
[10271.564074] INFO: task systemd:1 blocked for more than 120 seconds.
[10271.570397] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.577212] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.585071] systemd D 0 1 0 0x00000000
[10271.585073] Call Trace:
[10271.585080] __schedule+0x297/0x8b0
[10271.585086] schedule+0x2c/0x80
[10271.585090] schedule_preempt_disabled+0xe/0x10
[10271.585095] __mutex_lock.isra.2+0x18c/0x4d0
[10271.585100] __mutex_lock_slowpath+0x13/0x20
[10271.585101] ? __mutex_lock_slowpath+0x13/0x20
[10271.585102] mutex_lock+0x2f/0x40
[10271.585106] proc_cgroup_show+0x4c/0x2a0
[10271.585108] proc_single_show+0x56/0x80
[10271.585111] seq_read+0xe5/0x430
[10271.585114] __vfs_read+0x1b/0x40
[10271.585115] vfs_read+0x8e/0x130
[10271.585117] SyS_read+0x55/0xc0
[10271.585120] do_syscall_64+0x73/0x130
[10271.585121] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[10271.585123] RIP: 0033:0x7fd47634d0b4
[10271.585124] RSP: 002b:00007ffe9a186700 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[10271.585125] RAX: ffffffffffffffda RBX: 0000000000000021 RCX: 00007fd47634d0b4
[10271.585126] RDX: 0000000000000400 RSI: 000055f70a08c0c0 RDI: 0000000000000021
[10271.585127] RBP: 000055f70a08c0c0 R08: 0000000000000000 R09: 0000000000000000
[10271.585128] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000400
[10271.585129] R13: 00007fd4766252a0 R14: 000055f70a0298e0 R15: 00000000000007ff
[10271.585221] INFO: task lxcfs:84510 blocked for more than 120 seconds.
[10271.591687] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.598531] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.606401] lxcfs D 0 84510 1 0x00000000
[10271.606407] Call Trace:
[10271.606416] __schedule+0x297/0x8b0
[10271.606418] schedule+0x2c/0x80
[10271.606420] schedule_preempt_disabled+0xe/0x10
[10271.606421] __mutex_lock.isra.2+0x18c/0x4d0
[10271.606423] __mutex_lock_slowpath+0x13/0x20
[10271.606424] ? __mutex_lock_slowpath+0x13/0x20
[10271.606425] mutex_lock+0x2f/0x40
[10271.606427] proc_cgroup_show+0x4c/0x2a0
[10271.606429] proc_single_show+0x56/0x80
[10271.606432] seq_read+0xe5/0x430
[10271.606434] __vfs_read+0x1b/0x40
[10271.606436] vfs_read+0x8e/0x130
[10271.606437] SyS_read+0x55/0xc0
[10271.606440] do_syscall_64+0x73/0x130
[10271.606441] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[10271.606443] RIP: 0033:0x7f1a50e700b4
[10271.606443] RSP: 002b:00007f1a2b7fd870 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[10271.606445] RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00007f1a50e700b4
[10271.606446] RDX: 0000000000000400 RSI: 00007f1a04000f90 RDI: 0000000000000013
[10271.606446] RBP: 00007f1a04000f90 R08: 0000000000000001 R09: 0000000000000000
[10271.606447] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000400
[10271.606448] R13: 00007f1a511482a0 R14: 0000000000000000 R15: 00007f1a38003a40
[10271.606450] INFO: task lxcfs:84676 blocked for more than 120 seconds.
[10271.612911] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.619722] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.627577] lxcfs D 0 84676 1 0x00000000
[10271.627579] Call Trace:
[10271.627583] __schedule+0x297/0x8b0
[10271.627589] schedule+0x2c/0x80
[10271.627594] schedule_preempt_disabled+0xe/0x10
[10271.627598] __mutex_lock.isra.2+0x18c/0x4d0
[10271.627600] __mutex_lock_slowpath+0x13/0x20
[10271.627601] ? __mutex_lock_slowpath+0x13/0x20
[10271.627602] mutex_lock+0x2f/0x40
[10271.627604] proc_cgroup_show+0x4c/0x2a0
[10271.627606] proc_single_show+0x56/0x80
[10271.627608] seq_read+0xe5/0x430
[10271.627610] __vfs_read+0x1b/0x40
[10271.627611] vfs_read+0x8e/0x130
[10271.627613] SyS_read+0x55/0xc0
[10271.627615] do_syscall_64+0x73/0x130
[10271.627617] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[10271.627618] RIP: 0033:0x7f1a50e700b4
[10271.627618] RSP: 002b:00007f1a297f9870 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[10271.627620] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f1a50e700b4
[10271.627620] RDX: 0000000000000400 RSI: 00007f1a2c001e80 RDI: 0000000000000014
[10271.627621] RBP: 00007f1a2c001e80 R08: 0000000000000001 R09: 0000000000000000
[10271.627622] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000400
[10271.627622] R13: 00007f1a511482a0 R14: 0000000000000000 R15: 00007f1a2c002b10
[10271.627625] INFO: task lxcfs:115507 blocked for more than 120 seconds.
[10271.634180] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.641007] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.648870] lxcfs D 0 115507 1 0x00000000
[10271.648872] Call Trace:
[10271.648876] __schedule+0x297/0x8b0
[10271.648883] schedule+0x2c/0x80
[10271.648887] schedule_preempt_disabled+0xe/0x10
[10271.648891] __mutex_lock.isra.2+0x18c/0x4d0
[10271.648895] __mutex_lock_slowpath+0x13/0x20
[10271.648896] ? __mutex_lock_slowpath+0x13/0x20
[10271.648897] mutex_lock+0x2f/0x40
[10271.648899] proc_cgroup_show+0x4c/0x2a0
[10271.648902] proc_single_show+0x56/0x80
[10271.648904] seq_read+0xe5/0x430
[10271.648906] __vfs_read+0x1b/0x40
[10271.648907] vfs_read+0x8e/0x130
[10271.648909] SyS_read+0x55/0xc0
[10271.648911] do_syscall_64+0x73/0x130
[10271.648913] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[10271.648914] RIP: 0033:0x7f1a50e700b4
[10271.648915] RSP: 002b:00007f1a29ffa870 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[10271.648916] RAX: ffffffffffffffda RBX: 0000000000000015 RCX: 00007f1a50e700b4
[10271.648917] RDX: 0000000000000400 RSI: 00007f19f0001820 RDI: 0000000000000015
[10271.648918] RBP: 00007f19f0001820 R08: 0000000000000001 R09: 0000000000000000
[10271.648925] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000400
[10271.648928] R13: 00007f1a511482a0 R14: 0000000000000000 R15: 00007f19f00024d0
[10271.648964] INFO: task kworker/14:1:111280 blocked for more than 120 seconds.
[10271.656131] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.662944] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.670822] kworker/14:1 D 0 111280 2 0x80000000
[10271.670827] Workqueue: cgroup_destroy css_killed_work_fn
[10271.670828] Call Trace:
[10271.670832] __schedule+0x297/0x8b0
[10271.670834] schedule+0x2c/0x80
[10271.670836] schedule_preempt_disabled+0xe/0x10
[10271.670837] __mutex_lock.isra.2+0x18c/0x4d0
[10271.670841] ? wake_up_q+0x44/0x80
[10271.670842] __mutex_lock_slowpath+0x13/0x20
[10271.670843] ? __mutex_lock_slowpath+0x13/0x20
[10271.670844] mutex_lock+0x2f/0x40
[10271.670845] css_killed_work_fn+0x1f/0xe0
[10271.670848] process_one_work+0x1de/0x410
[10271.670849] worker_thread+0x32/0x410
[10271.670851] kthread+0x121/0x140
[10271.670852] ? process_one_work+0x410/0x410
[10271.670854] ? kthread_create_worker_on_cpu+0x70/0x70
[10271.670856] ? kthread_create_worker_on_cpu+0x70/0x70
[10271.670857] ret_from_fork+0x22/0x40
[10271.670859] INFO: task kworker/29:0:111571 blocked for more than 120 seconds.
[10271.678018] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.684827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.692681] kworker/29:0 D 0 111571 2 0x80000000
[10271.692686] Workqueue: cgroup_destroy css_release_work_fn
[10271.692691] Call Trace:
[10271.692694] __schedule+0x297/0x8b0
[10271.692697] ? ttwu_do_wakeup+0x1e/0x150
[10271.692699] schedule+0x2c/0x80
[10271.692700] schedule_preempt_disabled+0xe/0x10
[10271.692702] __mutex_lock.isra.2+0x18c/0x4d0
[10271.692709] ? wake_up_q+0x44/0x80
[10271.692714] __mutex_lock_slowpath+0x13/0x20
[10271.692719] ? __mutex_lock_slowpath+0x13/0x20
[10271.692723] mutex_lock+0x2f/0x40
[10271.692728] css_release_work_fn+0x2b/0x180
[10271.692733] process_one_work+0x1de/0x410
[10271.692735] worker_thread+0x32/0x410
[10271.692737] kthread+0x121/0x140
[10271.692738] ? process_one_work+0x410/0x410
[10271.692740] ? kthread_create_worker_on_cpu+0x70/0x70
[10271.692742] ? do_syscall_64+0x115/0x130
[10271.692743] ret_from_fork+0x22/0x40
[10271.692752] INFO: task kworker/2:1:128568 blocked for more than 120 seconds.
[10271.699827] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.706653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.714524] kworker/2:1 D 0 128568 2 0x80000000
[10271.714529] Workqueue: cgroup_destroy css_release_work_fn
[10271.714530] Call Trace:
[10271.714534] __schedule+0x297/0x8b0
[10271.714536] schedule+0x2c/0x80
[10271.714537] schedule_preempt_disabled+0xe/0x10
[10271.714539] __mutex_lock.isra.2+0x18c/0x4d0
[10271.714540] __mutex_lock_slowpath+0x13/0x20
[10271.714541] ? __mutex_lock_slowpath+0x13/0x20
[10271.714543] mutex_lock+0x2f/0x40
[10271.714544] css_release_work_fn+0x2b/0x180
[10271.714545] process_one_work+0x1de/0x410
[10271.714547] worker_thread+0x32/0x410
[10271.714548] kthread+0x121/0x140
[10271.714550] ? process_one_work+0x410/0x410
[10271.714551] ? kthread_create_worker_on_cpu+0x70/0x70
[10271.714552] ? kthread_create_worker_on_cpu+0x70/0x70
[10271.714554] ret_from_fork+0x22/0x40
[10271.714564] INFO: task kworker/43:4:46368 blocked for more than 120 seconds.
[10271.721637] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.728456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.736309] kworker/43:4 D 0 46368 2 0x80000000
[10271.736314] Workqueue: cgroup_destroy css_killed_work_fn
[10271.736321] Call Trace:
[10271.736328] __schedule+0x297/0x8b0
[10271.736330] schedule+0x2c/0x80
[10271.736332] schedule_preempt_disabled+0xe/0x10
[10271.736333] __mutex_lock.isra.2+0x18c/0x4d0
[10271.736336] ? wake_up_q+0x44/0x80
[10271.736337] __mutex_lock_slowpath+0x13/0x20
[10271.736338] ? __mutex_lock_slowpath+0x13/0x20
[10271.736339] mutex_lock+0x2f/0x40
[10271.736341] css_killed_work_fn+0x1f/0xe0
[10271.736342] process_one_work+0x1de/0x410
[10271.736344] worker_thread+0x32/0x410
[10271.736346] kthread+0x121/0x140
[10271.736352] ? process_one_work+0x410/0x410
[10271.736357] ? kthread_create_worker_on_cpu+0x70/0x70
[10271.736362] ret_from_fork+0x22/0x40
[10271.736367] INFO: task kworker/13:3:49251 blocked for more than 120 seconds.
[10271.743439] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.750250] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.758106] kworker/13:3 D 0 49251 2 0x80000000
[10271.758110] Workqueue: cgroup_destroy css_killed_work_fn
[10271.758111] Call Trace:
[10271.758119] __schedule+0x297/0x8b0
[10271.758122] schedule+0x2c/0x80
[10271.758123] schedule_preempt_disabled+0xe/0x10
[10271.758124] __mutex_lock.isra.2+0x18c/0x4d0
[10271.758127] ? wake_up_q+0x44/0x80
[10271.758128] __mutex_lock_slowpath+0x13/0x20
[10271.758129] ? __mutex_lock_slowpath+0x13/0x20
[10271.758130] mutex_lock+0x2f/0x40
[10271.758131] css_killed_work_fn+0x1f/0xe0
[10271.758133] process_one_work+0x1de/0x410
[10271.758134] worker_thread+0x32/0x410
[10271.758136] kthread+0x121/0x140
[10271.758143] ? process_one_work+0x410/0x410
[10271.758148] ? kthread_create_worker_on_cpu+0x70/0x70
[10271.758153] ret_from_fork+0x22/0x40
[10271.758161] INFO: task kworker/38:1:21823 blocked for more than 120 seconds.
[10271.765236] Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[10271.772047] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.779902] kworker/38:1 D 0 21823 2 0x80000000
[10271.779907] Workqueue: cgroup_destroy css_release_work_fn
[10271.779911] Call Trace:
[10271.779918] __schedule+0x297/0x8b0
[10271.779920] schedule+0x2c/0x80
[10271.779922] schedule_preempt_disabled+0xe/0x10
[10271.779923] __mutex_lock.isra.2+0x18c/0x4d0
[10271.779925] ? wake_up_q+0x44/0x80
[10271.779926] __mutex_lock_slowpath+0x13/0x20
[10271.779928] ? __mutex_lock_slowpath+0x13/0x20
[10271.779929] mutex_lock+0x2f/0x40
[10271.779930] css_release_work_fn+0x2b/0x180
[10271.779935] process_one_work+0x1de/0x410
[10271.779941] worker_thread+0x32/0x410
[10271.779946] kthread+0x121/0x140
[10271.779952] ? process_one_work+0x410/0x410
[10271.779955] ? kthread_create_worker_on_cpu+0x70/0x70
[10271.779959] ret_from_fork+0x22/0x40
root@myserver:~#
@simos thanks for the additional testing and reporting your results. The hung tasks you're now seeing don't appear to be related to ZFS, based on the stack traces you posted. So the original issue does appear to be fixed, and you've uncovered a new one, which looks to be related to cgroups.
@behlendorf Thanks. I did a preliminary report at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1781601 and will keep an eye on reproducing it during further LXD stress tests.
This is to remind myself to check whether my issue is solved by the mentioned commit.
| Type | Version/Name |
|---|---|
| Distribution Name | Gentoo |
| Distribution Version | 17.1 |
| Linux Kernel | 4.18.8 |
| Architecture | amd64 |
| ZFS Version | 0.7.11 |
| SPL Version | 0.7.11 |
The issue is reproducible every time when starting an LXC container while /var/lib/lxc/rootfs is on a dedup-enabled dataset. I had no issues with this setup on 0.7.6 and earlier versions.
This could be related: lxc-3.0.2 is needed for lxd to run on 4.18
@fling- can you open a new issue with the offending stack traces so we can determine whether this is a similar problem? It's not clear to me why dedup would matter here, aside from the performance implications.
I got this recently:
[852786.703564] INFO: task smbd:27186 blocked for more than 120 seconds.
[852786.703567] Tainted: P O 4.14.157-gentoo-20191204-1553-datastore4 #1
[852786.703568] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[852786.703570] smbd D 0 27186 21509 0x00000100
[852786.703573] Call Trace:
[852786.703582] ? schedule+0x247/0x760
[852786.703584] schedule+0x23/0x70
[852786.703586] io_schedule+0xd/0x40
[852786.703590] wait_on_page_bit+0xd2/0x100
[852786.703593] ? add_to_page_cache_lru+0xb0/0xb0
[852786.703596] filemap_fdatawait_range+0xc5/0x100
[852786.703599] ? __filemap_fdatawrite_range+0x80/0xa0
[852786.703602] filemap_write_and_wait_range+0x36/0x60
[852786.703633] zpl_putpage+0x682/0xf10 [zfs]
[852786.703637] do_fsync+0x33/0x70
[852786.703640] SyS_fdatasync+0xa/0x10
[852786.703644] do_syscall_64+0x5b/0xf0
[852786.703648] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[852786.703650] RIP: 0033:0x7f93776fd348
[852786.703651] RSP: 002b:00007fffc707e7d0 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[852786.703654] RAX: ffffffffffffffda RBX: 0000000000003000 RCX: 00007f93776fd348
[852786.703655] RDX: 000000000000805c RSI: 0000000000003000 RDI: 0000000000000019
[852786.703656] RBP: 000000000000805c R08: 0000000000002184 R09: 0000000000000000
[852786.703658] R10: 000000000000030f R11: 0000000000000246 R12: 00005631496fba50
[852786.703659] R13: 00007f93769b2d40 R14: 000056314975db78 R15: 00005631496fba50
[854752.783302] INFO: task smbd:8672 blocked for more than 120 seconds.
[854752.783305] Tainted: P O 4.14.157-gentoo-20191204-1553-datastore4 #1
[854752.783305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[854752.783307] smbd D 0 8672 21509 0x00000100
[854752.783311] Call Trace:
[854752.783321] ? schedule+0x247/0x760
[854752.783324] ? __bpf_prog_run64+0x30/0x30
[854752.783326] schedule+0x23/0x70
[854752.783330] rwsem_down_write_failed+0x131/0x210
[854752.783335] call_rwsem_down_write_failed+0x13/0x20
[854752.783337] down_write+0x24/0x40
[854752.783342] do_unlinkat+0xff/0x2e0
[854752.783347] do_syscall_64+0x5b/0xf0
[854752.783350] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[854752.783353] RIP: 0033:0x7f93776f7747
[854752.783354] RSP: 002b:00007fffc707f7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[854752.783357] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f93776f7747
[854752.783358] RDX: 0000000000000014 RSI: 00007fffc707f9d0 RDI: 00007fffc707fa12
[854752.783359] RBP: 00007fffc707fa10 R08: 0000000000000001 R09: 0000000000000014
[854752.783360] R10: 00007fffc707f9d0 R11: 0000000000000246 R12: 00007fffc707fa80
[854752.783361] R13: 00005631496cce50 R14: 00005631496dd950 R15: 0000000000000006
[854752.783364] INFO: task smbd:8817 blocked for more than 120 seconds.
This is on:
datastore4 log # zfs --version
zfs-0.8.0-434_g5a0897737
zfs-kmod-0.8.0-434_g5a0897737
datastore4 log # uname -a
Linux datastore4 4.14.157-gentoo-20191204-1553-datastore4 #1 SMP Wed Dec 4 15:57:28 EST 2019 x86_64 Intel(R) Xeon(R) CPU E31230 @ 3.20GHz GenuineIntel GNU/Linux
System information
When using ZFS-backed LXC/LXD, hung tasks on concurrent container creation
We run into this very occasionally in production, and I managed to distill a test scenario down to guarantee I run into it.
With multiple concurrent LXD operations ongoing, ZFS will end up in a stuck state and block forever.
We can reproduce this on JonF's 0.7.9 builds as well. I can't seem to work out a scenario not involving LXC.
I spoke to some folks on Freenode about this issue, and the general consensus seems to be "turn dedup off", which we're investigating, but as it's so trivially reproducible for me I figured I'd file a report with all the data anyway.
Describe how to reproduce the problem
Test Environment:
VirtualBox VM: 8GB RAM, 4 vCPUs. Host is i7-7700k, 16GB non-ECC, on Windows 10. Both vdisks are on a 7200rpm SATA.
10GB vDisk for Root Partition - sda 100GB vDisk for ZFS pool - sdb
Install Ubuntu, install LXD 3.1 from snap, initialize LXD with a ZFS storage pool on sdb, then turn dedup on for the pool. No other configuration changes are made.
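Assuming defaults, that setup step might look like the following (the pool name "lxd" and the non-interactive init flags are assumptions; adapt to your LXD version):

```shell
# Initialize LXD non-interactively with a ZFS pool created on the second disk:
lxd init --auto --storage-backend zfs --storage-create-device /dev/sdb --storage-pool lxd

# Enable deduplication on the pool's root dataset; the property is inherited
# by the per-container datasets LXD creates underneath it:
sudo zfs set dedup=on lxd
```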
Run three or four concurrent jobs repeatedly running `lxc launch`, and quite quickly all disk activity will stop; two minutes later you'll get hung-task messages on the console. I ran four of these and it locked ZFS up very quickly:
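The job script itself isn't preserved in this chunk; a minimal equivalent (image alias and container naming are assumptions) would be something like:

```shell
#!/bin/sh
# Hypothetical reproduction loop: launch containers back-to-back forever.
# Run four copies of this script concurrently to trigger the hang.
i=0
while true; do
    i=$((i + 1))
    lxc launch ubuntu:18.04 "stress-$$-$i"
done
```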
After only a couple of successful container creations, everything will stop and the system will go completely idle. Any further attempts to interact with ZFS will block forever, and a reboot is required.
The system never recovers; I left it overnight and it stayed stuck.
Include any warning/errors/backtraces from the system logs
perf top output:
Stuck processes after the event:
zpool status -D:
iostat -x for the zpool device: