openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

spl_dynamic_task blocked for more than x seconds (spl_kthread_create in stack trace) on CentOS 7.7 #9675

Open dhagberg opened 4 years ago

dhagberg commented 4 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | CentOS |
| Distribution Version | 7.7.1908 |
| Linux Kernel | 3.10.0-1062.4.3.el7.x86_64 |
| Architecture | x86_64 |
| ZFS Version | 0.8.2-1 (kmod) |
| SPL Version | 0.8.2-1 (kmod) |

Describe the problem you're observing

The system degrades and becomes unresponsive, eventually displaying a hung-task kernel message on the console similar to the ones below.

Describe how to reproduce the problem

Unfortunately, I do not have a reproducible set of conditions other than medium-to-high load on a production Zimbra mail server with the mail store and MySQL on ZFS.

Include any warning/errors/backtraces from the system logs

Note: the system is running with the following sysctl settings, which force an automatic panic and reboot in this condition to avoid manual intervention:

# Auto reboot 5 seconds after panic
kernel.panic = 5

# Panic if a hung task was found
kernel.hung_task_panic = 1

# Set the hung-task timeout to 160 seconds (2 min 40 s)
kernel.hung_task_timeout_secs = 160
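
To confirm that settings like these are actually live after a reboot, they can be compared against `/proc/sys`. The following is a minimal sketch, not something from the original report: the function names (`parse_sysctl`, `proc_path`, `mismatches`, `read_proc`) are hypothetical, and the key-to-path mapping assumes plain keys such as the three above (no dots inside a path component).

```python
CONFIG = """
# Panic if a hung task was found
kernel.panic = 5
kernel.hung_task_panic = 1
kernel.hung_task_timeout_secs = 160
"""


def parse_sysctl(text):
    """Parse sysctl.conf-style 'key = value' lines, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a key such as kernel.panic to /proc/sys/kernel/panic (simple keys only)."""
    return "/proc/sys/" + key.replace(".", "/")


def read_proc(path):
    """Read the live value of one sysctl from /proc/sys."""
    with open(path) as handle:
        return handle.read().strip()


def mismatches(wanted, read_value):
    """Return {key: (wanted, live)} for every setting whose live value differs."""
    diff = {}
    for key, value in wanted.items():
        live = read_value(proc_path(key))
        if live != value:
            diff[key] = (value, live)
    return diff


# Parsing needs no special privileges; reading /proc/sys is left to the caller.
print(parse_sysctl(CONFIG))
```

On the affected host, `mismatches(parse_sysctl(CONFIG), read_proc)` should return an empty dict if all three settings are in effect.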

Most recent panic from 2019-12-04-18:34:14Z:

[    7.272333] type=1305 audit(1574907165.962:3): audit_pid=2019 old=0 auid=4294967295 ses=4294967295 res=1
[    7.549360] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[    7.641766] NET: Registered protocol family 40
[    7.939654] vmxnet3 0000:0b:00.0 eno16780032: intr type 3, mode 0, 5 vectors allocated
[    7.941205] vmxnet3 0000:0b:00.0 eno16780032: NIC Link is Up 10000 Mbps
[577300.702371] INFO: task spl_dynamic_tas:1131 blocked for more than 160 seconds.
[577300.702404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[577300.702429] spl_dynamic_tas D ffff92d2eefad230     0  1131      2 0x00000000
[577300.702456] Call Trace:
[577300.702471]  [<ffffffffbb17fb09>] schedule+0x29/0x70
[577300.702492]  [<ffffffffbb17d491>] schedule_timeout+0x221/0x2d0
[577300.702515]  [<ffffffffbaad7632>] ? check_preempt_curr+0x92/0xa0
[577300.702535]  [<ffffffffbaad7659>] ? ttwu_do_wakeup+0x19/0xe0
[577300.702554]  [<ffffffffbb17febd>] wait_for_completion+0xfd/0x140
[577300.703219]  [<ffffffffbaadb1d0>] ? wake_up_state+0x20/0x20
[577300.703736]  [<ffffffffc05c2540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[577300.704179]  [<ffffffffbaac604a>] kthread_create_on_node+0xaa/0x140
[577300.704626]  [<ffffffffbad8caeb>] ? string.isra.7+0x3b/0xf0
[577300.705061]  [<ffffffffc05c2540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[577300.705513]  [<ffffffffc05c2540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[577300.705954]  [<ffffffffc05c3bfc>] spl_kthread_create+0x9c/0xf0 [spl]
[577300.706405]  [<ffffffffc05c339b>] taskq_thread_create+0x6b/0x110 [spl]
[577300.706846]  [<ffffffffc05c3452>] taskq_thread_spawn_task+0x12/0x40 [spl]
[577300.707292]  [<ffffffffc05c27ec>] taskq_thread+0x2ac/0x4f0 [spl]
[577300.707745]  [<ffffffffbaadb1d0>] ? wake_up_state+0x20/0x20
[577300.708199]  [<ffffffffc05c2540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[577300.708664]  [<ffffffffbaac61f1>] kthread+0xd1/0xe0
[577300.709120]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
[577300.709586]  [<ffffffffbb18cd37>] ret_from_fork_nospec_begin+0x21/0x21
[577300.710064]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
[577300.710555] sending NMI to all CPUs:
[577300.712132] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffbb181beb
[577300.712625] NMI backtrace for cpu 1
[577300.713115] CPU: 1 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[577300.713651] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[577300.714196] task: ffff92d2ffd6a0e0 ti: ffff92d13acfc000 task.ti: ffff92d13acfc000
[577300.714755] RIP: 0010:[<ffffffffbaa6d5ba>]  [<ffffffffbaa6d5ba>] native_write_msr_safe+0xa/0x10
[577300.715326] RSP: 0018:ffff92d13acffdb8  EFLAGS: 00000046
[577300.715911] RAX: 0000000000000400 RBX: 0000000000000001 RCX: 0000000000000830
[577300.716503] RDX: 0000000000000002 RSI: 0000000000000400 RDI: 0000000000000830
[577300.717094] RBP: ffff92d13acffdb8 R08: ffffffffbb7577a0 R09: ffff92d2ec50fac0
[577300.717695] R10: 0000000000000618 R11: ffffb017823279d8 R12: ffffffffbb7577a0
[577300.718294] R13: 0000000000000001 R14: 000000000000e026 R15: 0000000000000002
[577300.718905] FS:  0000000000000000(0000) GS:ffff92d2ffc40000(0000) knlGS:0000000000000000
[577300.719529] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[577300.720147] CR2: 00007ff4fb51c370 CR3: 000000024df5e000 CR4: 00000000001607e0
[577300.720790] Call Trace:
[577300.721422]  [<ffffffffbaa634f2>] __x2apic_send_IPI_mask+0xb2/0xe0
[577300.722062]  [<ffffffffbaa63593>] x2apic_send_IPI_mask+0x13/0x20
[577300.722723]  [<ffffffffbaa5e923>] arch_trigger_all_cpu_backtrace+0x2c3/0x2d0
[577300.723389]  [<ffffffffbab4d990>] watchdog+0x260/0x2c0
[577300.724042]  [<ffffffffbab4d730>] ? reset_hung_task_detector+0x20/0x20
[577300.724707]  [<ffffffffbaac61f1>] kthread+0xd1/0xe0
[577300.725356]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
[577300.725986]  [<ffffffffbb18cd37>] ret_from_fork_nospec_begin+0x21/0x21
[577300.726618]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
[577300.727295] Code: 00 55 89 f9 48 89 e5 0f 32 31 c9 89 c0 48 c1 e2 20 89 0e 48 09 c2 48 89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c0 48 c1 e2 20 48
[577300.728639] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffbb181beb
[577300.729321] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffbb181beb
[577300.730008] NMI backtrace for cpu 4 skipped: idling at pc 0xffffffffbb181beb
[577300.730697] NMI backtrace for cpu 5 skipped: idling at pc 0xffffffffbb181beb
[577300.731369] Kernel panic - not syncing: hung_task: blocked tasks
[577300.732034] CPU: 1 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[577300.732727] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[577300.733433] Call Trace:
[577300.734135]  [<ffffffffbb179ba4>] dump_stack+0x19/0x1b
[577300.734852]  [<ffffffffbb173947>] panic+0xe8/0x21f
[577300.735561]  [<ffffffffbab4d99e>] watchdog+0x26e/0x2c0
[577300.736270]  [<ffffffffbab4d730>] ? reset_hung_task_detector+0x20/0x20
[577300.736990]  [<ffffffffbaac61f1>] kthread+0xd1/0xe0
[577300.737703]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
[577300.738418]  [<ffffffffbb18cd37>] ret_from_fork_nospec_begin+0x21/0x21
[577300.739135]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
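
For comparing several panics like the one above, it can help to pull out just the hung task and the frames it was definitely executing (lines without the `?` marker). This is a rough sketch under the assumption that the console log follows the layout shown in this report; `hung_tasks` and the regular expressions are hypothetical helpers, and `SAMPLE` is an abridged copy of lines from the trace above.

```python
import re

# "[ uptime] INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG_TASK = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\] INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)
# "[ uptime]  [<address>] [? ]symbol+0xoff/0xlen [module]"
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(?P<maybe>\?\s+)?(?P<sym>[^\s+]+)\+0x"
)


def hung_tasks(log_text):
    """Return one dict per hung-task report, with its non-'?' stack frames."""
    reports = []
    current = None
    in_trace = False
    for line in log_text.splitlines():
        head = HUNG_TASK.match(line)
        if head:
            current = {
                "uptime": float(head.group("uptime")),
                "comm": head.group("comm"),
                "pid": int(head.group("pid")),
                "timeout_secs": int(head.group("secs")),
                "frames": [],
            }
            reports.append(current)
            in_trace = False
            continue
        if current is None:
            continue
        if not in_trace:
            # Skip the hint and task-state lines until the trace starts.
            in_trace = line.rstrip().endswith("Call Trace:")
            continue
        frame = FRAME.match(line)
        if frame is None:
            # First non-frame line ends this task's trace (e.g. the NMI dump).
            current = None
            in_trace = False
        elif frame.group("maybe") is None:
            current["frames"].append(frame.group("sym"))
    return reports


SAMPLE = """
[577300.702371] INFO: task spl_dynamic_tas:1131 blocked for more than 160 seconds.
[577300.702404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[577300.702429] spl_dynamic_tas D ffff92d2eefad230     0  1131      2 0x00000000
[577300.702456] Call Trace:
[577300.702471]  [<ffffffffbb17fb09>] schedule+0x29/0x70
[577300.702492]  [<ffffffffbb17d491>] schedule_timeout+0x221/0x2d0
[577300.702515]  [<ffffffffbaad7632>] ? check_preempt_curr+0x92/0xa0
[577300.702554]  [<ffffffffbb17febd>] wait_for_completion+0xfd/0x140
[577300.704179]  [<ffffffffbaac604a>] kthread_create_on_node+0xaa/0x140
[577300.705954]  [<ffffffffc05c3bfc>] spl_kthread_create+0x9c/0xf0 [spl]
[577300.710555] sending NMI to all CPUs:
"""

for report in hung_tasks(SAMPLE):
    print(report["comm"], report["pid"], " <- ".join(report["frames"]))
```

The later `Call Trace:` sections in each panic belong to the NMI backtraces and the panic itself, so the sketch deliberately stops collecting at the first non-frame line.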

Prior panic from 2019-11-28-02:12:28Z:

[    7.214377] type=1305 audit(1574661699.898:3): audit_pid=2032 old=0 auid=4294967295 ses=4294967295 res=1
[    7.674542] NET: Registered protocol family 40
[    7.676506] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[    8.045584] vmxnet3 0000:0b:00.0 eno16780032: intr type 3, mode 0, 5 vectors allocated
[    8.046805] vmxnet3 0000:0b:00.0 eno16780032: NIC Link is Up 10000 Mbps
[245456.203425] INFO: task spl_dynamic_tas:1073 blocked for more than 160 seconds.
[245456.203457] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[245456.203483] spl_dynamic_tas D ffff9ccf2fd29070     0  1073      2 0x00000000
[245456.203509] Call Trace:
[245456.203526]  [<ffffffffa177fb09>] schedule+0x29/0x70
[245456.203547]  [<ffffffffa177d491>] schedule_timeout+0x221/0x2d0
[245456.203569]  [<ffffffffa10d7632>] ? check_preempt_curr+0x92/0xa0
[245456.203590]  [<ffffffffa10d7659>] ? ttwu_do_wakeup+0x19/0xe0
[245456.203609]  [<ffffffffa177febd>] wait_for_completion+0xfd/0x140
[245456.203630]  [<ffffffffa10db1d0>] ? wake_up_state+0x20/0x20
[245456.203657]  [<ffffffffc047e540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[245456.203682]  [<ffffffffa10c604a>] kthread_create_on_node+0xaa/0x140
[245456.203708]  [<ffffffffa138caeb>] ? string.isra.7+0x3b/0xf0
[245456.203732]  [<ffffffffc047e540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[245456.203758]  [<ffffffffc047e540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[245456.203785]  [<ffffffffc047fbfc>] spl_kthread_create+0x9c/0xf0 [spl]
[245456.203810]  [<ffffffffc047f39b>] taskq_thread_create+0x6b/0x110 [spl]
[245456.203834]  [<ffffffffc047f452>] taskq_thread_spawn_task+0x12/0x40 [spl]
[245456.203858]  [<ffffffffc047e7ec>] taskq_thread+0x2ac/0x4f0 [spl]
[245456.203879]  [<ffffffffa10db1d0>] ? wake_up_state+0x20/0x20
[245456.203900]  [<ffffffffc047e540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[245456.203922]  [<ffffffffa10c61f1>] kthread+0xd1/0xe0
[245456.203940]  [<ffffffffa10c6120>] ? insert_kthread_work+0x40/0x40
[245456.203961]  [<ffffffffa178cd37>] ret_from_fork_nospec_begin+0x21/0x21
[245456.203983]  [<ffffffffa10c6120>] ? insert_kthread_work+0x40/0x40
[245456.204005] sending NMI to all CPUs:
[245456.205129] NMI backtrace for cpu 0
[245456.205168] CPU: 0 PID: 7598 Comm: du Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[245456.205201] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[245456.205812] task: ffff9cceea2b5230 ti: ffff9ccd27488000 task.ti: ffff9ccd27488000
[245456.206368] RIP: 0010:[<ffffffffa177fd89>]  [<ffffffffa177fd89>] _cond_resched+0x19/0x50
[245456.206931] RSP: 0000:ffff9ccd2748ae60  EFLAGS: 00000202
[245456.207487] RAX: 00000000ffffffff RBX: ffff9ccd2748b248 RCX: ffff9ccd2748bfd8
[245456.208053] RDX: 0000000000000010 RSI: ffff9ccd2748af08 RDI: fffff32344320aa0
[245456.208627] RBP: ffff9ccd2748af90 R08: fffff32344320a60 R09: ffff9ccd2748b000
[245456.209200] R10: ffff9ccf3ffda000 R11: ffff9cceea2b5230 R12: fffff32344320aa0
[245456.209778] R13: ffff9ccd2748b018 R14: fffff32344320a80 R15: ffff9ccf3ffda000
[245456.210360] FS:  00007fe09f057740(0000) GS:ffff9ccf3fc00000(0000) knlGS:0000000000000000
[245456.210948] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[245456.211549] CR2: 00007f811efc437c CR3: 00000002150de000 CR4: 00000000001607f0
[245456.212158] Call Trace:
[245456.212771]  [<ffffffffa11d18c6>] ? shrink_page_list+0x146/0xc30
[245456.213390]  [<ffffffffa11d11b3>] ? isolate_lru_pages.isra.47+0xd3/0x190
[245456.214014]  [<ffffffffa11d29d6>] shrink_inactive_list+0x1c6/0x5d0
[245456.214650]  [<ffffffffa11d34d5>] shrink_lruvec+0x385/0x740
[245456.215295]  [<ffffffffc0310152>] ? xfs_perag_get_tag+0x42/0xe0 [xfs]
[245456.215930]  [<ffffffffa11d3906>] shrink_zone+0x76/0x1a0
[245456.216576]  [<ffffffffa11d3df0>] do_try_to_free_pages+0xf0/0x520
[245456.217206]  [<ffffffffa11d431c>] try_to_free_pages+0xfc/0x180
[245456.217824]  [<ffffffffa11c7f41>] __alloc_pages_nodemask+0x831/0xbe0
[245456.218436]  [<ffffffffa1216298>] alloc_pages_current+0x98/0x110
[245456.219058]  [<ffffffffa1223f6d>] new_slab+0x44d/0x4e0
[245456.219662]  [<ffffffffa12243ac>] ___slab_alloc+0x3ac/0x4f0
[245456.220260]  [<ffffffffc047a7a9>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[245456.220877]  [<ffffffffc047a7a9>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[245456.221484]  [<ffffffffa1776570>] __slab_alloc+0x40/0x5c
[245456.222079]  [<ffffffffa12247cb>] kmem_cache_alloc+0x19b/0x1f0
[245456.222684]  [<ffffffffc047a7a9>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[245456.223297]  [<ffffffffc047a7a9>] spl_kmem_cache_alloc+0x99/0x150 [spl]
[245456.223898]  [<ffffffffc06e5bf7>] zio_buf_alloc+0x57/0x60 [zfs]
[245456.224508]  [<ffffffffc05e7e5d>] arc_get_data_buf.isra.34+0x4d/0x60 [zfs]
[245456.225112]  [<ffffffffc05e9578>] arc_buf_alloc_impl.isra.38+0x218/0x340 [zfs]
[245456.225737]  [<ffffffffc05eb4db>] arc_read+0xddb/0x10b0 [zfs]
[245456.226358]  [<ffffffffa1224665>] ? kmem_cache_alloc+0x35/0x1f0
[245456.226979]  [<ffffffffc047a7a9>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[245456.227607]  [<ffffffffc05f6c30>] ? dbuf_rele_and_unlock+0x5c0/0x5c0 [zfs]
[245456.228230]  [<ffffffffc05f49fc>] dbuf_read_impl+0x20c/0x670 [zfs]
[245456.228863]  [<ffffffffc05f5bca>] dbuf_read+0xca/0x5e0 [zfs]
[245456.229490]  [<ffffffffc061b3ed>] dnode_hold_impl+0x12d/0xcd0 [zfs]
[245456.230109]  [<ffffffffc03e677a>] ? avl_add+0x4a/0xa0 [zavl]
[245456.230738]  [<ffffffffa177dc02>] ? mutex_lock+0x12/0x2f
[245456.231372]  [<ffffffffc06dad50>] ? zfs_znode_hold_enter+0x130/0x190 [zfs]
[245456.231994]  [<ffffffffc061bfab>] dnode_hold+0x1b/0x20 [zfs]
[245456.232621]  [<ffffffffc0600475>] dmu_bonus_hold+0x35/0x80 [zfs]
[245456.233241]  [<ffffffffc06571fe>] sa_buf_hold+0xe/0x10 [zfs]
[245456.233857]  [<ffffffffc06dedd3>] zfs_zget+0x123/0x250 [zfs]
[245456.234453]  [<ffffffffc06b406a>] zfs_dirent_lock+0x51a/0x660 [zfs]
[245456.235026]  [<ffffffffc06b4247>] zfs_dirlook+0x97/0x2d0 [zfs]
[245456.235597]  [<ffffffffc0655b79>] ? rrw_enter_read_impl+0xb9/0x170 [zfs]
[245456.236155]  [<ffffffffc06d00e2>] zfs_lookup+0x362/0x3d0 [zfs]
[245456.236719]  [<ffffffffc06f81d9>] zpl_lookup+0xd9/0x210 [zfs]
[245456.237285]  [<ffffffffa1265578>] ? d_alloc+0x58/0x70
[245456.237830]  [<ffffffffa1254e43>] lookup_real+0x23/0x60
[245456.238385]  [<ffffffffa1255862>] __lookup_hash+0x42/0x60
[245456.238925]  [<ffffffffa17767f5>] lookup_slow+0x42/0xa7
[245456.239466]  [<ffffffffa125ad78>] path_lookupat+0x838/0x8b0
[245456.239988]  [<ffffffffa1224665>] ? kmem_cache_alloc+0x35/0x1f0
[245456.240696]  [<ffffffffa125bbcf>] ? getname_flags+0x4f/0x1a0
[245456.241234]  [<ffffffffa125ae1b>] filename_lookup+0x2b/0xc0
[245456.241752]  [<ffffffffa125cd67>] user_path_at_empty+0x67/0xc0
[245456.242255]  [<ffffffffa124fe05>] ? cp_new_stat+0x165/0x1a0
[245456.242767]  [<ffffffffa125cdd1>] user_path_at+0x11/0x20
[245456.243267]  [<ffffffffa124fab3>] vfs_fstatat+0x63/0xc0
[245456.243772]  [<ffffffffa124ff24>] SYSC_newfstatat+0x24/0x60
[245456.244264]  [<ffffffffa113db66>] ? __audit_syscall_exit+0x1e6/0x280
[245456.244772]  [<ffffffffa125034e>] SyS_newfstatat+0xe/0x10
[245456.245267]  [<ffffffffa178cede>] system_call_fastpath+0x25/0x2a
[245456.245775] Code: 0e 69 95 ff e9 ea fe ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 0c 25 78 0e 01 00 48 8b 91 38 c0 ff ff 48 c1 ea 03 <83> e2 01 85 d2 89 d0 75 02 f3 c3 f6 81 47 c0 ff ff 10 74 04 31
[245456.246874] NMI backtrace for cpu 1
[245456.247426] CPU: 1 PID: 16193 Comm: crond Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[245456.248000] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[245456.248609] task: ffff9ccc35d841c0 ti: ffff9ccc03504000 task.ti: ffff9ccc03504000
[245456.249212] RIP: 0010:[<ffffffffa11c44c0>]  [<ffffffffa11c44c0>] get_pageblock_flags_group+0x60/0x80
[245456.249843] RSP: 0000:ffff9ccc03507a88  EFLAGS: 00000246
[245456.250471] RAX: 0000000000000000 RBX: 0000000000001ea8 RCX: 0000000000000001
[245456.251096] RDX: 0000000000000003 RSI: 0000000000000003 RDI: 000000000000003c
[245456.251727] RBP: ffff9ccc03507a88 R08: 0000000000000001 R09: 000000000000003f
[245456.252355] R10: ffff9ccf3fbd7400 R11: ffffffffffffffdc R12: 00000000000000a9
[245456.252968] R13: ffff9ccf3ffd9800 R14: fffff32340078000 R15: 0000000000002000
[245456.253589] FS:  00007feb36382840(0000) GS:ffff9ccf3fc40000(0000) knlGS:0000000000000000
[245456.254209] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[245456.254849] CR2: 00007fc8fbcdfb60 CR3: 0000000003286000 CR4: 00000000001607e0
[245456.255488] Call Trace:
[245456.256108]  [<ffffffffa11e4e70>] isolate_migratepages_range+0x410/0x7c0
[245456.256756]  [<ffffffffa11e5586>] compact_zone+0x2b6/0x440
[245456.257407]  [<ffffffffa11e57ac>] compact_zone_order+0x9c/0xf0
[245456.258037]  [<ffffffffa11e5b71>] try_to_compact_pages+0x121/0x1a0
[245456.258679]  [<ffffffffa17750c5>] __alloc_pages_direct_compact+0xac/0x193
[245456.259323]  [<ffffffffa11c7ecc>] __alloc_pages_nodemask+0x7bc/0xbe0
[245456.259950]  [<ffffffffa1098f6d>] copy_process+0x1dd/0x1a50
[245456.260589]  [<ffffffffa109a991>] do_fork+0x91/0x330
[245456.261212]  [<ffffffffa109acb6>] SyS_clone+0x16/0x20
[245456.261848]  [<ffffffffa178d2b4>] stub_clone+0x44/0x70
[245456.262483]  [<ffffffffa178cede>] ? system_call_fastpath+0x25/0x2a
[245456.263110] Code: 05 48 01 c8 48 c1 ef 07 4c 8b 50 08 81 e7 fc 00 00 00 39 d6 7f 31 b9 01 00 00 00 31 c0 66 0f 1f 44 00 00 44 8d 0c 3e 45 0f a3 0a <45> 19 c9 49 89 c0 49 09 c8 45 85 c9 49 0f 45 c0 83 c6 01 48 01
[245456.264457] NMI backtrace for cpu 2
[245456.265127] CPU: 2 PID: 16184 Comm: crond Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[245456.265844] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[245456.266577] task: ffff9ccd12f90000 ti: ffff9ccdd8350000 task.ti: ffff9ccdd8350000
[245456.267330] RIP: 0010:[<ffffffffa11c44c3>]  [<ffffffffa11c44c3>] get_pageblock_flags_group+0x63/0x80
[245456.268080] RSP: 0018:ffff9ccdd8353a88  EFLAGS: 00000246
[245456.268842] RAX: 0000000000000000 RBX: 0000000000001eab RCX: 0000000000000001
[245456.269613] RDX: 0000000000000003 RSI: 0000000000000003 RDI: 000000000000003c
[245456.270382] RBP: ffff9ccdd8353a88 R08: 0000000000000001 R09: 0000000000000000
[245456.271139] R10: ffff9ccf3fbd7400 R11: ffffffffffffffdc R12: 00000000000000ac
[245456.271909] R13: ffff9ccf3ffd9800 R14: fffff32340078000 R15: 0000000000002000
[245456.272681] FS:  00007feb36382840(0000) GS:ffff9ccf3fc80000(0000) knlGS:0000000000000000
[245456.273462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[245456.274232] CR2: 00007fef30f12820 CR3: 00000001d5a4c000 CR4: 00000000001607e0
[245456.275026] Call Trace:
[245456.275813]  [<ffffffffa11e4e70>] isolate_migratepages_range+0x410/0x7c0
[245456.276615]  [<ffffffffa11e5586>] compact_zone+0x2b6/0x440
[245456.277420]  [<ffffffffa11e57ac>] compact_zone_order+0x9c/0xf0
[245456.278183]  [<ffffffffa11e5b71>] try_to_compact_pages+0x121/0x1a0
[245456.278934]  [<ffffffffa17750c5>] __alloc_pages_direct_compact+0xac/0x193
[245456.279685]  [<ffffffffa11c7ecc>] __alloc_pages_nodemask+0x7bc/0xbe0
[245456.280419]  [<ffffffffa1098f6d>] copy_process+0x1dd/0x1a50
[245456.281124]  [<ffffffffa109a991>] do_fork+0x91/0x330
[245456.281821]  [<ffffffffa109acb6>] SyS_clone+0x16/0x20
[245456.282503]  [<ffffffffa178d2b4>] stub_clone+0x44/0x70
[245456.283162]  [<ffffffffa178cede>] ? system_call_fastpath+0x25/0x2a
[245456.283830] Code: c8 48 c1 ef 07 4c 8b 50 08 81 e7 fc 00 00 00 39 d6 7f 31 b9 01 00 00 00 31 c0 66 0f 1f 44 00 00 44 8d 0c 3e 45 0f a3 0a 45 19 c9 <49> 89 c0 49 09 c8 45 85 c9 49 0f 45 c0 83 c6 01 48 01 c9 39 f2
[245456.285255] NMI backtrace for cpu 3
[245456.285972] CPU: 3 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[245456.286728] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[245456.287496] task: ffff9ccf3fd6a0e0 ti: ffff9ccf3fbe8000 task.ti: ffff9ccf3fbe8000
[245456.288247] RIP: 0010:[<ffffffffa106d5ba>]  [<ffffffffa106d5ba>] native_write_msr_safe+0xa/0x10
[245456.289022] RSP: 0018:ffff9ccf3fbebdb8  EFLAGS: 00000046
[245456.289807] RAX: 0000000000000400 RBX: 0000000000000003 RCX: 0000000000000830
[245456.290579] RDX: 0000000000000006 RSI: 0000000000000400 RDI: 0000000000000830
[245456.291345] RBP: ffff9ccf3fbebdb8 R08: ffffffffa1d577a0 R09: ffff9ccc3541f840
[245456.292107] R10: 0000000000000618 R11: 0000000000000000 R12: ffffffffa1d577a0
[245456.292885] R13: 0000000000000003 R14: 000000000000e026 R15: 0000000000000002
[245456.293658] FS:  0000000000000000(0000) GS:ffff9ccf3fcc0000(0000) knlGS:0000000000000000
[245456.294441] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[245456.295211] CR2: 00007f1cbbab0620 CR3: 000000031551a000 CR4: 00000000001607e0
[245456.296002] Call Trace:
[245456.296790]  [<ffffffffa10634f2>] __x2apic_send_IPI_mask+0xb2/0xe0
[245456.297600]  [<ffffffffa1063593>] x2apic_send_IPI_mask+0x13/0x20
[245456.298404]  [<ffffffffa105e923>] arch_trigger_all_cpu_backtrace+0x2c3/0x2d0
[245456.299177]  [<ffffffffa114d990>] watchdog+0x260/0x2c0
[245456.299937]  [<ffffffffa114d730>] ? reset_hung_task_detector+0x20/0x20
[245456.300696]  [<ffffffffa10c61f1>] kthread+0xd1/0xe0
[245456.301440]  [<ffffffffa10c6120>] ? insert_kthread_work+0x40/0x40
[245456.302157]  [<ffffffffa178cd37>] ret_from_fork_nospec_begin+0x21/0x21
[245456.302873]  [<ffffffffa10c6120>] ? insert_kthread_work+0x40/0x40
[245456.303575] Code: 00 55 89 f9 48 89 e5 0f 32 31 c9 89 c0 48 c1 e2 20 89 0e 48 09 c2 48 89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c0 48 c1 e2 20 48
[245456.305044] NMI backtrace for cpu 4
[245456.305781] CPU: 4 PID: 16188 Comm: crond Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[245456.306553] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[245456.307346] task: ffff9ccc5f10d230 ti: ffff9cce137bc000 task.ti: ffff9cce137bc000
[245456.308134] RIP: 0010:[<ffffffffa11e4d4f>]  [<ffffffffa11e4d4f>] isolate_migratepages_range+0x2ef/0x7c0
[245456.308941] RSP: 0000:ffff9cce137bfa98  EFLAGS: 00000286
[245456.309739] RAX: ffff9cce137bfbb0 RBX: 0000000000001ea9 RCX: 0000000000001ea9
[245456.310543] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 000000000000003c
[245456.311338] RBP: ffff9cce137bfb48 R08: 0000000000000001 R09: fffff3234007aa40
[245456.312122] R10: ffff9ccf3fbd7400 R11: ffffffffffffffdc R12: 00000000000000aa
[245456.312916] R13: ffff9ccf3ffd9800 R14: fffff32340078000 R15: 0000000000002000
[245456.313712] FS:  00007feb36382840(0000) GS:ffff9ccf3fd00000(0000) knlGS:0000000000000000
[245456.314521] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[245456.315330] CR2: 00007f1cf46d3790 CR3: 0000000157762000 CR4: 00000000001607e0
[245456.316138] Call Trace:
[245456.316956]  [<ffffffffa11e5586>] compact_zone+0x2b6/0x440
[245456.317778]  [<ffffffffa11e57ac>] compact_zone_order+0x9c/0xf0
[245456.318600]  [<ffffffffa11e5b71>] try_to_compact_pages+0x121/0x1a0
[245456.319427]  [<ffffffffa17750c5>] __alloc_pages_direct_compact+0xac/0x193
[245456.320246]  [<ffffffffa11c7ecc>] __alloc_pages_nodemask+0x7bc/0xbe0
[245456.321052]  [<ffffffffa1098f6d>] copy_process+0x1dd/0x1a50
[245456.321831]  [<ffffffffa109a991>] do_fork+0x91/0x330
[245456.322604]  [<ffffffffa109acb6>] SyS_clone+0x16/0x20
[245456.323361]  [<ffffffffa178d2b4>] stub_clone+0x44/0x70
[245456.324085]  [<ffffffffa178cede>] ? system_call_fastpath+0x25/0x2a
[245456.324809] Code: 84 af 00 00 00 48 c1 e8 36 48 8b 04 c5 60 7c d5 a1 48 8b 80 48 65 02 00 49 39 c5 75 aa 48 8b 45 b0 4d 85 f6 48 89 d9 4d 0f 44 f1 <48> c1 e9 09 80 78 41 00 0f 84 f3 00 00 00 41 8b 41 18 83 f8 80
[245456.326301] NMI backtrace for cpu 5
[245456.327040] CPU: 5 PID: 15401 Comm: zmstat-mtaqueue Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[245456.327828] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[245456.328626] task: ffff9cce146820e0 ti: ffff9cce13754000 task.ti: ffff9cce13754000
[245456.329428] RIP: 0010:[<ffffffffa11e4e64>]  [<ffffffffa11e4e64>] isolate_migratepages_range+0x404/0x7c0
[245456.330238] RSP: 0000:ffff9cce13757a98  EFLAGS: 00000246
[245456.331057] RAX: ffff9cce13757bb0 RBX: 0000000000001ea4 RCX: 000000000000000f
[245456.331876] RDX: 0000000000000003 RSI: 0000000000000003 RDI: fffff3234007a900
[245456.332693] RBP: ffff9cce13757b48 R08: 0000000000000001 R09: fffff3234007a900
[245456.333503] R10: ffff9ccf3fbd7400 R11: ffffffffffffffdc R12: 00000000000000a5
[245456.334315] R13: ffff9ccf3ffd9800 R14: fffff32340078000 R15: 0000000000002000
[245456.335114] FS:  00007fe395825740(0000) GS:ffff9ccf3fd40000(0000) knlGS:0000000000000000
[245456.335927] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[245456.336737] CR2: 00007f1cf4db2b60 CR3: 000000021369e000 CR4: 00000000001607e0
[245456.337562] Call Trace:
[245456.338374]  [<ffffffffa11e5586>] compact_zone+0x2b6/0x440
[245456.339186]  [<ffffffffa11e57ac>] compact_zone_order+0x9c/0xf0
[245456.340005]  [<ffffffffa11e5b71>] try_to_compact_pages+0x121/0x1a0
[245456.340824]  [<ffffffffa17750c5>] __alloc_pages_direct_compact+0xac/0x193
[245456.341647]  [<ffffffffa11c7ecc>] __alloc_pages_nodemask+0x7bc/0xbe0
[245456.342420]  [<ffffffffa1098f6d>] copy_process+0x1dd/0x1a50
[245456.343177]  [<ffffffffa109a991>] do_fork+0x91/0x330
[245456.343928]  [<ffffffffa109acb6>] SyS_clone+0x16/0x20
[245456.344658]  [<ffffffffa178d2b4>] stub_clone+0x44/0x70
[245456.345371]  [<ffffffffa178cede>] ? system_call_fastpath+0x25/0x2a
[245456.346063] Code: 0f 84 35 ff ff ff 48 8b 45 b0 c6 40 43 01 eb 3e 0f 1f 80 00 00 00 00 4c 89 cf ba 03 00 00 00 be 03 00 00 00 48 89 8d 68 ff ff ff <4c> 89 8d 70 ff ff ff e8 f0 f5 fd ff 48 85 c0 4c 8b 8d 70 ff ff
[245456.347527] Kernel panic - not syncing: hung_task: blocked tasks
[245456.348265] CPU: 3 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[245456.349023] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[245456.349785] Call Trace:
[245456.350557]  [<ffffffffa1779ba4>] dump_stack+0x19/0x1b
[245456.351329]  [<ffffffffa1773947>] panic+0xe8/0x21f
[245456.352086]  [<ffffffffa114d99e>] watchdog+0x26e/0x2c0
[245456.352837]  [<ffffffffa114d730>] ? reset_hung_task_detector+0x20/0x20
[245456.353585]  [<ffffffffa10c61f1>] kthread+0xd1/0xe0
[245456.354332]  [<ffffffffa10c6120>] ? insert_kthread_work+0x40/0x40
[245456.355095]  [<ffffffffa178cd37>] ret_from_fork_nospec_begin+0x21/0x21
[245456.355847]  [<ffffffffa10c6120>] ? insert_kthread_work+0x40/0x40

Prior panic from 2019-11-24-06:37:14:

[    8.203926] type=1305 audit(1574097558.840:3): audit_pid=2197 old=0 auid=4294967295 ses=4294967295 res=1
[    8.456030] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[    8.602974] NET: Registered protocol family 40
[    8.817318] vmxnet3 0000:0b:00.0 eno16780032: intr type 3, mode 0, 5 vectors allocated
[    8.818421] vmxnet3 0000:0b:00.0 eno16780032: NIC Link is Up 10000 Mbps
[479887.347042] INFO: task spl_dynamic_tas:1156 blocked for more than 160 seconds.
[479887.347090] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[479887.347130] spl_dynamic_tas D ffffa045adbf1070     0  1156      2 0x00000000
[479887.347157] Call Trace:
[479887.347184]  [<ffffffff9d97f229>] schedule+0x29/0x70
[479887.347212]  [<ffffffff9d97cbb1>] schedule_timeout+0x221/0x2d0
[479887.347241]  [<ffffffff9d2e0026>] ? select_task_rq_fair+0x5a6/0x760
[479887.347262]  [<ffffffff9d97f5dd>] wait_for_completion+0xfd/0x140
[479887.347284]  [<ffffffff9d2da0b0>] ? wake_up_state+0x20/0x20
[479887.347325]  [<ffffffffc06f2540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[479887.347351]  [<ffffffff9d2c4f2a>] kthread_create_on_node+0xaa/0x140
[479887.347380]  [<ffffffff9d58b23b>] ? string.isra.7+0x3b/0xf0
[479887.347402]  [<ffffffffc06f2540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[479887.347425]  [<ffffffffc06f2540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[479887.347452]  [<ffffffffc06f3bfc>] spl_kthread_create+0x9c/0xf0 [spl]
[479887.347475]  [<ffffffffc06f339b>] taskq_thread_create+0x6b/0x110 [spl]
[479887.347497]  [<ffffffffc06f3452>] taskq_thread_spawn_task+0x12/0x40 [spl]
[479887.347520]  [<ffffffffc06f27ec>] taskq_thread+0x2ac/0x4f0 [spl]
[479887.347541]  [<ffffffff9d2da0b0>] ? wake_up_state+0x20/0x20
[479887.347571]  [<ffffffffc06f2540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[479887.347593]  [<ffffffff9d2c50d1>] kthread+0xd1/0xe0
[479887.347610]  [<ffffffff9d2c5000>] ? insert_kthread_work+0x40/0x40
[479887.347633]  [<ffffffff9d98cd37>] ret_from_fork_nospec_begin+0x21/0x21
[479887.347655]  [<ffffffff9d2c5000>] ? insert_kthread_work+0x40/0x40
[479887.347675] sending NMI to all CPUs:
[479887.348824] NMI backtrace for cpu 0
[479887.348838] CPU: 0 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.1.2.el7.x86_64 #1
[479887.348876] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[479887.348911] task: ffffa045bfd6a0e0 ti: ffffa045bfbe8000 task.ti: ffffa045bfbe8000
[479887.348934] RIP: 0010:[<ffffffff9d26c4da>]  [<ffffffff9d26c4da>] native_write_msr_safe+0xa/0x10
[479887.348959] RSP: 0018:ffffa045bfbebdb8  EFLAGS: 00000046
[479887.348976] RAX: 0000000000000400 RBX: 0000000000000000 RCX: 0000000000000830
[479887.348997] RDX: 0000000000000000 RSI: 0000000000000400 RDI: 0000000000000830
[479887.349018] RBP: ffffa045bfbebdb8 R08: ffffffff9df577e0 R09: ffffa042b54a6440
[479887.349039] R10: 0000000000000616 R11: 0000000000000000 R12: ffffffff9df577e0
[479887.349061] R13: 0000000000000000 R14: 000000000000e026 R15: 0000000000000002
[479887.349082] FS:  0000000000000000(0000) GS:ffffa045bfc00000(0000) knlGS:0000000000000000
[479887.349106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[479887.349123] CR2: 00007fa70ad25618 CR3: 00000000aff54000 CR4: 00000000001607f0
[479887.349161] Call Trace:
[479887.349171]  [<ffffffff9d2630d2>] __x2apic_send_IPI_mask+0xb2/0xe0
[479887.349190]  [<ffffffff9d263173>] x2apic_send_IPI_mask+0x13/0x20
[479887.349209]  [<ffffffff9d25e503>] arch_trigger_all_cpu_backtrace+0x2c3/0x2d0
[479887.349231]  [<ffffffff9d34c880>] watchdog+0x260/0x2c0
[479887.349867]  [<ffffffff9d34c620>] ? reset_hung_task_detector+0x20/0x20
[479887.350527]  [<ffffffff9d2c50d1>] kthread+0xd1/0xe0
[479887.351133]  [<ffffffff9d2c5000>] ? insert_kthread_work+0x40/0x40
[479887.351748]  [<ffffffff9d98cd37>] ret_from_fork_nospec_begin+0x21/0x21
[479887.352347]  [<ffffffff9d2c5000>] ? insert_kthread_work+0x40/0x40
[479887.352938] Code: 00 55 89 f9 48 89 e5 0f 32 31 c9 89 c0 48 c1 e2 20 89 0e 48 09 c2 48 89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c0 48 c1 e2 20 48
[479887.354167] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffff9d98130b
[479887.354789] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffff9d98130b
[479887.355417] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffff9d98130b
[479887.356037] NMI backtrace for cpu 4 skipped: idling at pc 0xffffffff9d98130b
[479887.356638] NMI backtrace for cpu 5 skipped: idling at pc 0xffffffff9d98130b
[479887.357252] Kernel panic - not syncing: hung_task: blocked tasks
[479887.357854] CPU: 0 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.1.2.el7.x86_64 #1
[479887.358482] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[479887.359119] Call Trace:
[479887.359756]  [<ffffffff9d9792c2>] dump_stack+0x19/0x1b
[479887.360452]  [<ffffffff9d972941>] panic+0xe8/0x21f
[479887.361097]  [<ffffffff9d34c88e>] watchdog+0x26e/0x2c0
[479887.361739]  [<ffffffff9d34c620>] ? reset_hung_task_detector+0x20/0x20
[479887.362387]  [<ffffffff9d2c50d1>] kthread+0xd1/0xe0
[479887.363046]  [<ffffffff9d2c5000>] ? insert_kthread_work+0x40/0x40
[479887.363694]  [<ffffffff9d98cd37>] ret_from_fork_nospec_begin+0x21/0x21
[479887.364345]  [<ffffffff9d2c5000>] ? insert_kthread_work+0x40/0x40

dhagberg commented 4 years ago

Could this be some kind of resource starvation issue where the kernel is unable to create a new kthread?

Should I be looking at the zfs ARC size vs available kernel memory?

Or does this look like a legit bug?
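
On the ARC-size question, one way to gather the numbers is to read the ARC kstats next to `/proc/meminfo`. This is a minimal sketch, assuming the usual ZFS on Linux kstat layout (two header lines, then `name type data` rows) at `/proc/spl/kstat/zfs/arcstats`; the helper names are hypothetical, the sample values are made up for illustration and not taken from the affected host, and the output only reports sizes without saying what ratio would be a problem.

```python
ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"
MEMINFO_PATH = "/proc/meminfo"


def parse_arcstats(text):
    """Parse kstat text: skip the two header lines, keep 'name type data' rows."""
    stats = {}
    for line in text.splitlines()[2:]:
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def parse_meminfo(text):
    """Return /proc/meminfo values in bytes (rows given in kB are scaled by 1024)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            scale = 1024 if len(fields) > 1 and fields[1] == "kB" else 1
            info[name.strip()] = int(fields[0]) * scale
    return info


def arc_report(arcstats_text, meminfo_text):
    """Summarise current ARC size and limit against system memory, in bytes."""
    arc = parse_arcstats(arcstats_text)
    mem = parse_meminfo(meminfo_text)
    return {
        "arc_size": arc["size"],
        "arc_c_max": arc["c_max"],
        "mem_total": mem["MemTotal"],
        "mem_available": mem.get("MemAvailable"),
        "arc_share_of_total": arc["size"] / mem["MemTotal"],
    }


# Hypothetical sample data, for illustration only.
ARC_SAMPLE = (
    "13 1 0x01 98 26656 1000 2000\n"
    "name                            type data\n"
    "hits                            4    100\n"
    "size                            4    4294967296\n"
    "c_max                           4    8589934592\n"
)
MEM_SAMPLE = (
    "MemTotal:       16777216 kB\n"
    "MemFree:         1048576 kB\n"
    "MemAvailable:    2097152 kB\n"
)

print(arc_report(ARC_SAMPLE, MEM_SAMPLE))
```

On a live system, the two sample strings would be replaced by the contents of `ARCSTATS_PATH` and `MEMINFO_PATH`, ideally sampled periodically in the run-up to a hang.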

PrivatePuffin commented 4 years ago

It looks like an issue we also hit while working on ZSTD. It seems RHEL (and RHEL-based distros) have serious issues with kmem allocations.

dhagberg commented 4 years ago

Thanks @Ornias1993 -- did you find a workaround? Is it more stable under heavy I/O on the latest Ubuntu LTS release?

PrivatePuffin commented 4 years ago

For zstd we simply opted to use a totally different system for memory allocation... which solved the problem there.

To be honest I haven't seen this error on normal use/testing, even on RHEL based systems...

dhagberg commented 4 years ago

Latest stack trace at 2019-12-09 19:09:22Z also has spl_kthread_create:

[    8.709736] type=1305 audit(1575484473.397:3): audit_pid=2357 old=0 auid=4294967295 ses=4294967295 res=1
[    9.192114] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[    9.223197] NET: Registered protocol family 40
[    9.600365] vmxnet3 0000:0b:00.0 eno16780032: intr type 3, mode 0, 5 vectors allocated
[    9.601841] vmxnet3 0000:0b:00.0 eno16780032: NIC Link is Up 10000 Mbps
[434100.315676] INFO: task spl_dynamic_tas:1171 blocked for more than 160 seconds.
[434100.315706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[434100.315731] spl_dynamic_tas D ffff8f1435cd3150     0  1171      2 0x00000000
[434100.315757] Call Trace:
[434100.315773]  [<ffffffffa2f7fb09>] schedule+0x29/0x70
[434100.315816]  [<ffffffffa2f7d491>] schedule_timeout+0x221/0x2d0
[434100.315840]  [<ffffffffa28e10b6>] ? select_task_rq_fair+0x5a6/0x760
[434100.315861]  [<ffffffffa2f7febd>] wait_for_completion+0xfd/0x140
[434100.315883]  [<ffffffffa28db1d0>] ? wake_up_state+0x20/0x20
[434100.315910]  [<ffffffffc061c540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[434100.315933]  [<ffffffffa28c604a>] kthread_create_on_node+0xaa/0x140
[434100.315955]  [<ffffffffa2b8caeb>] ? string.isra.7+0x3b/0xf0
[434100.315977]  [<ffffffffc061c540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[434100.316001]  [<ffffffffc061c540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[434100.316025]  [<ffffffffc061dbfc>] spl_kthread_create+0x9c/0xf0 [spl]
[434100.316049]  [<ffffffffc061d39b>] taskq_thread_create+0x6b/0x110 [spl]
[434100.316072]  [<ffffffffc061d452>] taskq_thread_spawn_task+0x12/0x40 [spl]
[434100.316096]  [<ffffffffc061c7ec>] taskq_thread+0x2ac/0x4f0 [spl]
[434100.316116]  [<ffffffffa28db1d0>] ? wake_up_state+0x20/0x20
[434100.316691]  [<ffffffffc061c540>] ? taskq_thread_spawn+0x60/0x60 [spl]
[434100.317159]  [<ffffffffa28c61f1>] kthread+0xd1/0xe0
[434100.317613]  [<ffffffffa28c6120>] ? insert_kthread_work+0x40/0x40
[434100.318074]  [<ffffffffa2f8cd37>] ret_from_fork_nospec_begin+0x21/0x21
[434100.318535]  [<ffffffffa28c6120>] ? insert_kthread_work+0x40/0x40
[434100.319014] sending NMI to all CPUs:
[434100.320586] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffa2f81beb
[434100.321078] NMI backtrace for cpu 1
[434100.321567] CPU: 1 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[434100.322095] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[434100.322633] task: ffff8f173fe820e0 ti: ffff8f173fbec000 task.ti: ffff8f173fbec000
[434100.323202] RIP: 0010:[<ffffffffa286d5ba>]  [<ffffffffa286d5ba>] native_write_msr_safe+0xa/0x10
[434100.323790] RSP: 0018:ffff8f173fbefdb8  EFLAGS: 00000046
[434100.324362] RAX: 0000000000000400 RBX: 0000000000000001 RCX: 0000000000000830
[434100.324951] RDX: 0000000000000002 RSI: 0000000000000400 RDI: 0000000000000830
[434100.325530] RBP: ffff8f173fbefdb8 R08: ffffffffa35577a0 R09: ffff8f143547fac0
[434100.326135] R10: 0000000000000619 R11: ffffb4c8822a79d8 R12: ffffffffa35577a0
[434100.326737] R13: 0000000000000001 R14: 000000000000e026 R15: 0000000000000002
[434100.327328] FS:  0000000000000000(0000) GS:ffff8f173fc40000(0000) knlGS:0000000000000000
[434100.327943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[434100.328560] CR2: 0000562e45714b10 CR3: 0000000332d0e000 CR4: 00000000001607e0
[434100.329199] Call Trace:
[434100.329825]  [<ffffffffa28634f2>] __x2apic_send_IPI_mask+0xb2/0xe0
[434100.330462]  [<ffffffffa2863593>] x2apic_send_IPI_mask+0x13/0x20
[434100.331111]  [<ffffffffa285e923>] arch_trigger_all_cpu_backtrace+0x2c3/0x2d0
[434100.331792]  [<ffffffffa294d990>] watchdog+0x260/0x2c0
[434100.332450]  [<ffffffffa294d730>] ? reset_hung_task_detector+0x20/0x20
[434100.333118]  [<ffffffffa28c61f1>] kthread+0xd1/0xe0
[434100.333795]  [<ffffffffa28c6120>] ? insert_kthread_work+0x40/0x40
[434100.334446]  [<ffffffffa2f8cd37>] ret_from_fork_nospec_begin+0x21/0x21
[434100.335088]  [<ffffffffa28c6120>] ? insert_kthread_work+0x40/0x40
[434100.335722] Code: 00 55 89 f9 48 89 e5 0f 32 31 c9 89 c0 48 c1 e2 20 89 0e 48 09 c2 48 89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c0 48 c1 e2 20 48
[434100.337060] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffa2f81beb
[434100.337752] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffa2f81beb
[434100.338425] NMI backtrace for cpu 4 skipped: idling at pc 0xffffffffa2f81beb
[434100.339100] NMI backtrace for cpu 5 skipped: idling at pc 0xffffffffa2f81beb
[434100.339766] Kernel panic - not syncing: hung_task: blocked tasks
[434100.340430] CPU: 1 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.4.3.el7.x86_64 #1
[434100.341119] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[434100.341842] Call Trace:
[434100.342541]  [<ffffffffa2f79ba4>] dump_stack+0x19/0x1b
[434100.343256]  [<ffffffffa2f73947>] panic+0xe8/0x21f
[434100.343987]  [<ffffffffa294d99e>] watchdog+0x26e/0x2c0
[434100.344696]  [<ffffffffa294d730>] ? reset_hung_task_detector+0x20/0x20
[434100.345411]  [<ffffffffa28c61f1>] kthread+0xd1/0xe0
[434100.346124]  [<ffffffffa28c6120>] ? insert_kthread_work+0x40/0x40
[434100.346846]  [<ffffffffa2f8cd37>] ret_from_fork_nospec_begin+0x21/0x21
[434100.347566]  [<ffffffffa28c6120>] ? insert_kthread_work+0x40/0x40
dhagberg commented 4 years ago

Additional notes:

So prior to the latest panic, I had made the following changes to use the noop scheduler on this device:

# cat /sys/block/sdc/queue/scheduler
noop [deadline] cfq
# echo noop > /sys/block/sdc/queue/scheduler
# cat /sys/block/sdc/queue/scheduler
[noop] deadline cfq

And a udev rule to ensure that change was applied on reboot:

# cat /etc/udev/rules.d/70-ioschedulers.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ENV{ID_FS_UUID}=="14120504590319066518", ATTR{queue/scheduler}="noop"

Those config changes had been made 2019-12-04 19:37Z, approx 5 days before the most recent stall/panic.

zmalone commented 4 years ago

I have hosts that run into this behavior; they are all CentOS 7 machines with current kernels on AWS or GCP. If there's anything I can do to gather information beyond what dhagberg did, feel free to reach out. We see this happen on hosts that are low on memory and under reasonable load, but not actually running out of memory yet.
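
For gathering comparable data across hosts, the hung-task reports can be pulled out of saved kernel logs. This is a minimal sketch that assumes the message format seen in the traces in this thread (`INFO: task NAME:PID blocked for more than N seconds.`); the function name is hypothetical.

```python
import re

# Matches the khungtaskd report line in both dmesg and syslog formats.
HUNG_TASK_RE = re.compile(
    r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, timeout_seconds) tuples."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            name, pid, seconds = match.groups()
            reports.append((name, int(pid), int(seconds)))
    return reports


if __name__ == "__main__":
    sample = (
        "[434100.315676] INFO: task spl_dynamic_tas:1171 "
        "blocked for more than 160 seconds.\n"
        "[434100.315757] Call Trace:\n"
    )
    for report in find_hung_tasks(sample):
        print(report)
```

Running it over the output of `dmesg` or a syslog file gives one row per report, which makes it easier to see whether `spl_dynamic_tas` is always the blocked task.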

stale[bot] commented 3 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

shodanshok commented 2 years ago

I just had a very similar issue. System is a CentOS Linux release 7.9.2009 (Core) hypervisor, running 3.10.0-1160.31.1.el7.x86_64 kernel and ZFS kmod 2.0.5-1. RAM is at 32 GB. This is the panic as recorded in syslog:

Oct 21 15:11:22 kvm kernel: INFO: task spl_dynamic_tas:684 blocked for more than 120 seconds.
Oct 21 15:11:22 kvm kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 21 15:11:22 kvm kernel: spl_dynamic_tas D ffff94514a1c9080     0   684      2 0x00000000
Oct 21 15:11:22 kvm kernel: Call Trace:
Oct 21 15:11:22 kvm kernel: [<ffffffff8ab891e9>] schedule+0x29/0x70
Oct 21 15:11:22 kvm kernel: [<ffffffff8ab86eb1>] schedule_timeout+0x221/0x2d0
Oct 21 15:11:22 kvm kernel: [<ffffffff8a4d73f2>] ? check_preempt_curr+0x92/0xa0
Oct 21 15:11:22 kvm kernel: [<ffffffff8a4d7419>] ? ttwu_do_wakeup+0x19/0xe0
Oct 21 15:11:22 kvm kernel: [<ffffffff8ab8959d>] wait_for_completion+0xfd/0x140
Oct 21 15:11:22 kvm kernel: [<ffffffff8a4dadc0>] ? wake_up_state+0x20/0x20
Oct 21 15:11:22 kvm kernel: [<ffffffffc0aa2230>] ? taskq_thread_spawn+0x60/0x60 [spl]
Oct 21 15:11:22 kvm kernel: [<ffffffff8a4c5c8a>] kthread_create_on_node+0xaa/0x140
Oct 21 15:11:22 kvm kernel: [<ffffffff8a79229b>] ? string.isra.7+0x3b/0xf0
Oct 21 15:11:22 kvm kernel: [<ffffffffc0aa2230>] ? taskq_thread_spawn+0x60/0x60 [spl]
Oct 21 15:11:22 kvm kernel: [<ffffffffc0aa2230>] ? taskq_thread_spawn+0x60/0x60 [spl]
Oct 21 15:11:22 kvm kernel: [<ffffffffc0aa391c>] spl_kthread_create+0x9c/0xf0 [spl]
Oct 21 15:11:22 kvm kernel: [<ffffffffc0aa30bb>] taskq_thread_create+0x6b/0x110 [spl]
Oct 21 15:11:22 kvm kernel: [<ffffffffc0aa3172>] taskq_thread_spawn_task+0x12/0x40 [spl]
Oct 21 15:11:22 kvm kernel: [<ffffffffc0aa24f6>] taskq_thread+0x2c6/0x520 [spl]
Oct 21 15:11:22 kvm kernel: [<ffffffff8a4dadc0>] ? wake_up_state+0x20/0x20
Oct 21 15:11:22 kvm kernel: [<ffffffffc0aa2230>] ? taskq_thread_spawn+0x60/0x60 [spl]
Oct 21 15:11:22 kvm kernel: [<ffffffff8a4c5e31>] kthread+0xd1/0xe0
Oct 21 15:11:22 kvm kernel: [<ffffffff8a4c5d60>] ? insert_kthread_work+0x40/0x40
Oct 21 15:11:22 kvm kernel: [<ffffffff8ab95ddd>] ret_from_fork_nospec_begin+0x7/0x21
Oct 21 15:11:22 kvm kernel: [<ffffffff8a4c5d60>] ? insert_kthread_work+0x40/0x40

All running VMs were stalled, not responding to pings. The host itself was stalled with regard to disk I/O (it was impossible to log in via SSH even though the root partition is not on ZFS), but network I/O was working (an SSH tunnel to another machine could be established). The host had >8GB free memory, but considerable swap (~8GB) was in use, so I tried to investigate. It looked as if the nightly backup to an NFS host was pushing used memory into the swap area due to memory pressure caused by the ARC (with target at 98%, ~15GB).

I ran swapoff -a to page in all swapped memory, which left the system with a reduced ARC (~9 GB) and ~4 GB of free memory. Then I set vm.swappiness=0 and read a big file from the ZFS dataset. At this point the machine slowly froze to a complete halt (livelocked?). I had to reboot it via IPMI.

As a side note, in the preceding days I was unable to start a block-tracing session (via blktrace) because memory was very fragmented:

Oct 18 22:39:35 kvm kernel: blktrace: page allocation failure: order:4, mode:0xc0d0
Oct 18 22:39:35 kvm kernel: CPU: 3 PID: 11719 Comm: blktrace Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.31.1.el7.x86_64 #1
Oct 18 22:39:35 kvm kernel: Hardware name: Supermicro SYS-5039A-IL/X11SAE, BIOS 2.3 06/21/2018
Oct 18 22:39:35 kvm kernel: Call Trace:
Oct 18 22:39:35 kvm kernel: [<ffffffff8ab835a9>] dump_stack+0x19/0x1b
Oct 18 22:39:35 kvm kernel: [<ffffffff8a5c46c0>] warn_alloc_failed+0x110/0x180
Oct 18 22:39:35 kvm kernel: [<ffffffff8a5c925f>] __alloc_pages_nodemask+0x9df/0xbe0
Oct 18 22:39:35 kvm kernel: [<ffffffff8a618ea8>] alloc_pages_current+0x98/0x110
Oct 18 22:39:35 kvm kernel: [<ffffffff8a5e5ad8>] kmalloc_order+0x18/0x40
Oct 18 22:39:35 kvm kernel: [<ffffffff8a624876>] kmalloc_order_trace+0x26/0xa0
Oct 18 22:39:35 kvm kernel: [<ffffffff8a55d6e3>] relay_open+0x63/0x2c0
Oct 18 22:39:35 kvm kernel: [<ffffffff8a57cc2e>] do_blk_trace_setup+0x18e/0x2e0
Oct 18 22:39:35 kvm kernel: [<ffffffff8a57cf4f>] __blk_trace_setup+0x6f/0xe0
Oct 18 22:39:35 kvm kernel: [<ffffffff8a57e2c4>] blk_trace_ioctl+0xe4/0x160
Oct 18 22:39:35 kvm kernel: [<ffffffff8a768393>] blkdev_ioctl+0x533/0xa20
Oct 18 22:39:35 kvm kernel: [<ffffffff8a68ec41>] block_ioctl+0x41/0x50
Oct 18 22:39:35 kvm kernel: [<ffffffff8a6635c0>] do_vfs_ioctl+0x3a0/0x5b0
Oct 18 22:39:35 kvm kernel: [<ffffffff8a64ac2a>] ? __check_object_size+0x1ca/0x250
Oct 18 22:39:35 kvm kernel: [<ffffffff8a663871>] SyS_ioctl+0xa1/0xc0
Oct 18 22:39:35 kvm kernel: [<ffffffff8ab95f92>] system_call_fastpath+0x25/0x2a
Oct 18 22:39:35 kvm kernel: Mem-Info:
Oct 18 22:39:35 kvm kernel: active_anon:1423968 inactive_anon:273337 isolated_anon:0#012 active_file:1875 inactive_file:3035 isolated_file:0#012 unevictable:97 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:24695 slab_unreclaimable:209997#012 mapped:2039 shmem:2659 pagetables:8555 bounce:0#012 free:2091832 free_pcp:0 free_cma:0
Oct 18 22:39:35 kvm kernel: Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Oct 18 22:39:35 kvm kernel: lowmem_reserve[]: 0 1977 31885 31885
Oct 18 22:39:35 kvm kernel: Node 0 DMA32 free:122612kB min:4188kB low:5232kB high:6280kB active_anon:7312kB inactive_anon:7632kB active_file:0kB inactive_file:0kB unevictable:388kB isolated(anon):0kB isolated(file):0kB present:2256136kB managed:2024468kB mlocked:0kB dirty:0kB writeback:0kB mapped:16kB shmem:388kB slab_reclaimable:3824kB slab_unreclaimable:32432kB kernel_stack:256kB pagetables:3332kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 18 22:39:35 kvm kernel: lowmem_reserve[]: 0 0 29908 29908
Oct 18 22:39:35 kvm kernel: Node 0 Normal free:8228824kB min:63360kB low:79200kB high:95040kB active_anon:5688560kB inactive_anon:1085716kB active_file:7500kB inactive_file:12140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:31170560kB managed:30628736kB mlocked:0kB dirty:0kB writeback:0kB mapped:8140kB shmem:10248kB slab_reclaimable:94956kB slab_unreclaimable:807556kB kernel_stack:4816kB pagetables:30888kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 18 22:39:35 kvm kernel: lowmem_reserve[]: 0 0 0 0
Oct 18 22:39:35 kvm kernel: Node 0 DMA: 1*4kB (U) 2*8kB (U) 2*16kB (U) 1*32kB (U) 3*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Oct 18 22:39:35 kvm kernel: Node 0 DMA32: 3117*4kB (UEM) 3040*8kB (UEM) 1794*16kB (UEM) 9*32kB (UEM) 192*64kB (UM) 334*128kB (UM) 7*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 122612kB
Oct 18 22:39:35 kvm kernel: Node 0 Normal: 772778*4kB (UEM) 591957*8kB (UEM) 25004*16kB (UEM) 78*32kB (UEM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 8229328kB
Oct 18 22:39:35 kvm kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 18 22:39:35 kvm kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 18 22:39:35 kvm kernel: 502660 total pagecache pages
Oct 18 22:39:35 kvm kernel: 495049 pages in swap cache
Oct 18 22:39:35 kvm kernel: Swap cache stats: add 1209489865, delete 1208826352, find 1779792608/2553325301
Oct 18 22:39:35 kvm kernel: Free swap  = 8448360kB
Oct 18 22:39:35 kvm kernel: Total swap = 16777212kB
Oct 18 22:39:35 kvm kernel: 8360671 pages RAM
Oct 18 22:39:35 kvm kernel: 0 pages HighMem/MovableOnly
Oct 18 22:39:35 kvm kernel: 193397 pages reserved
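
The fragmentation can be read off the per-zone free lists in the log above. The failed request was order:4, i.e. 16 contiguous 4kB pages (64kB). A minimal sketch of the arithmetic follows, assuming the `COUNT*SIZEkB` format of those lines; the function names are hypothetical.

```python
import re

# One free-list entry, e.g. "772778*4kB".
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")


def parse_free_blocks(line):
    """Return {block_size_kb: count} from a per-zone free-list line."""
    return {int(size): int(count) for count, size in BLOCK_RE.findall(line)}


def free_kb(line, min_block_kb=0):
    """Total free kB held in blocks of at least min_block_kb."""
    blocks = parse_free_blocks(line)
    return sum(size * count for size, count in blocks.items()
               if size >= min_block_kb)


if __name__ == "__main__":
    normal = ("Node 0 Normal: 772778*4kB (UEM) 591957*8kB (UEM) "
              "25004*16kB (UEM) 78*32kB (UEM) 0*64kB 0*128kB 0*256kB "
              "0*512kB 0*1024kB 0*2048kB 0*4096kB = 8229328kB")
    print("total free kB:", free_kb(normal))
    print("free kB in blocks >= 64kB:", free_kb(normal, min_block_kb=64))
```

For the Normal zone this gives about 8GB free in total but nothing in blocks of 64kB or larger, so an order-4 allocation cannot be satisfied from that zone without compaction or reclaim.
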
stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

yazun commented 1 year ago

Similar problem here: 3x SAS3 15TB SSDs in raidz, stalling on multiple physical machines (5 out of 12) at roughly the same time under a similar load, on CentOS 7.9. It possibly happens when memory usage approaches the limit without exceeding it (230GB of 256GB RAM). The OS stops responding for hours and a reboot is needed.

Jan  5 02:46:13 gaiadb06 kernel: INFO: task spl_dynamic_tas:967 blocked for more than 120 seconds.
Jan  5 02:46:13 gaiadb06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  5 02:46:13 gaiadb06 kernel: spl_dynamic_tas D ffff99943fec3150     0   967      2 0x00000000
Jan  5 02:46:13 gaiadb06 kernel: Call Trace:
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28e0008>] ? __enqueue_entity+0x78/0x80
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28e6baf>] ? enqueue_entity+0x2ef/0xbe0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2f80a09>] schedule+0x29/0x70
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2f7e511>] schedule_timeout+0x221/0x2d0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb296e8ad>] ? tracing_record_cmdline+0x1d/0x120
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb297701b>] ? probe_sched_wakeup+0x2b/0xa0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28d7845>] ? ttwu_do_wakeup+0xb5/0xe0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2f80dbd>] wait_for_completion+0xfd/0x140
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28db4c0>] ? wake_up_state+0x20/0x20
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1ee80>] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28c604a>] kthread_create_on_node+0xaa/0x140
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2b8d3fb>] ? string.isra.7+0x3b/0xf0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1ee80>] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1ee80>] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan  5 02:46:13 gaiadb06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  5 02:46:13 gaiadb06 kernel: spl_dynamic_tas D ffff99943fec3150     0   967      2 0x00000000
Jan  5 02:46:13 gaiadb06 kernel: Call Trace:
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28e0008>] ? __enqueue_entity+0x78/0x80
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28e6baf>] ? enqueue_entity+0x2ef/0xbe0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2f80a09>] schedule+0x29/0x70
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2f7e511>] schedule_timeout+0x221/0x2d0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb296e8ad>] ? tracing_record_cmdline+0x1d/0x120
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb297701b>] ? probe_sched_wakeup+0x2b/0xa0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28d7845>] ? ttwu_do_wakeup+0xb5/0xe0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2f80dbd>] wait_for_completion+0xfd/0x140
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28db4c0>] ? wake_up_state+0x20/0x20
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1ee80>] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28c604a>] kthread_create_on_node+0xaa/0x140
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2b8d3fb>] ? string.isra.7+0x3b/0xf0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1ee80>] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1ee80>] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b2065c>] spl_kthread_create+0x9c/0xf0 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1fd0b>] taskq_thread_create+0x6b/0x110 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1fdc2>] taskq_thread_spawn_task+0x12/0x40 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b2065c>] spl_kthread_create+0x9c/0xf0 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1fd0b>] taskq_thread_create+0x6b/0x110 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1fdc2>] taskq_thread_spawn_task+0x12/0x40 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1f146>] taskq_thread+0x2c6/0x520 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1f146>] taskq_thread+0x2c6/0x520 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28db4c0>] ? wake_up_state+0x20/0x20
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28db4c0>] ? wake_up_state+0x20/0x20
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1ee80>] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffc0b1ee80>] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28c61f1>] kthread+0xd1/0xe0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28c61f1>] kthread+0xd1/0xe0
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28c6120>] ? insert_kthread_work+0x40/0x40
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28c6120>] ? insert_kthread_work+0x40/0x40
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2f8dd37>] ret_from_fork_nospec_begin+0x21/0x21
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb2f8dd37>] ret_from_fork_nospec_begin+0x21/0x21
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28c6120>] ? insert_kthread_work+0x40/0x40
Jan  5 02:46:13 gaiadb06 kernel: [<ffffffffb28c6120>] ? insert_kthread_work+0x40/0x40