sophgo / linux-riscv

Linux kernel stable tree
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/

CPU lockup on high load #92

Open felixonmars opened 8 months ago

felixonmars commented 8 months ago

I hit another crash on a 2-way server running a 6.1.61 kernel that I compiled myself from commit db74e759247f. The server was under high load (~500) at the time.

[44855.988161] INFO: task iou-sqp-533883:533890 blocked for more than 123 seconds.
[44855.995583]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.002700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.010596] task:iou-sqp-533883  state:D stack:0     pid:533890 ppid:533873 flags:0x00000100
[44856.019108] Call Trace:
[44856.021589] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.026967] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.031898] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.036836] [<ffffffff80470426>] io_sq_thread_unpark+0x0/0x52
[44856.042643] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[44856.050530] INFO: task node:533891 blocked for more than 123 seconds.
[44856.058874]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.067643] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.077570] task:node            state:D stack:0     pid:533891 ppid:533873 flags:0x00000100
[44856.088033] Call Trace:
[44856.092551] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.099618] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.106508] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.112896] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.120305] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.127454] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.135004] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.144859] INFO: task iou-sqp-533891:533897 blocked for more than 123 seconds.
[44856.158078]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.171042] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.184319] task:iou-sqp-533891  state:D stack:0     pid:533897 ppid:533873 flags:0x00000100
[44856.198653] Call Trace:
[44856.206482] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.217475] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.227726] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.237960] [<ffffffff80470426>] io_sq_thread_unpark+0x0/0x52
[44856.248980] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[44856.259576] INFO: task node:533898 blocked for more than 123 seconds.
[44856.271175]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.283175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.296057] task:node            state:D stack:0     pid:533898 ppid:533873 flags:0x00000100
[44856.309608] Call Trace:
[44856.317249] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.327718] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.337731] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.347760] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.357977] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.367954] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.378642] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.389253] INFO: task node:533900 blocked for more than 124 seconds.
[44856.400484]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.412222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.424653] task:node            state:D stack:0     pid:533900 ppid:533873 flags:0x00000100
[44856.438093] Call Trace:
[44856.445409] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.455309] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.464535] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.473732] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.483670] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.493605] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.503825] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.514267] INFO: task node:533901 blocked for more than 124 seconds.
[44856.525292]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.537294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.550119] task:node            state:D stack:0     pid:533901 ppid:533873 flags:0x00000100
[44856.563940] Call Trace:
[44856.571408] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.582214] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.592102] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.601798] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.612306] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.622423] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.633127] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.648942] INFO: task node:533902 blocked for more than 124 seconds.
[44856.660497]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.672258] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.685118] task:node            state:D stack:0     pid:533902 ppid:533873 flags:0x00000100
[44856.698754] Call Trace:
[44856.706235] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.716664] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.726356] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.736209] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.746764] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.756868] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.767551] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.778183] INFO: task iou-sqp-533883:534374 blocked for more than 124 seconds.
[44856.790332]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.802514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.815526] task:iou-sqp-533883  state:D stack:0     pid:534374 ppid:533873 flags:0x00000100
[44856.829018] Call Trace:
[44856.836263] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.846499] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.856378] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.866038] [<ffffffff80470426>] io_sq_thread_unpark+0x0/0x52
[44856.876731] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[44856.887184] INFO: task node:534399 blocked for more than 124 seconds.
[44856.898487]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.910490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.923698] task:node            state:D stack:0     pid:534399 ppid:533873 flags:0x00000100
[44856.937177] Call Trace:
[44856.944825] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.955026] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.964958] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.974861] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.985292] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.995399] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44857.005638] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44857.016827] INFO: task iou-wrk-534374:534651 blocked for more than 124 seconds.
[44857.029491]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44857.041451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44857.054436] task:iou-wrk-534374  state:D stack:0     pid:534651 ppid:533873 flags:0x00000100
[44857.067871] Call Trace:
[44857.075487] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44857.086155] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44857.096097] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44857.105750] [<ffffffff804799aa>] io_wqe_worker+0x316/0x360
[44857.116339] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45308.473875] watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [migration/9:72]
[45308.482500] Modules linked in: sctp ip6_udp_tunnel udp_tunnel joydev tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45308.503921] watchdog: BUG: soft lockup - CPU#41 stuck for 22s! [migration/41:266]
[45308.553830] CPU: 9 PID: 72 Comm: migration/9 Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45308.555467] Modules linked in:
[45308.558366] Hardware name: Sophgo Mango (DT)
[45308.558372] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45308.565850]  sctp
[45308.567536] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45308.568959]  ip6_udp_tunnel
[45308.570425]  ra : multi_cpu_stop+0xb8/0x172
[45308.571994]  udp_tunnel
[45308.573618] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80a96bd90
[45308.575680]  joydev
[45308.577162]  gp : ffffffff81bac808 tp : ffffffdffeb61f80 t0 : 0000000000000080
[45308.578526]  tun
[45308.580220]  t1 : 0000000001806000 t2 : 0000000000000000 s0 : ffffffc80a96be10
[45308.581513]  cfg80211
[45308.583208]  s1 : ffffffc84017b9f0 a0 : ffffffff80e0dc60 a1 : 0000000000000002
[45308.584760]  rfkill
[45308.586244]  a2 : ffffffc84017ba18 a3 : ffffffff81c089c8 a4 : 000000004c6f3bd4
[45308.587851]  xt_MASQUERADE
[45308.589301]  a5 : fffffff5db0d3cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45308.589306]  s2 : ffffffc84017ba14 s3 : ffffffffffffffff s4 : ffffffff80e0dc60
[45308.589310]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45308.589314]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45308.589318]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 000000000000b60e
[45308.591471]  iptable_nat
[45308.593270]  t5 : 000000fc00000000 t6 : 0000000000000001
[45308.595003]  nf_nat
[45308.596607] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45308.598327]  nf_conntrack
[45308.599910] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45308.601492]  nf_defrag_ipv6
[45308.602925] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45308.604631]  nf_defrag_ipv4
[45308.606231] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45308.607673]  libcrc32c
[45308.609194] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45308.610837]  xt_TCPMSS
[45308.612224] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45308.613836]  xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45308.865495] CPU: 41 PID: 266 Comm: migration/41 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45308.883578] Hardware name: Sophgo Mango (DT)
[45308.889560] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45308.897843] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45308.904524]  ra : multi_cpu_stop+0xb8/0x172
[45308.910231] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80af7bd90
[45308.919232]  gp : ffffffff81bac808 tp : ffffffeffdf23f00 t0 : 0000000000000080
[45308.927724]  t1 : 0000000001806000 t2 : 0000000000000001 s0 : ffffffc80af7be10
[45308.936569]  s1 : ffffffc83f8f39f0 a0 : ffffffff80e0db30 a1 : 0000000000000002
[45308.945430]  a2 : ffffffc83f8f3a18 a3 : ffffffff81c089c8 a4 : 000000001de9706c
[45308.954400]  a5 : fffffff5db473cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45308.963118]  s2 : ffffffc83f8f3a14 s3 : ffffffffffffffff s4 : ffffffff80e0db30
[45308.971816]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45308.980622]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45308.989427]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 0000000000000007
[45308.998182]  t5 : 0000000000000005 t6 : 000000000000ffff
[45309.005003] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45309.014689] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45309.022893] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45309.030131] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45309.037489] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45309.043803] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45316.540733] watchdog: BUG: soft lockup - CPU#71 stuck for 22s! [migration/71:447]
[45316.552469] Modules linked in: sctp ip6_udp_tunnel udp_tunnel joydev tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45316.557529] watchdog: BUG: soft lockup - CPU#80 stuck for 23s! [migration/80:502]
[45316.560695] watchdog: BUG: soft lockup - CPU#83 stuck for 22s! [migration/83:520]
[45316.560735] Modules linked in: sctp ip6_udp_tunnel udp_tunnel joydev tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45316.561113] CPU: 83 PID: 520 Comm: migration/83 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45316.561143] Hardware name: Sophgo Mango (DT)
[45316.561151] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45316.561210] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45316.561226]  ra : multi_cpu_stop+0xb8/0x172
[45316.561236] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b76bd90
[45316.561246]  gp : ffffffff81bac808 tp : ffffffd8ff795e80 t0 : 0000000000000080
[45316.561253]  t1 : 0000000001806000 t2 : 0000000000000000 s0 : ffffffc80b76be10
[45316.561261]  s1 : ffffffc8352fb9f0 a0 : ffffffff80e0da90 a1 : 0000000000000002
[45316.561267]  a2 : ffffffc8352fba18 a3 : ffffffff81c089c8 a4 : ffffffffb61fbc6c
[45316.561276]  a5 : fffffff5db935cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45316.561283]  s2 : ffffffc8352fba14 s3 : ffffffffffffffff s4 : ffffffff80e0da90
[45316.561289]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45316.561294]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45316.561300]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 0000000000000005
[45316.561307]  t5 : 0000000000000002 t6 : 000000000000010a
[45316.561312] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45316.561321] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45316.561334] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45316.561344] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45316.561359] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45316.561368] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45316.572736] CPU: 71 PID: 447 Comm: migration/71 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45316.577483] Modules linked in:
[45316.581746] Hardware name: Sophgo Mango (DT)
[45316.603226]  sctp
[45316.611874] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45316.616515]  ip6_udp_tunnel
[45316.621297] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45316.625749]  udp_tunnel
[45316.630454]  ra : multi_cpu_stop+0xb8/0x172
[45316.634929]  joydev
[45316.639520] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b523d90
[45316.644036]  tun
[45316.647471] watchdog: BUG: soft lockup - CPU#125 stuck for 23s! [migration/125:775]
[45316.647580] Modules linked in: sctp ip6_udp_tunnel udp_tunnel joydev tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45316.648203] CPU: 125 PID: 775 Comm: migration/125 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45316.648254] Hardware name: Sophgo Mango (DT)
[45316.648266] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45316.648351] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45316.648380]  ra : multi_cpu_stop+0xb8/0x172
[45316.648398] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80bf63d90
[45316.648414]  gp : ffffffff81bac808 tp : ffffffd8ffabbf00 t0 : 0000000000000080
[45316.648427]  t1 : 0000000001806000 t2 : 0000000000000002 s0 : ffffffc80bf63e10
[45316.648441]  s1 : ffffffc838d2b9f0 a0 : ffffffff80e0d9b0 a1 : 0000000000000002
[45316.648454]  a2 : ffffffc838d2ba48 a3 : ffffffff81c089c8 a4 : ffffffffb1acaa24
[45316.648461]  a5 : fffffff5dbdf7cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45316.648472]  s2 : ffffffc838d2ba14 s3 : ffffffffffffffff s4 : ffffffff80e0d9b0
[45316.648483]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45316.648491]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45316.648498]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 000000000000b564
[45316.648510]  t5 : fffffff1d632780c t6 : ffffffc838d2bdb8
[45316.648519] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45316.648535] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45316.648558] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45316.648568] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45316.648585]  gp : ffffffff81bac808 tp : ffffffd8ff693f00 t0 : 0000000000000080
[45316.648598] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45316.648627] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45316.652992]  cfg80211
[45316.657396]  t1 : 0000000001806000 t2 : 0000000000000001 s0 : ffffffc80b523e10
[45316.661723]  rfkill
[45316.666339]  s1 : ffffffc83e91b9f0 a0 : ffffffff80e0d9e0 a1 : 0000000000000002
[45316.670669]  xt_MASQUERADE
[45316.675093]  a2 : ffffffc83e91ba18 a3 : ffffffff81c089c8 a4 : ffffffffb8305fc4
[45316.679261]  iptable_nat
[45316.683518]  a5 : fffffff5db7d9cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45316.687708]  nf_nat
[45316.691822]  s2 : ffffffc83e91ba14 s3 : ffffffffffffffff s4 : ffffffff80e0d9e0
[45316.696041]  nf_conntrack
[45316.699993]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45316.703891]  nf_defrag_ipv6
[45316.711012]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45316.714915]  nf_defrag_ipv4
[45316.718724]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 000000000000b240
[45316.722654]  libcrc32c
[45316.726589]  t5 : 0000038e00000000 t6 : 0000000000000001
[45316.730439]  xt_TCPMSS
[45316.734514] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45316.738378]  xt_tcpudp
[45316.742344] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45316.746064]  iptable_filter
[45316.749879] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45316.753627]  vfat
[45316.757549] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45316.775352]  fat
[45316.782228] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45316.785847]  ixgbe
[45316.789547] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45316.793204]  ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45317.691773] CPU: 80 PID: 502 Comm: migration/80 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45317.712214] Hardware name: Sophgo Mango (DT)
[45317.719354] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45317.729201] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45317.737345]  ra : multi_cpu_stop+0xb8/0x172
[45317.744281] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b6dbd90
[45317.754171]  gp : ffffffff81bac808 tp : ffffffd8ff76bf00 t0 : 0000000000000080
[45317.764055]  t1 : 0000000001806000 t2 : 0000002abb8226f9 s0 : ffffffc80b6dbe10
[45317.774179]  s1 : ffffffc838feb9f0 a0 : ffffffff80e0d990 a1 : 0000000000000002
[45317.784340]  a2 : ffffffc838feba48 a3 : ffffffff81c089c8 a4 : 000000000b6031d4
[45317.794436]  a5 : fffffff5db8decc0 a6 : 0000000000000001 a7 : 0000000000000000
[45317.804493]  s2 : ffffffc838feba14 s3 : ffffffffffffffff s4 : ffffffff80e0d990
[45317.814555]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45317.824637]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45317.834724]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 000000000000b424
[45317.844934]  t5 : 000001a200000000 t6 : ffffffc5fe6f8000
[45317.852945] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45317.863767] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45317.873092] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45317.881867] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45317.890527] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45317.898340] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
felixonmars commented 8 months ago

Another server crash, this time with the Debian (RevyOS) 6.1.61 kernel; I am not sure whether it is the same issue:

[209173.109588] INFO: task node:896412 blocked for more than 124 seconds.
[209173.116301]       Not tainted 6.1.61-pisces #2023.12.19.12.48+c60b48221
[209173.123094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[209173.131065] task:node            state:D stack:0     pid:896412 ppid:890296 flags:0x00000100
[209173.139659] Call Trace:
[209173.142239] [<ffffffff80a6e16a>] __schedule+0x29c/0x89e
[209173.147634] [<ffffffff80a6e7b8>] schedule+0x4c/0xce
[209173.152644] [<ffffffff80a6ebba>] schedule_preempt_disabled+0x18/0x20
[209173.159136] [<ffffffff80a6fb82>] __mutex_lock.constprop.0+0x336/0x6b8
[209173.165734] [<ffffffff80a70000>] __mutex_lock_slowpath+0x1a/0x22
[209173.171878] [<ffffffff80a7004a>] mutex_lock+0x42/0x4c
[209173.177061] [<ffffffff800d1a98>] proc_cgroup_show+0x5c/0x3ba
[209173.182866] [<ffffffff80306f7a>] proc_single_show+0x4e/0x9e
[209173.188594] [<ffffffff802a836a>] seq_read_iter+0x158/0x362
[209173.194216] [<ffffffff802a8608>] seq_read+0x94/0xc0
[209173.199231] [<ffffffff802797e2>] vfs_read+0xaa/0x238
[209173.204325] [<ffffffff8027a362>] ksys_read+0x6e/0xe4
[209173.209434] [<ffffffff8027a3f2>] sys_read+0x1a/0x22
[209173.214446] [<ffffffff80003cc8>] ret_from_syscall+0x0/0x2
[209173.229174] INFO: task node:896551 blocked for more than 124 seconds.
[209173.235882]       Not tainted 6.1.61-pisces #2023.12.19.12.48+c60b48221
[209173.242680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[209173.250655] task:node            state:D stack:0     pid:896551 ppid:889336 flags:0x00000100
[209173.259257] Call Trace:
[209173.261829] [<ffffffff80a6e16a>] __schedule+0x29c/0x89e
[209173.267216] [<ffffffff80a6e7b8>] schedule+0x4c/0xce
[209173.272227] [<ffffffff80a6ebba>] schedule_preempt_disabled+0x18/0x20
[209173.278716] [<ffffffff80a6fb82>] __mutex_lock.constprop.0+0x336/0x6b8
[209173.285303] [<ffffffff80a70000>] __mutex_lock_slowpath+0x1a/0x22
[209173.291479] [<ffffffff80a7004a>] mutex_lock+0x42/0x4c
[209173.296665] [<ffffffff800d1a98>] proc_cgroup_show+0x5c/0x3ba
[209173.302464] [<ffffffff80306f7a>] proc_single_show+0x4e/0x9e
[209173.308205] [<ffffffff802a836a>] seq_read_iter+0x158/0x362
[209173.313822] [<ffffffff802a8608>] seq_read+0x94/0xc0
[209173.318828] [<ffffffff802797e2>] vfs_read+0xaa/0x238
[209173.323937] [<ffffffff8027a362>] ksys_read+0x6e/0xe4
[209173.329031] [<ffffffff8027a3f2>] sys_read+0x1a/0x22
[209173.334039] [<ffffffff80003cc8>] ret_from_syscall+0x0/0x2
[209173.348811] INFO: task node:896742 blocked for more than 124 seconds.
[209173.355521]       Not tainted 6.1.61-pisces #2023.12.19.12.48+c60b48221
[209173.362294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[209173.370269] task:node            state:D stack:0     pid:896742 ppid:890159 flags:0x00000100
[209173.378867] Call Trace:
[209173.381442] [<ffffffff80a6e16a>] __schedule+0x29c/0x89e
[209173.389371] [<ffffffff80a6e7b8>] schedule+0x4c/0xce
[209173.396810] [<ffffffff80a6ebba>] schedule_preempt_disabled+0x18/0x20
[209173.405501] [<ffffffff80a6fb82>] __mutex_lock.constprop.0+0x336/0x6b8
[209173.414293] [<ffffffff80a70000>] __mutex_lock_slowpath+0x1a/0x22
[209173.422578] [<ffffffff80a7004a>] mutex_lock+0x42/0x4c
[209173.429832] [<ffffffff800d1a98>] proc_cgroup_show+0x5c/0x3ba
[209173.437697] [<ffffffff80306f7a>] proc_single_show+0x4e/0x9e
[209173.445493] [<ffffffff802a836a>] seq_read_iter+0x158/0x362
[209173.453113] [<ffffffff802a8608>] seq_read+0x94/0xc0
[209173.460196] [<ffffffff802797e2>] vfs_read+0xaa/0x238
[209173.467368] [<ffffffff8027a362>] ksys_read+0x6e/0xe4
[209173.474484] [<ffffffff8027a3f2>] sys_read+0x1a/0x22
[209173.481473] [<ffffffff80003cc8>] ret_from_syscall+0x0/0x2
sbi_trap_error: hart93: illegal instruction handler failed (error -2)
sbi_trap_error: hart93: mcause=0x0000000000000002 mtval=0x0000000000000000
sbi_trap_error: hart93: mepc=0x000000000015b1fa mstatus=0x0000000a00001820
sbi_trap_error: hart93: ra=0x000000000015b1f8 sp=0x00000000000e1f18
sbi_trap_error: hart93: gp=0xffffffff81a44ec8 tp=0xffffffe2b2a78000
sbi_trap_error: hart93: s0=0x00000000000e1f38 s1=0xffffffe81d005400
sbi_trap_error: hart93: a0=0x0013000000130000 a1=0xffffffffb51fb51f
sbi_trap_error: hart93: a2=0x000000000000b51f a3=0x000000000000b51f
sbi_trap_error: hart93: a4=0x000000000000b520 a5=0x0013000000130000
sbi_trap_error: hart93: a6=0x000000000000b51f a7=0x0000000000000080
sbi_trap_error: hart93: s2=0xfffffff65fa5f180 s3=0x0000000000016b20
sbi_trap_error: hart93: s4=0x0000000000000009 s5=0x0000000000000009
sbi_trap_error: hart93: s6=0x0000000200000022 s7=0xfffffff65fa5f8ff
sbi_trap_error: hart93: s8=0xffffffff81acb488 s9=0x000000000000005d
sbi_trap_error: hart93: s10=0xffffffff81a6ac98 s11=0xffffffe04f79bc08
sbi_trap_error: hart93: t0=0x0000000a00000820 t1=0x0000000000000001
sbi_trap_error: hart93: t2=0xffffffff8100bf28 t3=0x00000000ff00ff00
sbi_trap_error: hart93: t4=0xffffffd8ffa8fd38 t5=0x0000000000000002
sbi_trap_error: hart93: t6=0x02b11ee2af76cd48
felixonmars commented 8 months ago

Another crash with the first self-compiled kernel, on the same host:

[53111.356078] nvme nvme0: Abort status: 0x0
[53111.370408] nvme nvme0: I/O 309 (Write) QID 1 timeout, aborting
[53111.374461] nvme nvme0: Abort status: 0x0
[53111.388915] nvme nvme0: I/O 310 (Write) QID 1 timeout, aborting
[53111.392554] nvme nvme0: Abort status: 0x0
[53111.406580] nvme nvme0: I/O 311 (Write) QID 1 timeout, aborting  
[53111.410602] nvme nvme0: Abort status: 0x0
[53111.424884] nvme nvme0: I/O 312 (Write) QID 1 timeout, aborting  
[53111.428874] nvme nvme0: Abort status: 0x0
[53111.443731] nvme nvme0: Abort status: 0x0
[55284.493268] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[55284.503385] rcu:     41-...0: (1 GPs behind) idle=7b24/1/0x4000000000000000 softirq=3305695/3305728 fqs=1979
[55312.513142] watchdog: BUG: soft lockup - CPU#49 stuck for 22s! [migration/49:314]
[55312.525061] Modules linked in: tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe mdio_devres ofpart of_mdio ipmi_si fixed_phy sophgo_spifmc fwnode_mdio spi_nor ipmi_devintf libphy igb 8250_dw gpio_dwapb ipmi_msghandler mtd mdio switchtec mousedev uio_pdrv_genirq uio tcp_bbr sch_fq fuse loop dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block sdhci_sophgo ast sdhci_pltfm usbhid nvme drm_vram_helper sdhci spi_dw_mmio drm_ttm_helper gpio_keys mmc_core spi_dw nvme_core ttm xhci_pci nvme_common xhci_pci_renesas
[55312.607363] CPU: 49 PID: 314 Comm: migration/49 Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[55312.628591] Hardware name: Sophgo Mango (DT)
[55312.636471] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[55312.647469] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[55312.656635]  ra : multi_cpu_stop+0xb8/0x172
[55312.664391] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b0fbd90
[55312.675607]  gp : ffffffff81bac808 tp : ffffffe7fed63f00 t0 : 0000000000000080
[55312.686709]  t1 : 0000000001806000 t2 : 0000000000000001 s0 : ffffffc80b0fbe10
[55312.697704]  s1 : ffffffc8390eb970 a0 : ffffffff80e0da88 a1 : 0000000000000002
[55312.708524]  a2 : ffffffc8390eb9c8 a3 : ffffffff81c089c8 a4 : ffffffffe72fec6c
[55312.719750]  a5 : fffffff5db55bcc0 a6 : 0000000000000001 a7 : 0000000000000000
[55312.730866]  s2 : ffffffc8390eb994 s3 : ffffffffffffffff s4 : ffffffff80e0da88
[55312.741914]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[55312.752799]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[55312.763432]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 0000000000000000
[55312.774591]  t5 : 0000000000000034 t6 : 000000000000ffff
[55312.783551] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[55312.794943] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[55312.804845] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[55312.813935] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[55312.823287] [<ffffffff80040aee>] kthread+0xbe/0xd4
[55312.831636] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[55340.510039] watchdog: BUG: soft lockup - CPU#49 stuck for 48s! [migration/49:314]
[55340.521060] Modules linked in: tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe mdio_devres ofpart of_mdio ipmi_si fixed_phy sophgo_spifmc fwnode_mdio spi_nor ipmi_devintf libphy igb 8250_dw gpio_dwapb ipmi_msghandler mtd mdio switchtec mousedev uio_pdrv_genirq uio tcp_bbr sch_fq fuse loop dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block sdhci_sophgo ast sdhci_pltfm usbhid nvme drm_vram_helper sdhci spi_dw_mmio drm_ttm_helper gpio_keys mmc_core spi_dw nvme_core ttm xhci_pci nvme_common xhci_pci_renesas
[55340.599220] CPU: 49 PID: 314 Comm: migration/49 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[55340.619981] Hardware name: Sophgo Mango (DT)
[55340.627532] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[55340.637316] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[55340.645582]  ra : multi_cpu_stop+0xb8/0x172
[55340.652699] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b0fbd90
[55340.663138]  gp : ffffffff81bac808 tp : ffffffe7fed63f00 t0 : 0000000000000080
[55340.673447]  t1 : 0000000001806000 t2 : 0000000000000001 s0 : ffffffc80b0fbe10
[55340.683806]  s1 : ffffffc8390eb970 a0 : ffffffff80e0da88 a1 : 0000000000000002
[55340.694164]  a2 : ffffffc8390eb9c8 a3 : ffffffff81c089c8 a4 : ffffffffc2766c74
[55340.704272]  a5 : fffffff5db55bcc0 a6 : 0000000000000001 a7 : 0000000000000000
[55340.714400]  s2 : ffffffc8390eb994 s3 : ffffffffffffffff s4 : ffffffff80e0da88
[55340.724650]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[55340.734871]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[55340.745127]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 0000000000000000
[55340.755163]  t5 : 0000000000000034 t6 : 000000000000ffff
[55340.763195] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[55340.774063] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[55340.783460] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[55340.792051] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[55340.800608] [<ffffffff80040aee>] kthread+0xbe/0xd4
[55340.808312] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
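
For anyone comparing these logs: the soft-lockup reports are interleaved with module lists and register dumps, which makes it hard to see which CPUs were affected and for how long. Below is a minimal sketch (not part of the original report) that tallies the `watchdog: BUG: soft lockup` lines per CPU. The function name `summarize_soft_lockups` and the regular expression are my own assumptions based on the line format seen above; the sample lines are copied from the logs in this thread.

```python
import re
from collections import Counter

# Assumed line format, taken from the logs in this thread, e.g.:
# [45308.473875] watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [migration/9:72]
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(log_text):
    """Return {cpu: (report_count, max_stuck_seconds)} for soft-lockup lines.

    Hypothetical helper for illustration; it only looks at the one-line
    watchdog header and ignores the register dump that follows it.
    """
    counts = Counter()
    longest = {}
    for match in SOFT_LOCKUP.finditer(log_text):
        cpu = int(match.group("cpu"))
        secs = int(match.group("secs"))
        counts[cpu] += 1
        longest[cpu] = max(longest.get(cpu, 0), secs)
    return {cpu: (counts[cpu], longest[cpu]) for cpu in sorted(counts)}


# Three header lines copied verbatim from the logs above.
sample = """\
[55312.513142] watchdog: BUG: soft lockup - CPU#49 stuck for 22s! [migration/49:314]
[55340.510039] watchdog: BUG: soft lockup - CPU#49 stuck for 48s! [migration/49:314]
[45316.647471] watchdog: BUG: soft lockup - CPU#125 stuck for 23s! [migration/125:775]
"""

print(summarize_soft_lockups(sample))
```

Feeding it a saved `dmesg` dump (for example `summarize_soft_lockups(open("dmesg.txt").read())`, where `dmesg.txt` is a placeholder file name) should give a quick view of whether the same CPU keeps getting stuck, as CPU#49 does in the last log, or whether the lockups are spread across many CPUs.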