linuxppc / issues

Issues repository for linuxppc

VM_WARN_ON in switch_mm_irqs_off() when offlining a CPU #469

Open mpe opened 1 year ago

mpe commented 1 year ago

The VM_WARN_ON() in switch_mm_irqs_off() added by commit torvalds/linux@177255afb40548fdf504384b361d18d6cbe35d1e sometimes fires when offlining a CPU:

# echo 0 > /sys/devices/system/cpu/cpu1/online
[    3.278821][    T0] ------------[ cut here ]------------
[    3.278823][    T0] WARNING: CPU: 1 PID: 0 at arch/powerpc/mm/mmu_context.c:106 switch_mm_irqs_off+0x120/0x150
[    3.278834][    T0] Modules linked in:
[    3.278837][    T0] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.6.0-rc2 #410
[    3.278841][    T0] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
[    3.278844][    T0] NIP:  c00000000008c6d0 LR: c00000000008c6b8 CTR: 0000000000000000
[    3.278847][    T0] REGS: c0000000049bfb50 TRAP: 0700   Not tainted  (6.6.0-rc2)
[    3.278850][    T0] MSR:  8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 2800020a  XER: 00000000
[    3.278866][    T0] CFAR: c00000000008c690 IRQMASK: 3
[    3.278866][    T0] GPR00: c00000000008c6b8 c0000000049bfdf0 c000000001578a00 c0000000fffff408
[    3.278866][    T0] GPR04: c000000002a10b22 000000000000000a 0000000000000000 0000000000000000
[    3.278866][    T0] GPR08: 0000000000000000 0000000000000000 0000000000000003 0000000000000000
[    3.278866][    T0] GPR12: c0000000000fbed0 c0000000fffff300 0000000000000000 0000000000000000
[    3.278866][    T0] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    3.278866][    T0] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[    3.278866][    T0] GPR24: 0000000000000001 0000000000000000 0000000000000000 c0000000029a13f4
[    3.278866][    T0] GPR28: 0000000000000001 0000000000000000 0000000000000001 c000000006fb6e00
[    3.278920][    T0] NIP [c00000000008c6d0] switch_mm_irqs_off+0x120/0x150
[    3.278926][    T0] LR [c00000000008c6b8] switch_mm_irqs_off+0x108/0x150
[    3.278931][    T0] Call Trace:
[    3.278932][    T0] [c0000000049bfdf0] [c00000000008c6b8] switch_mm_irqs_off+0x108/0x150 (unreliable)
[    3.278939][    T0] [c0000000049bfe30] [c0000000001a7dc0] idle_task_exit+0x90/0xc0
[    3.278946][    T0] [c0000000049bfe60] [c0000000000fbf00] pseries_cpu_offline_self+0x30/0xf0
[    3.278952][    T0] [c0000000049bfed0] [c00000000005c7b4] arch_cpu_idle_dead+0x44/0x50
[    3.278958][    T0] [c0000000049bfef0] [c0000000001cb114] do_idle+0x284/0x390
[    3.278963][    T0] [c0000000049bff60] [c0000000001cb4c0] cpu_startup_entry+0x40/0x50
[    3.278969][    T0] [c0000000049bff90] [c00000000005c21c] start_secondary+0x29c/0x2b0
[    3.278975][    T0] [c0000000049bffe0] [c00000000000e258] start_secondary_prolog+0x10/0x14
[    3.278980][    T0] Code: eba1ffe8 ebc1fff0 ebe1fff8 4e800020 7ca32b78 48009425 60000000 4bffffc4 0fe00000 4bffff44 60000000 60420000 <0fe00000> e8010050 7c0803a6 4bffffc0
[    3.279001][    T0] ---[ end trace 0000000000000000 ]---

npiggin commented 9 months ago

This should be the fix:

https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20230524060455.147699-1-npiggin@gmail.com/

That iteration ended up being flamed by tglx, though, and I've yet to get back to it :( I must get onto it.

mpe commented 9 months ago

OK, thanks. We should probably just drop the VM_WARN_ON_ONCE() for now, since Fedora users might be seeing it already, and put it back once things are sorted out.

npiggin commented 9 months ago

Yeah, fair enough, could do that.