linuxppc / issues

Issues repository for linuxppc

p->hardirqs_enabled false in copy_process()? #124

Open mpe opened 6 years ago

mpe commented 6 years ago

Seen on ozrom1 and p85 during CI runs:

[ 2471.461866] ------------[ cut here ]------------
[ 2471.462097] DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled)
[ 2471.462159] WARNING: CPU: 34 PID: 2 at kernel/fork.c:1660 copy_process.isra.6.part.7+0x7b4/0x1ba0
[ 2471.462416] Modules linked in: vmx_crypto crc32c_vpmsum
[ 2471.462652] CPU: 34 PID: 2 Comm: kthreadd Tainted: G        W        4.15.0-gcc6x-next-20180131-g537659c #1
[ 2471.462825] NIP:  c0000000000e29b4 LR: c0000000000e29b0 CTR: c000000000bff890
[ 2471.462982] REGS: c0000007fad07940 TRAP: 0700   Tainted: G        W         (4.15.0-gcc6x-next-20180131-g537659c)
[ 2471.463170] MSR:  9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28022222  XER: 20000000
[ 2471.463785] CFAR: c0000000000e5240 SOFTE: 0 
[ 2471.463785] GPR00: c0000000000e29b0 c0000007fad07bc0 c0000000011e7b00 0000000000000029 
[ 2471.463785] GPR04: 0000000000000001 c00000000017bdb4 0000000000000000 0000000000000001 
[ 2471.463785] GPR08: 00000007fed50000 0000000000000000 0000000000000000 0000000000000000 
[ 2471.463785] GPR12: 0000000000002200 c00000000fd0cc00 c00000000011d4f8 0000000000000000 
[ 2471.463785] GPR16: 0000000000000000 c000001e4447a100 0000000000000000 0000000000000000 
[ 2471.463785] GPR20: c0000017fb902e80 0000000000000000 0000000000000000 0000000000000000 
[ 2471.463785] GPR24: c000000001e9df58 c0000000012bf6b0 c000001e52e0fbc0 c00000000011bcc0 
[ 2471.463785] GPR28: 0000000000000000 0000000000000000 c0000000000e3f98 0000000000800711 
[ 2471.466735] NIP [c0000000000e29b4] copy_process.isra.6.part.7+0x7b4/0x1ba0
[ 2471.466928] LR [c0000000000e29b0] copy_process.isra.6.part.7+0x7b0/0x1ba0
[ 2471.467119] Call Trace:
[ 2471.467230] [c0000007fad07bc0] [c0000000000e29b0] copy_process.isra.6.part.7+0x7b0/0x1ba0 (unreliable)
[ 2471.467549] [c0000007fad07ce0] [c0000000000e3f98] _do_fork+0xd8/0x8d0
[ 2471.467783] [c0000007fad07d90] [c0000000000e47ec] kernel_thread+0x3c/0x50
[ 2471.468011] [c0000007fad07db0] [c00000000011d730] kthreadd+0x240/0x300
[ 2471.468243] [c0000007fad07e30] [c00000000000b8e8] ret_from_kernel_thread+0x5c/0x74
[ 2471.468498] Instruction dump:
[ 2471.468672] 60000000 60000000 60420000 7c7507b4 faf112f0 4bfffda8 3c82ffc4 3c62ffc4 
[ 2471.469185] 3884e1b8 3863e1d0 4800282d 60000000 <0fe00000> 4bffffbc 90610060 e92d0250 
[ 2471.469634] ---[ end trace 8a2678428ed1e8e8 ]---
[ 3162.247941] Offlined Pages 4096
[ 3162.285986] Offlined Pages 4096

No clear reproducer, though it seems to happen consistently.

Also:

NOHZ: local_softirq_pending 202

First seen:

Feb  1 04:32
Linux p85 4.15.0-gcc6x-next-20180131-g537659c #1 SMP Thu Feb 1 02:38:45 AEDT 2018 ppc64le ppc64le ppc64le GNU/Linux

npiggin commented 2 years ago

Is this still reproducible?

mpe commented 2 years ago

I don't see it anymore. Our defconfigs don't enable NO_HZ_FULL, so it's possible the bug is still there and we just aren't exercising it.
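To retest with full dynticks on top of a defconfig, a minimal sketch of a config fragment follows. It assumes the warning needs both options enabled. In kernel/fork.c of that era the `DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled)` check in copy_process() sits under `#ifdef CONFIG_PROVE_LOCKING`, and `CONFIG_NO_HZ_FULL` is the option the defconfigs leave off. This is a suggested starting point, not the CI configuration.

```
# Hypothetical retest fragment: merge on top of the usual defconfig.
# Lockdep's irq-state tracking, which provides the copy_process() check.
CONFIG_PROVE_LOCKING=y
# Full dynticks, which the defconfigs do not enable.
CONFIG_NO_HZ_FULL=y
```

CONFIG_NO_HZ_FULL has no effect on a CPU unless that CPU is listed in `nohz_full=` on the kernel command line. Which CPUs to list is a per-machine choice.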