KVM Guest VM crashes during vcpu hotplug with specific numa configuration

sathnaga opened 6 years ago
2:mon> x
[ 5261.809214] Unable to handle kernel paging request for data at address 0x00001c08
[ 5261.809274] WARNING: timekeeping: Cycle offset (2682080198472) is larger than allowed by the 'timebase' clock's max_cycles value (507162120199): time overflow danger
[ 5261.809283] timekeeping: Your kernel is sick, but tries to cope by capping time updates
[ 5261.815616] Faulting instruction address: 0xc000000000350840
cpu 0x2: Vector: 380 (Data SLB Access) at [c0000001fb237c70]
pc: c000000000350840: local_memory_node+0x20/0x80
lr: c00000000005242c: start_secondary+0x47c/0x530
sp: c0000001fb237ef0
msr: 8000000000001033
dar: 1c08
current = 0xc0000001fb1bca00
paca = 0xc00000003fffce00 irqmask: 0x03 irq_happened: 0x01
pid = 0, comm = swapper/2
Linux version 4.18.0-g6e61beb7 (root@9.40.192.86) (gcc version 8.1.1 20180712 (Red Hat 8.1.1-5) (GCC)) #2 SMP Thu Aug 30 04:23:21 EDT 2018
enter ? for help
[link register ] c00000000005242c start_secondary+0x47c/0x530
[c0000001fb237ef0] c00000000005235c start_secondary+0x3ac/0x530 (unreliable)
[c0000001fb237f90] c00000000000b270 start_secondary_prolog+0x10/0x14
2:mon> X
[ 5306.273092] Oops: Kernel access of bad area, sig: 11 [#1]
[ 5306.274285] LE SMP NR_CPUS=1024 NUMA pSeries
[ 5306.275166] Modules linked in:
[ 5306.275766] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.18.0-g6e61beb7 #2
[ 5306.277119] NIP: c000000000350840 LR: c00000000005242c CTR: 0000000000000000
[ 5306.278513] REGS: c0000001fb237c70 TRAP: 0380 Not tainted (4.18.0-g6e61beb7)
[ 5306.279950] MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 22000824 XER: 20000000
[ 5306.281469] CFAR: c00000000000e5d4 IRQMASK: 1
[ 5306.281469] GPR00: c00000000005242c c0000001fb237ef0 c000000001932500 0000000000000000
[ 5306.281469] GPR04: c0000001feb05480 0000000000000000 0000000000000008 0000000000000010
[ 5306.281469] GPR08: 0000000000000018 c000000001aeb170 0000000000000001 0000000000000000
[ 5306.281469] GPR12: 0000000000000000 c00000003fffce00 c0000001fb237f90 0000000000000000
[ 5306.281469] GPR16: 0000000000000000 0000000000000000 c00000000004ff50 c0000000019641a4
[ 5306.281469] GPR20: c000000001215500 0000000000000400 0000000000000001 0000000000000002
[ 5306.281469] GPR24: 0000000000000000 0000000000000008 c000000001961d70 c000000001ae0058
[ 5306.281469] GPR28: 0000000000000004 0000000000000001 c000000001963e70 c0000000013f32c8
[ 5306.290986] Unable to handle kernel paging request for data at address 0x00009c78
[ 5306.294978] NIP [c000000000350840] local_memory_node+0x20/0x80
[ 5306.294981] LR [c00000000005242c] start_secondary+0x47c/0x530
[The console output here is garbled: while cpu 0x2's oops was still printing, cpu 0x1 took its own Data Access fault (Vector: 300) in ___slab_alloc (lr: __slab_alloc+0x34/0x60) and entered xmon, and the two output streams were interleaved character by character. The exact register values and instruction dump from this stretch are not recoverable; the readable xmon output resumes below.]
cpu 0x2: Vector: 100 (System Reset) at [c00000003ffc7d80]
pc: c0000000000cd81c: plpar_hcall_norets+0x1c/0x28
lr: c0000000000de5bc: hvc_put_chars+0x4c/0xb0
sp: c0000001fb237760
msr: 8000000000001033
current = 0xc0000001fb1bca00
paca = 0xc00000003fffce00 irqmask: 0x03 irq_happened: 0x01
pid = 0, comm = swapper/2
Linux version 4.18.0-g6e61beb7 (root@9.40.192.86) (gcc version 8.1.1 20180712 (Red Hat 8.1.1-5) (GCC)) #2 SMP Thu Aug 30 04:23:21 EDT 2018
enter ? for help
[link register ] c0000000003f22a4 __slab_alloc+0x34/0x60
[c0000001f4637ac0] c000000001963e70 __cpu_online_mask+0x0/0x80 (unreliable)
[c0000001f4637bc0] c0000001f4637bf0
[c0000001f4637bf0] c0000000003f2fb4 kmem_cache_alloc_node_trace+0x1a4/0x370
[c0000001f4637c60] c0000000001ad10c alloc_fair_sched_group+0x11c/0x280
[c0000001f4637d00] c000000000196890 sched_create_group+0x50/0xf0
[c0000001f4637d30] c0000000001c10dc sched_autogroup_create_attach+0x6c/0x200
[c0000001f4637dc0] c00000000016ca04 ksys_setsid+0x144/0x190
[c0000001f4637e10] c00000000016ca70 sys_setsid+0x20/0x30
[c0000001f4637e30] c00000000000b9e4 system_call+0x5c/0x70
--- Exception: c00 (System Call) at 00007fffb0c69010
SP (7fffd41267a0) is in userspace
1:mon>
git bisect identified the commit below as the source of the issue:
ea05ba7c559c8e5a5946c3a94a2a266e9a6680a6 is the first bad commit
commit ea05ba7c559c8e5a5946c3a94a2a266e9a6680a6
Author: Michael Bringmann <mwb@linux.vnet.ibm.com>
Date: Tue Nov 28 16:58:40 2017 -0600
powerpc/numa: Ensure nodes initialized for hotplug
This patch fixes some problems encountered at runtime with
configurations that support memory-less nodes, or that hot-add CPUs
into nodes that are memoryless during system execution after boot. The
problems of interest include:
* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
them are allowed to be 'possible' and 'online'. Memory allocations
for those nodes are taken from another node that does have memory
until and if memory is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still
be referenced subsequently by affinity or associativity attributes,
are kept in the list of 'possible' nodes for powerpc. Hot-add of
memory or CPUs to the system can reference these nodes and bring
them online instead of redirecting the references to one of the set
of nodes known to have memory at boot.
Note that this software operates under the context of CPU hotplug. We
are not doing memory hotplug in this code, but rather updating the
kernel's CPU topology (i.e. arch_update_cpu_topology /
numa_update_cpu_topology). We are initializing a node that may be used
by CPUs or memory before it can be referenced as invalid by a CPU
hotplug operation. CPU hotplug operations are protected by a range of
APIs including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
Memory hotplug operations, including try_online_node, are protected by
mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no
overlap. Using HMC hot-add/hot-remove operations, we have been able to
add and remove CPUs to any possible node without failures. HMC
operations involve a degree of self-serialization, though.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
:040000 040000 0017d62b388ff1c70ef64f8aba5697151d4b824a e4bc24ea9158594d1992c4eca28b3cb7a0b61d27 M arch
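For orientation, the fault in the first trace is taken in local_memory_node(), which start_secondary() calls while bringing the new CPU online. Below is a minimal sketch of that helper, paraphrased from the v4.18-era mm/page_alloc.c rather than taken from the bisected commit; the comments spell out why an uninitialized node would produce the small bad addresses (0x00001c08, 0x00009c78) seen above. Treat it as an illustration of the failing path, not a verified root-cause analysis.

#include <linux/gfp.h>
#include <linux/mmzone.h>

/* Sketch of local_memory_node() as it looks in v4.18-era mm/page_alloc.c. */
int local_memory_node(int node)
{
	struct zoneref *z;

	/*
	 * node_zonelist() is roughly
	 *     NODE_DATA(node)->node_zonelists + gfp_zonelist(flags)
	 * If the hot-added CPU reports a NUMA node whose pg_data_t was never
	 * allocated/initialized, NODE_DATA(node) is NULL (or stale), so the
	 * zonelist walk reads from a small absolute address -- consistent
	 * with the 0x00001c08 / 0x00009c78 faults in the log above.
	 */
	z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
				 gfp_zone(GFP_KERNEL), NULL);

	/* Return the node of the first zone with memory in that zonelist. */
	return zone_to_nid(z->zone);
}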
Update with the latest kernel and qemu bits: the VM is not crashing this time, but it hits a kernel bug with a different call trace.

Env:
HW: IBM Power8
Host kernel: 4.20.0-rc6-g56d7e379c
qemu: version 3.1.50 (v2.8.0-rc0-16056-g8e1ac6cb1d-dirty) [commit 8e1ac6cb1d7e33a0594afc7fa1105cbce40f45fe (HEAD -> ppc-for-4.0)]
Guest kernel: 5.0.0-rc1-g3bd6e94be

2. Hotplug vcpus: $ virsh setvcpus vm1 8 --live
The guest hits the kernel bug below, though it continues to be operational.
[ 59.691058] WARNING: workqueue cpumask: online intersect > possible intersect
[ 59.733903] root domain span: 0-2 (max cpu_capacity = 1024)
[ 59.950896] Built 2 zonelists, mobility grouping on. Total pages: 130146
[ 59.954125] Policy zone: Normal
[ 59.955916] root domain span: 0-3 (max cpu_capacity = 1024)
[ 60.071389] BUG: Kernel NULL pointer dereference at 0x00000400
[ 60.074974] Faulting instruction address: 0xc00000000017966c
[ 60.076687] Oops: Kernel access of bad area, sig: 11 [#1]
[ 60.078305] LE SMP NR_CPUS=2048 NUMA pSeries
[ 60.079598] Modules linked in:
[ 60.080516] CPU: 4 PID: 3024 Comm: kworker/4:0 Not tainted 5.0.0-rc1-g3bd6e94be #3
[ 60.082836] Workqueue: events cpuset_hotplug_workfn
[ 60.084309] NIP: c00000000017966c LR: c000000000179738 CTR: 0000000000000000
[ 60.086455] REGS: c0000001f86a7130 TRAP: 0380 Not tainted (5.0.0-rc1-g3bd6e94be)
[ 60.088756] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22824424 XER: 00000000
[ 60.091140] CFAR: c0000000001796b0 IRQMASK: 0
[ 60.091140] GPR00: c000000000179738 c0000001f86a73c0 c0000000014c9a00 c0000001f9ff2c00
[ 60.091140] GPR04: 0000000000000001 0000000000000000 0000000000000008 0000000000000010
[ 60.091140] GPR08: 0000000000000018 ffffffffffffffff 0000000000000400 0000000000000000
[ 60.091140] GPR12: 0000000000008800 c00000003fffb300 c000000001506104 0000000000000800
[ 60.091140] GPR16: c0000001f9d60000 c000000000efc048 000000000000102f ffffffffffffe830
[ 60.091140] GPR20: ffffffffffffec30 000000000000102f c0000001fa4db800 c0000001fa4dd800
[ 60.091140] GPR24: c0000001e6200000 00000001fe4c0000 0000000000000001 c0000000010a8080
[ 60.091140] GPR28: c0000001fa4d3400 c0000001f9ff2c00 c0000001f9ff6bff c0000001f9ff3e00
[ 60.104265] NIP [c00000000017966c] free_sched_groups.part.2+0x5c/0xf0
[ 60.105448] LR [c000000000179738] destroy_sched_domain+0x38/0xc0
[ 60.106553] Call Trace:
[ 60.107004] [c0000001f86a73c0] [0000000000000001] 0x1 (unreliable)
[ 60.108140] [c0000001f86a7400] [c000000000179738] destroy_sched_domain+0x38/0xc0
[ 60.109502] [c0000001f86a7430] [c000000000179b1c] cpu_attach_domain+0xfc/0x940
[ 60.110834] [c0000001f86a7570] [c00000000017b624] build_sched_domains+0x12c4/0x13d0
[ 60.112243] [c0000001f86a76b0] [c00000000017c7f4] partition_sched_domains+0x254/0x3d4
[ 60.113673] [c0000001f86a7740] [c000000000201db0] rebuild_sched_domains_locked+0x400/0x700
[ 60.115144] [c0000001f86a7830] [c000000000206888] rebuild_sched_domains+0x38/0x60
[ 60.116477] [c0000001f86a7860] [c000000000206c04] cpuset_hotplug_workfn+0x354/0xde0
[ 60.117837] [c0000001f86a7c80] [c000000000138d60] process_one_work+0x2b0/0x560
[ 60.119124] [c0000001f86a7d10] [c000000000139098] worker_thread+0x88/0x610
[ 60.120350] [c0000001f86a7db0] [c00000000014203c] kthread+0x1ac/0x1c0
[ 60.121499] [c0000001f86a7e20] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
[ 60.122849] Instruction dump:
[ 60.123381] 91810008 2e240000 f8010010 f821ffc1 48000010 7fbee840 7fdff378 419e0074
[ 60.124761] ebdf0000 4192002c e95f0010 7c0004ac <7d205028> 3129ffff 7d20512d 40c2fff4
[ 60.126177] ---[ end trace 1632a73375cd2dbb ]---
[ 60.131793]
[ 60.132087] kworker/4:0 (3024) used greatest stack depth: 9024 bytes left
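For orientation on this second trace: free_sched_groups() walks the circular list of sched_group structures hanging off a scheduling domain and drops their reference counts. A rough paraphrase of the v5.0-era kernel/sched/topology.c function is sketched below (from memory of the upstream source, not copied from this exact kernel build); it shows where a stale or partially constructed group pointer, left behind while the cpuset hotplug worker rebuilds the domains for the new vcpu, would turn into the bad access at 0x00000400. This is an illustration of the faulting path, not a confirmed root cause.

/*
 * Sketch of free_sched_groups() as it looks in v5.0-era
 * kernel/sched/topology.c; struct sched_group and
 * struct sched_group_capacity are private to kernel/sched/sched.h.
 */
static void free_sched_groups(struct sched_group *sg, int free_sgc)
{
	struct sched_group *tmp, *first;

	if (!sg)
		return;

	first = sg;
	do {
		/* The groups form a circular list via ->next. */
		tmp = sg->next;

		/* Drop the shared capacity object once its refcount reaches zero. */
		if (free_sgc && atomic_dec_and_test(&sg->sgc->ref))
			kfree(sg->sgc);

		/*
		 * Drop the group itself. If sg (or sg->sgc) is a bogus,
		 * partially initialized pointer, this atomic_dec_and_test()
		 * is the access that faults at a tiny address such as 0x400.
		 */
		if (atomic_dec_and_test(&sg->ref))
			kfree(sg);

		sg = tmp;
	} while (sg != first);
}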
[root@atest-guest ~]# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              6
On-line CPU(s) list: 0-4
Off-line CPU(s) list: 5
Thread(s) per core:  1
Core(s) per socket:  5
Socket(s):           1
NUMA node(s):        3
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0,1
NUMA node1 CPU(s):   2,3
NUMA node2 CPU(s):   4
Noticed that an x86_64 (Intel) guest on an x86_64 host with the above configuration also crashes, though with a different call trace; that observation is documented here: https://bugzilla.kernel.org/show_bug.cgi?id=202187