[4.18.0-g6e61beb7][Power8] KVM Guest VM crashes during vcpu hotplug with specific numa configuration #174

Open sathnaga opened 6 years ago

sathnaga commented 6 years ago

KVM Guest VM crashes during vcpu hotplug with specific numa configuration

Env:

Host:
Power8 Tuleta
# lscpu
Architecture:         ppc64le
Byte Order:           Little Endian
CPU(s):               80
On-line CPU(s) list:  0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list: 1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:   1
Core(s) per socket:   5
Socket(s):            2
NUMA node(s):         2
Model:                2.1 (pvr 004b 0201)
Model name:           POWER8E, altivec supported
CPU max MHz:          3690.0000
CPU min MHz:          2061.0000
L1d cache:            64K
L1i cache:            32K
L2 cache:             512K
L3 cache:             8192K
NUMA node0 CPU(s):    0,8,16,24,32
NUMA node1 CPU(s):    40,48,56,64,72

Host Kernel: 4.18.0-g6e61beb7
qemu: QEMU emulator version 3.0.50 (v2.8.0-rc0-13871-g9c945016cd-dirty)
libvirt: 4.7.0

Guest Kernel: 4.18.0-g6e61beb7

Steps:

1. Define and start a VM (vm1) from the attached XML (vm1.txt)
# virsh define vm1.txt; virsh start vm1
...
<vcpu placement='static' current='2'>8</vcpu>
..
    <topology sockets='4' cores='2' threads='1'/>
    <numa>
      <cell id='0' cpus='0-1' memory='0' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='0' unit='KiB'/>
      <cell id='2' cpus='4-5' memory='8388608' unit='KiB'/>
      <cell id='3' cpus='6-7' memory='0' unit='KiB'/>
    </numa>
...
2. Check lscpu and the NUMA node state inside the guest
# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        2
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0,1
NUMA node2 CPU(s):   

[root@atest-guest ~]# cat /sys/devices/system/node/online 
0,2
[root@atest-guest ~]# cat /sys/devices/system/node/possible 
0-2
[root@atest-guest ~]# cat /sys/devices/system/node/has_
has_cpu            has_memory         has_normal_memory  
[root@atest-guest ~]# cat /sys/devices/system/node/has_cpu 
0
[root@atest-guest ~]# cat /sys/devices/system/node/has_memory 
2
[root@atest-guest ~]# cat /sys/devices/system/node/has_normal_memory 
2

3. Hotplug vCPUs
# virsh setvcpus avocado-vt-vm1 8 --live

Result: the guest crashes.
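
The node lists read from sysfs in step 2 are comma-separated ids and ranges. Below is a minimal sketch, assuming only that format, which expands them and shows which nodes are possible but not online. The helper name `parse_id_list` is hypothetical, and the input strings are copied from the guest output above.

```python
def parse_id_list(text):
    """Expand a list such as "0,2" or "0-2" into a set of integers."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input
        if "-" in part:
            lo, hi = part.split("-", 1)
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return ids


# Values copied from /sys/devices/system/node/* in the guest.
online = parse_id_list("0,2")
possible = parse_id_list("0-2")
has_cpu = parse_id_list("0")
has_memory = parse_id_list("2")

offline_possible = sorted(possible - online)
online_without_memory = sorted(online - has_memory)
online_without_cpu = sorted(online - has_cpu)

print("possible but not online:", offline_possible)
print("online without memory:  ", online_without_memory)
print("online without CPUs:    ", online_without_cpu)
```

For the values above, node 1 is possible but not online, node 0 is online without memory, and node 2 is online without CPUs. Whether this state is what triggers the crash is not established here.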

Guest crash log:

# [   23.269810] VPHN is not supported. Disabling polling...
[   23.351811] Unable to handle kernel paging request for data at address 0x00001c08
[   23.356730] Faulting instruction address: 0xc000000000350840
cpu 0x2: Vector: 380 (Data SLB Access) at [c0000001fb237c70]
    pc: c000000000350840: local_memory_node+0x20/0x80
    lr: c00000000005242c: start_secondary+0x47c/0x530
    sp: c0000001fb237ef0
   msr: 8000000000001033
   dar: 1c08
  current = 0xc0000001fb1bca00
  paca    = 0xc00000003fffce00   irqmask: 0x03   irq_happened: 0x01
    pid   = 0, comm = swapper/2
Linux version 4.18.0-g6e61beb7 (root@9.40.192.86) (gcc version 8.1.1 20180712 (Red Hat 8.1.1-5) (GCC)) #2 SMP Thu Aug 30 04:23:21 EDT 2018
enter ? for help
[link register   ] c00000000005242c start_secondary+0x47c/0x530
[c0000001fb237ef0] c00000000005235c start_secondary+0x3ac/0x530 (unreliable)
[c0000001fb237f90] c00000000000b270 start_secondary_prolog+0x10/0x14
2:mon>

Guest NUMA config:

<memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static' current='2'>8</vcpu>
  <os>
    <type arch='ppc64le' machine='pseries-3.1'>hvm</type>
    <kernel>/home/kvmci/linux/vmlinux</kernel>
    <cmdline>root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init  initcall_debug selinux=0 xmon=on</cmdline>
    <boot dev='hd'/>
  </os>
  <cpu>
    <topology sockets='4' cores='2' threads='1'/>
    <numa>
      <cell id='0' cpus='0-1' memory='0' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='0' unit='KiB'/>
      <cell id='2' cpus='4-5' memory='8388608' unit='KiB'/>
      <cell id='3' cpus='6-7' memory='0' unit='KiB'/>
    </numa>
  </cpu>
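
The "specific numa configuration" in the title appears to be that only one cell has memory. As a rough sketch, the following script lists the memoryless cells and the vCPUs assigned to them. The `<cpu>` element is copied from the config above; the helper names `expand_cpus` and `memoryless_cells` are hypothetical and not part of libvirt.

```python
import xml.etree.ElementTree as ET

# <cpu> element copied from the guest config in this report.
CPU_XML = """
<cpu>
  <topology sockets='4' cores='2' threads='1'/>
  <numa>
    <cell id='0' cpus='0-1' memory='0' unit='KiB'/>
    <cell id='1' cpus='2-3' memory='0' unit='KiB'/>
    <cell id='2' cpus='4-5' memory='8388608' unit='KiB'/>
    <cell id='3' cpus='6-7' memory='0' unit='KiB'/>
  </numa>
</cpu>
"""


def expand_cpus(spec):
    """Expand a cpus attribute such as "0-1" or "0,2-3" into a sorted list."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def memoryless_cells(xml_text):
    """Map the id of each cell with memory='0' to its list of vCPUs."""
    root = ET.fromstring(xml_text)
    result = {}
    for cell in root.iter("cell"):
        if int(cell.get("memory", "0")) == 0:
            result[int(cell.get("id"))] = expand_cpus(cell.get("cpus", ""))
    return result


cells = memoryless_cells(CPU_XML)
for cell_id, cpus in sorted(cells.items()):
    print(f"cell {cell_id}: no memory, vCPUs {cpus}")
```

With this config the script reports cells 0, 1 and 3 as memoryless. The CPU on which xmon stopped (cpu 0x2) would fall into cell 1, assuming guest CPU numbers match the vCPU ids in the XML.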

xmon: debug data

2:mon> mi
[ 1945.712342] Mem-Info:
[ 1945.713156] active_anon:2159 inactive_anon:29 isolated_anon:0
[ 1945.713156]  active_file:821 inactive_file:2081 isolated_file:0
[ 1945.713156]  unevictable:0 dirty:36 writeback:0 unstable:0
[ 1945.713156]  slab_reclaimable:537 slab_unreclaimable:1151
[ 1945.713156]  mapped:1223 shmem:55 pagetables:27 bounce:0
[ 1945.713156]  free:122724 free_pcp:22 free_cma:0
[ 1945.722733] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1945.730348] Node 2 active_anon:138176kB inactive_anon:1856kB active_file:52544kB inactive_file:133184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:78272kB dirty:2304kB writeback:0kB shmem:3520kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1945.738672] Node 2 DMA free:7854336kB min:11520kB low:19840kB high:28160kB active_anon:138176kB inactive_anon:1856kB active_file:52544kB inactive_file:133184kB unevictable:0kB writepending:2304kB present:8388608kB managed:8331328kB mlocked:0kB kernel_stack:1664kB pagetables:1728kB bounce:0kB free_pcp:1408kB local_pcp:0kB free_cma:0kB
[ 1945.748009] lowmem_reserve[]: 0 0 0 0
[ 1945.749181] Node 2 DMA: 8*64kB (UME) 12*128kB (U) 9*256kB (UME) 2*512kB (U) 1*1024kB (U) 2*2048kB (UE) 3*4096kB (UME) 2*8192kB (ME) 477*16384kB (M) = 7854336kB
[ 1945.753732] 2956 total pagecache pages
[ 1945.754908] 0 pages in swap cache
[ 1945.755974] Swap cache stats: add 0, delete 0, find 0/0
[ 1945.757631] Free swap  = 0kB
[ 1945.758551] Total swap = 0kB
[ 1945.759740] 131072 pages RAM
[ 1945.760631] 0 pages HighMem/MovableOnly
[ 1945.761870] 895 pages reserved
[ 1945.762841] 0 pages cma reserved
[ 1945.763848] 0 pages hwpoisoned
2:mon> c
cpus stopped: 0x0-0x2
2:mon> r
R00 = c00000000005242c   R16 = 0000000000000000
R01 = c0000001fb237ef0   R17 = 0000000000000000
R02 = c000000001932500   R18 = c00000000004ff50
R03 = 0000000000000000   R19 = c0000000019641a4
R04 = c0000001feb05480   R20 = c000000001215500
R05 = 0000000000000000   R21 = 0000000000000400
R06 = 0000000000000008   R22 = 0000000000000001
R07 = 0000000000000010   R23 = 0000000000000002
R08 = 0000000000000018   R24 = 0000000000000000
R09 = c000000001aeb170   R25 = 0000000000000008
R10 = 0000000000000001   R26 = c000000001961d70
R11 = 0000000000000000   R27 = c000000001ae0058
R12 = 0000000000000000   R28 = 0000000000000004
R13 = c00000003fffce00   R29 = 0000000000000001
R14 = c0000001fb237f90   R30 = c000000001963e70
R15 = 0000000000000000   R31 = c0000000013f32c8
pc  = c000000000350840 local_memory_node+0x20/0x80
cfar= c000000000052428 start_secondary+0x478/0x530
lr  = c00000000005242c start_secondary+0x47c/0x530
msr = 8000000000001033   cr  = 22000824
ctr = 0000000000000000   xer = 0000000020000000   trap =  380
dar = 0000000000001c08   dsisr = c0000001ffff8948
2:mon> S
msr    = 8000000000001033  sprg0 = 0000000000000000
pvr    = 00000000004b0201  sprg1 = c00000003fffce00
dec    = 00000000a60fd85c  sprg2 = c00000003fffce00
sp     = c0000001fb2376e0  sprg3 = 0000000000000002
toc    = c000000001932500  dar   = 0000000000001c08
srr0   = c0000000000cd81c  srr1  = 8000000000001033 dsisr  = 00000000
dscr   = 0000000000000000  ppr   = 0010000000000000 pir    = 000000a0
amr    = 0000000000000000  uamor = 0000000000000000
dpdes  = 0000000000000000  tir   = 0000000000000000 cir    = 00000000
fscr   = 0000000000000184  tar   = 0000000000000000 pspb   = 00000000
mmcr0  = 0000000080000000  mmcr1 = 0000000000000000 mmcr2  = 0000000000000000
pmc1   = 00000000 pmc2 = 00000000  pmc3 = 00000000  pmc4   = 00000000
mmcra  = 0000000000000000   siar = 0000000000000000 pmc5   = 0000176f
sdar   = 0000000000000000   sier = 0000000000000000 pmc6   = 0000c907
ebbhr  = 0000000000000000  ebbrr = 0000000000000000 bescr  = 0000000000000000
iamr   = 0000000000000000
2:mon> 
2:mon> dpa
paca for cpu 0x0 @ c0000000025d0000:
 possible                  = yes
 present                   = yes
 online                    = yes
 lock_token                = 0x8000             (0xa)
 paca_index                = 0x0                (0x8)
 kernel_toc                = 0xc000000001932500 (0x10)
 kernelbase                = 0xc000000000000000 (0x18)
 kernel_msr                = 0xb000000000001033 (0x20)
 emergency_sp              = c00000003fff4000   (0x28)
 nmi_emergency_sp          = c00000003ffe0000   (0xa90)
 mc_emergency_sp           = c00000003ffdc000   (0xa98)
 in_nmi                    = 0x1                (0xaa0)
 in_mce                    = 0x0                (0xaa2)
 hmi_event_available       = 0x0                (0xaa4)
 data_offset               = 0x1fd4f0000        (0x30)
 hw_cpu_id                 = 0x0                (0x38)
 cpu_start                 = 0x1                (0x3a)
 kexec_state               = 0x0                (0x3b)
 slb_shadow            [0] = 0xc000000008000000 0x400ea1b217000510
 slb_shadow            [1] = 0xd000000008000001 0x400d43642f000510
 slb_shadow            [2] = 0x0000000000000002 0x0000000000000000
 vmalloc_sllp              = 0x510              (0x120)
 slb_cache_ptr             = 0x4                (0x122)
 slb_cache             [0] = 0x000000000007f000
 slb_cache             [1] = 0x0000000000000001
 slb_cache             [2] = 0x0000000000001000
 slb_cache             [3] = 0x0000000000000012
 slb_cache             [4] = 0x0000000000001000
 slb_cache             [5] = 0x0000000000000000
 slb_cache             [6] = 0x0000000000000000
 slb_cache             [7] = 0x0000000000000000
 rfi_flush_fallback_area   = c00000003ff60000   (0x11d0)
 dscr_default              = 0x0                (0x58)
 __current                 = c0000001fb180000   (0x960)
 kstack                    = 0xc0000001fb22fe30 (0x968)
 kstack_base               = 0xc0000001fb22c000
 stab_rr                   = 0x19               (0x970)
 saved_r1                  = 0xc0000001fffeba90 (0x978)
 trap_save                 = 0x0                (0x988)
 irq_soft_mask             = 0x3                (0x98a)
 irq_happened              = 0x1                (0x98b)
 io_sync                   = 0x0                (0x98c)
 irq_work_pending          = 0x0                (0x98d)
 nap_state_lost            = 0x0                (0x98e)
 sprg_vdso                 = 0x0                (0x990)
 tm_scratch                = 0x800000000280b033 (0x998)
 core_idle_state_ptr       = 0                  (0x9a0)
 thread_idle_state         = 0x0                (0x9a8)
 thread_mask               = 0x0                (0x9a9)
 subcore_sibling_mask      = 0x0                (0x9aa)
 requested_psscr           = 0x0                (0x9b0)
 stop_sprs.pid             = 0x0                (0x9b8)
 stop_sprs.ldbar           = 0x0                (0x9c0)
 stop_sprs.fscr            = 0x0                (0x9c8)
 stop_sprs.hfscr           = 0x0                (0x9d0)
 stop_sprs.mmcr1           = 0x0                (0x9d8)
 stop_sprs.mmcr2           = 0x0                (0x9e0)
 stop_sprs.mmcra           = 0x0                (0x9e8)
 dont_stop.counter         = 0x0                (0x9ac)
 accounting.utime          = 0x0                (0xaa8)
 accounting.stime          = 0x0                (0xab0)
 accounting.utime_scaled   = 0x0                (0xab8)
 accounting.starttime      = 0x35904c2b2        (0xaf0)
 accounting.starttime_user = 0x355f22b7f        (0xaf8)
 accounting.startspurr     = 0xe3a0a424         (0xb00)
 accounting.utime_sspurr   = 0x0                (0xb08)
 accounting.steal_time     = 0x0                (0xae0)
paca for cpu 0x1 @ c00000003fffee00:
 possible                  = yes
 present                   = yes
 online                    = yes
 lock_token                = 0x8000             (0xa)
 paca_index                = 0x1                (0x8)
 kernel_toc                = 0xc000000001932500 (0x10)
 kernelbase                = 0xc000000000000000 (0x18)
 kernel_msr                = 0xb000000000001033 (0x20)
 emergency_sp              = c00000003ffd8000   (0x28)
 nmi_emergency_sp          = c00000003ffd4000   (0xa90)
 mc_emergency_sp           = c00000003ffd0000   (0xa98)
 in_nmi                    = 0x1                (0xaa0)
 in_mce                    = 0x0                (0xaa2)
 hmi_event_available       = 0x0                (0xaa4)
 data_offset               = 0x1fd6f0000        (0x30)
 hw_cpu_id                 = 0x8                (0x38)
 cpu_start                 = 0x1                (0x3a)
 kexec_state               = 0x0                (0x3b)
 slb_shadow            [0] = 0xc000000008000000 0x400ea1b217000510
 slb_shadow            [1] = 0xd000000008000001 0x400d43642f000510
 slb_shadow            [2] = 0x0000000000000002 0x0000000000000000
 vmalloc_sllp              = 0x510              (0x120)
 slb_cache_ptr             = 0x3                (0x122)
 slb_cache             [0] = 0x000000000007f000
 slb_cache             [1] = 0x0000000000000001
 slb_cache             [2] = 0x0000000000001000
 slb_cache             [3] = 0x0000000000000011
 slb_cache             [4] = 0x0000000000001000
 slb_cache             [5] = 0x0000000000000000
 slb_cache             [6] = 0x0000000000000000
 slb_cache             [7] = 0x0000000000000000
 rfi_flush_fallback_area   = c00000003ff60000   (0x11d0)
 dscr_default              = 0x0                (0x58)
 __current                 = c0000001fb1a4600   (0x960)
 kstack                    = 0xc0000001fb23fe30 (0x968)
 kstack_base               = 0xc0000001fb23c000
 stab_rr                   = 0x18               (0x970)
 saved_r1                  = 0xc0000001fb23fda0 (0x978)
 trap_save                 = 0x0                (0x988)
 irq_soft_mask             = 0x3                (0x98a)
 irq_happened              = 0x1                (0x98b)
 io_sync                   = 0x0                (0x98c)
 irq_work_pending          = 0x0                (0x98d)
 nap_state_lost            = 0x0                (0x98e)
 sprg_vdso                 = 0x1                (0x990)
 tm_scratch                = 0x800000000280b033 (0x998)
 core_idle_state_ptr       = 0                  (0x9a0)
 thread_idle_state         = 0x0                (0x9a8)
 thread_mask               = 0x0                (0x9a9)
 subcore_sibling_mask      = 0x0                (0x9aa)
 requested_psscr           = 0x0                (0x9b0)
 stop_sprs.pid             = 0x0                (0x9b8)
 stop_sprs.ldbar           = 0x0                (0x9c0)
 stop_sprs.fscr            = 0x0                (0x9c8)
 stop_sprs.hfscr           = 0x0                (0x9d0)
 stop_sprs.mmcr1           = 0x0                (0x9d8)
 stop_sprs.mmcr2           = 0x0                (0x9e0)
 stop_sprs.mmcra           = 0x0                (0x9e8)
 dont_stop.counter         = 0x0                (0x9ac)
 accounting.utime          = 0x0                (0xaa8)
 accounting.stime          = 0x0                (0xab0)
 accounting.utime_scaled   = 0x0                (0xab8)
 accounting.starttime      = 0x3581a53d1        (0xaf0)
 accounting.starttime_user = 0x35470bde0        (0xaf8)
 accounting.startspurr     = 0x771cc150         (0xb00)
 accounting.utime_sspurr   = 0x0                (0xb08)
 accounting.steal_time     = 0x0                (0xae0)
paca for cpu 0x2 @ c00000003fffce00:
 possible                  = yes
 present                   = yes
 online                    = no
 lock_token                = 0x8000             (0xa)
 paca_index                = 0x2                (0x8)
 kernel_toc                = 0xc000000001932500 (0x10)
 kernelbase                = 0xc000000000000000 (0x18)
 kernel_msr                = 0xb000000000001033 (0x20)
 emergency_sp              = c00000003ffcc000   (0x28)
 nmi_emergency_sp          = c00000003ffc8000   (0xa90)
 mc_emergency_sp           = c00000003ffc4000   (0xa98)
 in_nmi                    = 0x0                (0xaa0)
 in_mce                    = 0x0                (0xaa2)
 hmi_event_available       = 0x0                (0xaa4)
 data_offset               = 0x1fd8f0000        (0x30)
 hw_cpu_id                 = 0x10               (0x38)
 cpu_start                 = 0x1                (0x3a)
 kexec_state               = 0x0                (0x3b)
 slb_shadow            [0] = 0xc000000008000000 0x400ea1b217000510
 slb_shadow            [1] = 0xd000000008000001 0x400d43642f000510
 slb_shadow            [2] = 0x0000000000000002 0x0000000000000000
 vmalloc_sllp              = 0x510              (0x120)
 slb_cache_ptr             = 0x0                (0x122)
 slb_cache             [0] = 0x0000000000000000
 slb_cache             [1] = 0x0000000000000000
 slb_cache             [2] = 0x0000000000000000
 slb_cache             [3] = 0x0000000000000000
 slb_cache             [4] = 0x0000000000000000
 slb_cache             [5] = 0x0000000000000000
 slb_cache             [6] = 0x0000000000000000
 slb_cache             [7] = 0x0000000000000000
 rfi_flush_fallback_area   = c00000003ff60000   (0x11d0)
 dscr_default              = 0x0                (0x58)
 __current                 = c0000001fb1bca00   (0x960)
 kstack                    = 0xc0000001fb237f90 (0x968)
 kstack_base               = 0xc0000001fb234000
 stab_rr                   = 0x3                (0x970)
 saved_r1                  = 0xc0000001fb2374e0 (0x978)
 trap_save                 = 0x0                (0x988)
 irq_soft_mask             = 0x3                (0x98a)
 irq_happened              = 0x1                (0x98b)
 io_sync                   = 0x0                (0x98c)
 irq_work_pending          = 0x0                (0x98d)
 nap_state_lost            = 0x0                (0x98e)
 sprg_vdso                 = 0x2                (0x990)
 tm_scratch                = 0x0                (0x998)
 core_idle_state_ptr       = 0                  (0x9a0)
 thread_idle_state         = 0x0                (0x9a8)
 thread_mask               = 0x0                (0x9a9)
 subcore_sibling_mask      = 0x0                (0x9aa)
 requested_psscr           = 0x0                (0x9b0)
 stop_sprs.pid             = 0x0                (0x9b8)
 stop_sprs.ldbar           = 0x0                (0x9c0)
 stop_sprs.fscr            = 0x0                (0x9c8)
 stop_sprs.hfscr           = 0x0                (0x9d0)
 stop_sprs.mmcr1           = 0x0                (0x9d8)
 stop_sprs.mmcr2           = 0x0                (0x9e0)
 stop_sprs.mmcra           = 0x0                (0x9e8)
 dont_stop.counter         = 0x0                (0x9ac)
 accounting.utime          = 0x0                (0xaa8)
 accounting.stime          = 0x0                (0xab0)
 accounting.utime_scaled   = 0x0                (0xab8)
 accounting.starttime      = 0x0                (0xaf0)
 accounting.starttime_user = 0x0                (0xaf8)
 accounting.startspurr     = 0x0                (0xb00)
 accounting.utime_sspurr   = 0x0                (0xb08)
 accounting.steal_time     = 0x0                (0xae0)
paca for cpu 0x3 @ c00000003fffbc00:
 possible                  = yes
 present                   = no
 online                    = no
 lock_token                = 0x8000             (0xa)
 paca_index                = 0x3                (0x8)
 kernel_toc                = 0xc000000001932500 (0x10)
 kernelbase                = 0xc000000000000000 (0x18)
 kernel_msr                = 0xb000000000001003 (0x20)
 emergency_sp              = c00000003ffc0000   (0x28)
 nmi_emergency_sp          = c00000003ffbc000   (0xa90)
 mc_emergency_sp           = c00000003ffb8000   (0xa98)
 in_nmi                    = 0x0                (0xaa0)
 in_mce                    = 0x0                (0xaa2)
 hmi_event_available       = 0x0                (0xaa4)
 data_offset               = 0x1fdaf0000        (0x30)
 hw_cpu_id                 = 0x0                (0x38)
 cpu_start                 = 0x0                (0x3a)
 kexec_state               = 0x0                (0x3b)
 vmalloc_sllp              = 0x0                (0x120)
 slb_cache_ptr             = 0x0                (0x122)
 slb_cache             [0] = 0x0000000000000000
 slb_cache             [1] = 0x0000000000000000
 slb_cache             [2] = 0x0000000000000000
 slb_cache             [3] = 0x0000000000000000
 slb_cache             [4] = 0x0000000000000000
 slb_cache             [5] = 0x0000000000000000
 slb_cache             [6] = 0x0000000000000000
 slb_cache             [7] = 0x0000000000000000
 rfi_flush_fallback_area   = c00000003ff60000   (0x11d0)
 dscr_default              = 0x0                (0x58)
 __current                 = c0000000018c4f80   (0x960)
 kstack                    = 0x0                (0x968)
 kstack_base               = 0x0000000000000000
 stab_rr                   = 0x0                (0x970)
 saved_r1                  = 0x0                (0x978)
 trap_save                 = 0x0                (0x988)
 irq_soft_mask             = 0x0                (0x98a)
 irq_happened              = 0x0                (0x98b)
 io_sync                   = 0x0                (0x98c)
 irq_work_pending          = 0x0                (0x98d)
 nap_state_lost            = 0x0                (0x98e)
 sprg_vdso                 = 0x0                (0x990)
 tm_scratch                = 0x0                (0x998)
 core_idle_state_ptr       = 0                  (0x9a0)
 thread_idle_state         = 0x0                (0x9a8)
 thread_mask               = 0x0                (0x9a9)
 subcore_sibling_mask      = 0x0                (0x9aa)
 requested_psscr           = 0x0                (0x9b0)
 stop_sprs.pid             = 0x0                (0x9b8)
 stop_sprs.ldbar           = 0x0                (0x9c0)
 stop_sprs.fscr            = 0x0                (0x9c8)
 stop_sprs.hfscr           = 0x0                (0x9d0)
 stop_sprs.mmcr1           = 0x0                (0x9d8)
 stop_sprs.mmcr2           = 0x0                (0x9e0)
 stop_sprs.mmcra           = 0x0                (0x9e8)
 dont_stop.counter         = 0x0                (0x9ac)
 accounting.utime          = 0x0                (0xaa8)
 accounting.stime          = 0x0                (0xab0)
 accounting.utime_scaled   = 0x0                (0xab8)
 accounting.starttime      = 0x0                (0xaf0)
 accounting.starttime_user = 0x0                (0xaf8)
 accounting.startspurr     = 0x0                (0xb00)
 accounting.utime_sspurr   = 0x0                (0xb08)
 accounting.steal_time     = 0x0                (0xae0)
paca for cpu 0x4 @ c00000003fffa600:
 possible                  = yes
 present                   = no
 online                    = no
 lock_token                = 0x8000             (0xa)
 paca_index                = 0x4                (0x8)
 kernel_toc                = 0xc000000001932500 (0x10)
 kernelbase                = 0xc000000000000000 (0x18)
 kernel_msr                = 0xb000000000001003 (0x20)
 emergency_sp              = c00000003ffb4000   (0x28)
 nmi_emergency_sp          = c00000003ffb0000   (0xa90)
 mc_emergency_sp           = c00000003ffac000   (0xa98)
 in_nmi                    = 0x0                (0xaa0)
 in_mce                    = 0x0                (0xaa2)
 hmi_event_available       = 0x0                (0xaa4)
 data_offset               = 0x1fdcf0000        (0x30)
 hw_cpu_id                 = 0x0                (0x38)
 cpu_start                 = 0x0                (0x3a)
 kexec_state               = 0x0                (0x3b)
 vmalloc_sllp              = 0x0                (0x120)
 slb_cache_ptr             = 0x0                (0x122)
 slb_cache             [0] = 0x0000000000000000
 slb_cache             [1] = 0x0000000000000000
 slb_cache             [2] = 0x0000000000000000
 slb_cache             [3] = 0x0000000000000000
 slb_cache             [4] = 0x0000000000000000
 slb_cache             [5] = 0x0000000000000000
 slb_cache             [6] = 0x0000000000000000
 slb_cache             [7] = 0x0000000000000000
 rfi_flush_fallback_area   = c00000003ff60000   (0x11d0)
 dscr_default              = 0x0                (0x58)
 __current                 = c0000000018c4f80   (0x960)
 kstack                    = 0x0                (0x968)
 kstack_base               = 0x0000000000000000
 stab_rr                   = 0x0                (0x970)
 saved_r1                  = 0x0                (0x978)
 trap_save                 = 0x0                (0x988)
 irq_soft_mask             = 0x0                (0x98a)
 irq_happened              = 0x0                (0x98b)
 io_sync                   = 0x0                (0x98c)
 irq_work_pending          = 0x0                (0x98d)
 nap_state_lost            = 0x0                (0x98e)
 sprg_vdso                 = 0x0                (0x990)
 tm_scratch                = 0x0                (0x998)
 core_idle_state_ptr       = 0                  (0x9a0)
 thread_idle_state         = 0x0                (0x9a8)
 thread_mask               = 0x0                (0x9a9)
 subcore_sibling_mask      = 0x0                (0x9aa)
 requested_psscr           = 0x0                (0x9b0)
 stop_sprs.pid             = 0x0                (0x9b8)
 stop_sprs.ldbar           = 0x0                (0x9c0)
 stop_sprs.fscr            = 0x0                (0x9c8)
 stop_sprs.hfscr           = 0x0                (0x9d0)
 stop_sprs.mmcr1           = 0x0                (0x9d8)
 stop_sprs.mmcr2           = 0x0                (0x9e0)
 stop_sprs.mmcra           = 0x0                (0x9e8)
 dont_stop.counter         = 0x0                (0x9ac)
 accounting.utime          = 0x0                (0xaa8)
 accounting.stime          = 0x0                (0xab0)
 accounting.utime_scaled   = 0x0                (0xab8)
 accounting.starttime      = 0x0                (0xaf0)
 accounting.starttime_user = 0x0                (0xaf8)
 accounting.startspurr     = 0x0                (0xb00)
 accounting.utime_sspurr   = 0x0                (0xb08)
 accounting.steal_time     = 0x0                (0xae0)
paca for cpu 0x5 @ c00000003fff8e00:
 possible                  = yes
 present                   = no
 online                    = no
 lock_token                = 0x8000             (0xa)
 paca_index                = 0x5                (0x8)
 kernel_toc                = 0xc000000001932500 (0x10)
 kernelbase                = 0xc000000000000000 (0x18)
 kernel_msr                = 0xb000000000001003 (0x20)
 emergency_sp              = c00000003ffa8000   (0x28)
 nmi_emergency_sp          = c00000003ffa4000   (0xa90)
 mc_emergency_sp           = c00000003ffa0000   (0xa98)
 in_nmi                    = 0x0                (0xaa0)
 in_mce                    = 0x0                (0xaa2)
 hmi_event_available       = 0x0                (0xaa4)
 data_offset               = 0x1fdef0000        (0x30)
 hw_cpu_id                 = 0x0                (0x38)
 cpu_start                 = 0x0                (0x3a)
 kexec_state               = 0x0                (0x3b)
 vmalloc_sllp              = 0x0                (0x120)
 slb_cache_ptr             = 0x0                (0x122)
 slb_cache             [0] = 0x0000000000000000
 slb_cache             [1] = 0x0000000000000000
 slb_cache             [2] = 0x0000000000000000
 slb_cache             [3] = 0x0000000000000000
 slb_cache             [4] = 0x0000000000000000
 slb_cache             [5] = 0x0000000000000000
 slb_cache             [6] = 0x0000000000000000
 slb_cache             [7] = 0x0000000000000000
 rfi_flush_fallback_area   = c00000003ff60000   (0x11d0)
 dscr_default              = 0x0                (0x58)
 __current                 = c0000000018c4f80   (0x960)
 kstack                    = 0x0                (0x968)
 kstack_base               = 0x0000000000000000
 stab_rr                   = 0x0                (0x970)
 saved_r1                  = 0x0                (0x978)
 trap_save                 = 0x0                (0x988)
 irq_soft_mask             = 0x0                (0x98a)
 irq_happened              = 0x0                (0x98b)
 io_sync                   = 0x0                (0x98c)
 irq_work_pending          = 0x0                (0x98d)
 nap_state_lost            = 0x0                (0x98e)
 sprg_vdso                 = 0x0                (0x990)
 tm_scratch                = 0x0                (0x998)
 core_idle_state_ptr       = 0                  (0x9a0)
 thread_idle_state         = 0x0                (0x9a8)
 thread_mask               = 0x0                (0x9a9)
 subcore_sibling_mask      = 0x0                (0x9aa)
 requested_psscr           = 0x0                (0x9b0)
 stop_sprs.pid             = 0x0                (0x9b8)
 stop_sprs.ldbar           = 0x0                (0x9c0)
 stop_sprs.fscr            = 0x0                (0x9c8)
 stop_sprs.hfscr           = 0x0                (0x9d0)
 stop_sprs.mmcr1           = 0x0                (0x9d8)
 stop_sprs.mmcr2           = 0x0                (0x9e0)
 stop_sprs.mmcra           = 0x0                (0x9e8)
 dont_stop.counter         = 0x0                (0x9ac)
 accounting.utime          = 0x0                (0xaa8)
 accounting.stime          = 0x0                (0xab0)
 accounting.utime_scaled   = 0x0                (0xab8)
 accounting.starttime      = 0x0                (0xaf0)
 accounting.starttime_user = 0x0                (0xaf8)
 accounting.startspurr     = 0x0                (0xb00)
 accounting.utime_sspurr   = 0x0                (0xb08)
 accounting.steal_time     = 0x0                (0xae0)
paca for cpu 0x6 @ c00000003fff7600:
 possible                  = yes
 present                   = no
 online                    = no
 lock_token                = 0x8000             (0xa)
 paca_index                = 0x6                (0x8)
 kernel_toc                = 0xc000000001932500 (0x10)
 kernelbase                = 0xc000000000000000 (0x18)
 kernel_msr                = 0xb000000000001003 (0x20)
 emergency_sp              = c00000003ff9c000   (0x28)
 nmi_emergency_sp          = c00000003ff98000   (0xa90)
 mc_emergency_sp           = c00000003ff94000   (0xa98)
 in_nmi                    = 0x0                (0xaa0)
 in_mce                    = 0x0                (0xaa2)
 hmi_event_available       = 0x0                (0xaa4)
 data_offset               = 0x1fe0f0000        (0x30)
 hw_cpu_id                 = 0x0                (0x38)
 cpu_start                 = 0x0                (0x3a)
 kexec_state               = 0x0                (0x3b)
 vmalloc_sllp              = 0x0                (0x120)
 slb_cache_ptr             = 0x0                (0x122)
 slb_cache             [0] = 0x0000000000000000
 slb_cache             [1] = 0x0000000000000000
 slb_cache             [2] = 0x0000000000000000
 slb_cache             [3] = 0x0000000000000000
 slb_cache             [4] = 0x0000000000000000
 slb_cache             [5] = 0x0000000000000000
 slb_cache             [6] = 0x0000000000000000
 slb_cache             [7] = 0x0000000000000000
 rfi_flush_fallback_area   = c00000003ff60000   (0x11d0)
 dscr_default              = 0x0                (0x58)
 __current                 = c0000000018c4f80   (0x960)
 kstack                    = 0x0                (0x968)
 kstack_base               = 0x0000000000000000
 stab_rr                   = 0x0                (0x970)
 saved_r1                  = 0x0                (0x978)
 trap_save                 = 0x0                (0x988)
 irq_soft_mask             = 0x0                (0x98a)
 irq_happened              = 0x0                (0x98b)
 io_sync                   = 0x0                (0x98c)
 irq_work_pending          = 0x0                (0x98d)
 nap_state_lost            = 0x0                (0x98e)
 sprg_vdso                 = 0x0                (0x990)
 tm_scratch                = 0x0                (0x998)
 core_idle_state_ptr       = 0                  (0x9a0)
 thread_idle_state         = 0x0                (0x9a8)
 thread_mask               = 0x0                (0x9a9)
 subcore_sibling_mask      = 0x0                (0x9aa)
 requested_psscr           = 0x0                (0x9b0)
 stop_sprs.pid             = 0x0                (0x9b8)
 stop_sprs.ldbar           = 0x0                (0x9c0)
 stop_sprs.fscr            = 0x0                (0x9c8)
 stop_sprs.hfscr           = 0x0                (0x9d0)
 stop_sprs.mmcr1           = 0x0                (0x9d8)
 stop_sprs.mmcr2           = 0x0                (0x9e0)
 stop_sprs.mmcra           = 0x0                (0x9e8)
 dont_stop.counter         = 0x0                (0x9ac)
 accounting.utime          = 0x0                (0xaa8)
 accounting.stime          = 0x0                (0xab0)
 accounting.utime_scaled   = 0x0                (0xab8)
 accounting.starttime      = 0x0                (0xaf0)
 accounting.starttime_user = 0x0                (0xaf8)
 accounting.startspurr     = 0x0                (0xb00)
 accounting.utime_sspurr   = 0x0                (0xb08)
 accounting.steal_time     = 0x0                (0xae0)
paca for cpu 0x7 @ c00000003fff5e00:
 possible                  = yes
 present                   = no
 online                    = no
 lock_token                = 0x8000             (0xa)
 paca_index                = 0x7                (0x8)
 kernel_toc                = 0xc000000001932500 (0x10)
 kernelbase                = 0xc000000000000000 (0x18)
 kernel_msr                = 0xb000000000001003 (0x20)
 emergency_sp              = c00000003ff90000   (0x28)
 nmi_emergency_sp          = c00000003ff8c000   (0xa90)
 mc_emergency_sp           = c00000003ff88000   (0xa98)
 in_nmi                    = 0x0                (0xaa0)
 in_mce                    = 0x0                (0xaa2)
 hmi_event_available       = 0x0                (0xaa4)
 data_offset               = 0x1fe2f0000        (0x30)
 hw_cpu_id                 = 0x0                (0x38)
 cpu_start                 = 0x0                (0x3a)
 kexec_state               = 0x0                (0x3b)
 vmalloc_sllp              = 0x0                (0x120)
 slb_cache_ptr             = 0x0                (0x122)
 slb_cache             [0] = 0x0000000000000000
 slb_cache             [1] = 0x0000000000000000
 slb_cache             [2] = 0x0000000000000000
 slb_cache             [3] = 0x0000000000000000
 slb_cache             [4] = 0x0000000000000000
 slb_cache             [5] = 0x0000000000000000
 slb_cache             [6] = 0x0000000000000000
 slb_cache             [7] = 0x0000000000000000
 rfi_flush_fallback_area   = c00000003ff60000   (0x11d0)
 dscr_default              = 0x0                (0x58)
 __current                 = c0000000018c4f80   (0x960)
 kstack                    = 0x0                (0x968)
 kstack_base               = 0x0000000000000000
 stab_rr                   = 0x0                (0x970)
 saved_r1                  = 0x0                (0x978)
 trap_save                 = 0x0                (0x988)
 irq_soft_mask             = 0x0                (0x98a)
 irq_happened              = 0x0                (0x98b)
 io_sync                   = 0x0                (0x98c)
 irq_work_pending          = 0x0                (0x98d)
 nap_state_lost            = 0x0                (0x98e)
 sprg_vdso                 = 0x0                (0x990)
 tm_scratch                = 0x0                (0x998)
 core_idle_state_ptr       = 0                  (0x9a0)
 thread_idle_state         = 0x0                (0x9a8)
 thread_mask               = 0x0                (0x9a9)
 subcore_sibling_mask      = 0x0                (0x9aa)
 requested_psscr           = 0x0                (0x9b0)
 stop_sprs.pid             = 0x0                (0x9b8)
 stop_sprs.ldbar           = 0x0                (0x9c0)
 stop_sprs.fscr            = 0x0                (0x9c8)
 stop_sprs.hfscr           = 0x0                (0x9d0)
 stop_sprs.mmcr1           = 0x0                (0x9d8)
 stop_sprs.mmcr2           = 0x0                (0x9e0)
 stop_sprs.mmcra           = 0x0                (0x9e8)
 dont_stop.counter         = 0x0                (0x9ac)
 accounting.utime          = 0x0                (0xaa8)
 accounting.stime          = 0x0                (0xab0)
 accounting.utime_scaled   = 0x0                (0xab8)
 accounting.starttime      = 0x0                (0xaf0)
 accounting.starttime_user = 0x0                (0xaf8)
 accounting.startspurr     = 0x0                (0xb00)
 accounting.utime_sspurr   = 0x0                (0xb08)
 accounting.steal_time     = 0x0                (0xae0)

qemu commandline:

/usr/share/avocado-plugins-vt/build/qemu/ppc64-softmmu/qemu-system-ppc64 -name guest=avocado-vt-vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-avocado-vt-vm1/master-key.aes -machine pseries-3.1,accel=kvm,usb=off,dump-guest-core=off -m 8192 -realtime mlock=off -smp 2,maxcpus=8,sockets=4,cores=2,threads=1 -numa node,nodeid=0,cpus=0-1,mem=0 -numa node,nodeid=1,cpus=2-3,mem=0 -numa node,nodeid=2,cpus=4-5,mem=8192 -numa node,nodeid=3,cpus=6-7,mem=0 -uuid 6865aa45-af51-4d99-9aa3-99e6e366262e -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=30,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -kernel /home/kvmci/linux/vmlinux -append root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init  initcall_debug selinux=0 xmon=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/avocado/data/avocado-vt/images/jeos-27-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=32,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f7:f8:f9,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -chardev socket,id=charchannel0,fd=34,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
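
The NUMA layout in the command line above can be read off mechanically. Below is a minimal Python sketch that lists which guest nodes have no memory. The helper names `parse_numa_nodes` and `memoryless_nodes` are hypothetical, and the parser assumes only the `-numa node,nodeid=N,cpus=A-B,mem=M` form used in this report (no comma-separated CPU lists, no size suffixes).

```python
import re


def parse_numa_nodes(cmdline):
    """Return {nodeid: {"cpus": str, "mem": int}} for each '-numa node,...' option.

    Assumption: every option looks like 'nodeid=N,cpus=A-B,mem=M', as in this report.
    """
    nodes = {}
    for opts in re.findall(r"-numa\s+node,(\S+)", cmdline):
        fields = dict(kv.split("=", 1) for kv in opts.split(","))
        nodes[int(fields["nodeid"])] = {
            "cpus": fields.get("cpus", ""),
            "mem": int(fields.get("mem", "0")),
        }
    return nodes


def memoryless_nodes(cmdline):
    """Return the sorted ids of nodes configured with mem=0."""
    return sorted(n for n, info in parse_numa_nodes(cmdline).items() if info["mem"] == 0)


# Excerpt of the qemu command line from this report.
CMDLINE = (
    "-smp 2,maxcpus=8,sockets=4,cores=2,threads=1 "
    "-numa node,nodeid=0,cpus=0-1,mem=0 "
    "-numa node,nodeid=1,cpus=2-3,mem=0 "
    "-numa node,nodeid=2,cpus=4-5,mem=8192 "
    "-numa node,nodeid=3,cpus=6-7,mem=0"
)

print(memoryless_nodes(CMDLINE))  # [0, 1, 3]
```

With this configuration the two boot vcpus (0-1) sit in node 0, which has no memory, and only node 2 has memory among the nodes that receive hot-added vcpus.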

vm1.txt

sathnaga commented 6 years ago
2:mon> x
[ 5261.809214] Unable to handle kernel paging request for data at address 0x00001c08
[ 5261.809274] WARNING: timekeeping: Cycle offset (2682080198472) is larger than allowed by the 'timebase' clock's max_cycles value (507162120199): time overflow danger
[ 5261.809283]          timekeeping: Your kernel is sick, but tries to cope by capping time updates
[ 5261.815616] Faulting instruction address: 0xc000000000350840
cpu 0x2: Vector: 380 (Data SLB Access) at [c0000001fb237c70]
    pc: c000000000350840: local_memory_node+0x20/0x80
    lr: c00000000005242c: start_secondary+0x47c/0x530
    sp: c0000001fb237ef0
   msr: 8000000000001033
   dar: 1c08
  current = 0xc0000001fb1bca00
  paca    = 0xc00000003fffce00   irqmask: 0x03   irq_happened: 0x01
    pid   = 0, comm = swapper/2
Linux version 4.18.0-g6e61beb7 (root@9.40.192.86) (gcc version 8.1.1 20180712 (Red Hat 8.1.1-5) (GCC)) #2 SMP Thu Aug 30 04:23:21 EDT 2018
enter ? for help
[link register   ] c00000000005242c start_secondary+0x47c/0x530
[c0000001fb237ef0] c00000000005235c start_secondary+0x3ac/0x530 (unreliable)
[c0000001fb237f90] c00000000000b270 start_secondary_prolog+0x10/0x14
sathnaga commented 6 years ago
2:mon> X
[ 5306.273092] Oops: Kernel access of bad area, sig: 11 [#1]
[ 5306.274285] LE SMP NR_CPUS=1024 NUMA pSeries
[ 5306.275166] Modules linked in:
[ 5306.275766] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.18.0-g6e61beb7 #2
[ 5306.277119] NIP:  c000000000350840 LR: c00000000005242c CTR: 0000000000000000
[ 5306.278513] REGS: c0000001fb237c70 TRAP: 0380   Not tainted  (4.18.0-g6e61beb7)
[ 5306.279950] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE>  CR: 22000824  XER: 20000000
[ 5306.281469] CFAR: c00000000000e5d4 IRQMASK: 1 
[ 5306.281469] GPR00: c00000000005242c c0000001fb237ef0 c000000001932500 0000000000000000 
[ 5306.281469] GPR04: c0000001feb05480 0000000000000000 0000000000000008 0000000000000010 
[ 5306.281469] GPR08: 0000000000000018 c000000001aeb170 0000000000000001 0000000000000000 
[ 5306.281469] GPR12: 0000000000000000 c00000003fffce00 c0000001fb237f90 0000000000000000 
[ 5306.281469] GPR16: 0000000000000000 0000000000000000 c00000000004ff50 c0000000019641a4 
[ 5306.281469] GPR20: c000000001215500 0000000000000400 0000000000000001 0000000000000002 
[ 5306.281469] GPR24: 0000000000000000 0000000000000008 c000000001961d70 c000000001ae0058 
[ 5306.281469] GPR28: 0000000000000004 0000000000000001 c000000001963e70 c0000000013f32c8 
[ 5306.290986] Unable to handle kernel paging request for data at address 0x00009c78
[ 5306.294978] NIP [c000000000350840] local_memory_node+0x20/0x80
[ 5306.294981] LR [c00000000005242c] start_secondary+0x47c/0x530
[c 5p3u0 60.x12:9 6Ve5c0t6o]r:  F3a0u0l t(iDng iantas tArcuccteisosn)  aadtd [rce0s0s0:0 000x1fc4063070804000]
 c 0 3 f 1p8cfc:
o 0[ 005030060.0020937f61585fc] :C a_l_l_ sTlarba_cael:l
3c[+ 05x380c6/.03x0a10501
 ]   [ c l0r0:0 0c00000100f0b0203073eff20]2 a[4c: 0_0_0s0la0b0_0a0l0l0o5c23+50cx]3 4s/teaxrt6_0s
ic  o n dasrpy:+ 0cx030a0c/000x051f3406 3(u7narce0l
sa b le )m
0r[:  583000060.03000401080009]0 3[3c
00 0  0d0a0r1:f b92c377f89
b]  d[scis0r0:0 0400000000000000
/2  7c0u] rsrteanrtt _=se c0oxncd0a0r00y0_0p1reoel4ogd+e05x0100
c 0 xp14a
sa[    5 3=0 60.xc300060605060] 0I3nfsftfreucet0i0o n  idurmqpm:a
0k[:  503x0063. 3 0i7r6q0_4h]ap p4een8e0d0:02 00 x60010
  0 0 0 0p i6d 0 0 0= 01000570,  6c0o0m0m00 0=0  (3tcm4pfci0l1e5se) 
bL8i4n2u1xc ev0e r7sci0o8n 042.a168 .6000-0g060e0601 
 e[b 75 3(0ro6o.t3@190.14660]. 139d22.2860)0 1(cg c7c86 3v1efr2s4i 3o9n2 988.c17.10  270c1689018721a2  <(8R1ed2 3H1atc 088>. 13.816-351) c(0G0C C2)b) 8#920 0S0M2P 4 1T9hdu0 001ug4  3
  [0 453:0263.:321127 4E8D]T  -2-01-8[
 encpu 0x2: Vector: 100 (System Reset) at [c00000003ffc7d80]
    pc: c0000000000cd81c: plpar_hcall_norets+0x1c/0x28
    lr: c0000000000de5bc: hvc_put_chars+0x4c/0xb0
    sp: c0000001fb237760
   msr: 8000000000001033
  current = 0xc0000001fb1bca00
  paca    = 0xc00000003fffce00   irqmask: 0x03   irq_happened: 0x01
    pid   = 0, comm = swapper/2
Linux version 4.18.0-g6e61beb7 (root@9.40.192.86) (gcc version 8.1.1 20180712 (Red Hat 8.1.1-5) (GCC)) #2 SMP Thu Aug 30 04:23:21 EDT 2018
enter ? for help
[link register   ] c0000000003f22a4 __slab_alloc+0x34/0x60
[c0000001f4637ac0] c000000001963e70 __cpu_online_mask+0x0/0x80 (unreliable)
[c0000001f4637bc0] c0000001f4637bf0
[c0000001f4637bf0] c0000000003f2fb4 kmem_cache_alloc_node_trace+0x1a4/0x370
[c0000001f4637c60] c0000000001ad10c alloc_fair_sched_group+0x11c/0x280
[c0000001f4637d00] c000000000196890 sched_create_group+0x50/0xf0
[c0000001f4637d30] c0000000001c10dc sched_autogroup_create_attach+0x6c/0x200
[c0000001f4637dc0] c00000000016ca04 ksys_setsid+0x144/0x190
[c0000001f4637e10] c00000000016ca70 sys_setsid+0x20/0x30
[c0000001f4637e30] c00000000000b9e4 system_call+0x5c/0x70
--- Exception: c00 (System Call) at 00007fffb0c69010
SP (7fffd41267a0) is in userspace
1:mon> 
sathnaga commented 6 years ago

git bisect identified the commit below as the cause of the issue:

ea05ba7c559c8e5a5946c3a94a2a266e9a6680a6 is the first bad commit

commit ea05ba7c559c8e5a5946c3a94a2a266e9a6680a6
Author: Michael Bringmann <mwb@linux.vnet.ibm.com>
Date:   Tue Nov 28 16:58:40 2017 -0600

    powerpc/numa: Ensure nodes initialized for hotplug

    This patch fixes some problems encountered at runtime with
    configurations that support memory-less nodes, or that hot-add CPUs
    into nodes that are memoryless during system execution after boot. The
    problems of interest include:

    * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
      them are allowed to be 'possible' and 'online'. Memory allocations
      for those nodes are taken from another node that does have memory
      until and if memory is hot-added to the node.

    * Nodes which have no resources assigned at boot, but which may still
      be referenced subsequently by affinity or associativity attributes,
      are kept in the list of 'possible' nodes for powerpc. Hot-add of
      memory or CPUs to the system can reference these nodes and bring
      them online instead of redirecting the references to one of the set
      of nodes known to have memory at boot.

    Note that this software operates under the context of CPU hotplug. We
    are not doing memory hotplug in this code, but rather updating the
    kernel's CPU topology (i.e. arch_update_cpu_topology /
    numa_update_cpu_topology). We are initializing a node that may be used
    by CPUs or memory before it can be referenced as invalid by a CPU
    hotplug operation. CPU hotplug operations are protected by a range of
    APIs including cpu_maps_update_begin/cpu_maps_update_done,
    cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
    Memory hotplug operations, including try_online_node, are protected by
    mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
    case of CPUs being hot-added to a previously memoryless node, the
    try_online_node operation occurs wholly within the CPU locks with no
    overlap. Using HMC hot-add/hot-remove operations, we have been able to
    add and remove CPUs to any possible node without failures. HMC
    operations involve a degree self-serialization, though.

    Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
    Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

:040000 040000 0017d62b388ff1c70ef64f8aba5697151d4b824a e4bc24ea9158594d1992c4eca28b3cb7a0b61d27 M     arch
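
The first bullet of the commit message says that allocations for a memoryless node are served from another node that has memory. The Python sketch below illustrates that fallback idea only. It is not the kernel's implementation: the function name merely echoes the symbol in the backtrace, the distance values 10 (local) and 20 (remote) are assumptions, and a missing `node_mem` entry is used here to model a node that was never initialized.

```python
def node_distance(a, b):
    """Assumed distances: 10 for the same node, 20 for any other node."""
    return 10 if a == b else 20


def local_memory_node(node, node_mem, distance=node_distance):
    """Pick the node whose memory should back allocations for `node`.

    node_mem maps node id -> memory size; 0 means the node is memoryless.
    """
    if node not in node_mem:
        # Models a node that was never set up before a CPU was placed on it.
        raise LookupError(f"node {node} was never initialized")
    candidates = [n for n, mem in node_mem.items() if mem > 0]
    if not candidates:
        raise LookupError("no node has memory")
    # Nearest node with memory; ties are broken by the lowest node id.
    return min(candidates, key=lambda n: (distance(node, n), n))


# Guest layout from vm1.txt: only cell 2 has memory (8388608 KiB).
NODE_MEM_KIB = {0: 0, 1: 0, 2: 8388608, 3: 0}

print(local_memory_node(0, NODE_MEM_KIB))  # 2
print(local_memory_node(2, NODE_MEM_KIB))  # 2
```

In this model every memoryless node in the guest falls back to node 2, and a lookup on a node with no entry fails explicitly instead of dereferencing missing data.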
sathnaga commented 5 years ago

Update with the latest kernel and qemu bits: the VM does not crash this time, but it hits a kernel bug with a different call trace.

Env:
HW: IBM Power8
Host kernel: 4.20.0-rc6-g56d7e379c
qemu: version 3.1.50 (v2.8.0-rc0-16056-g8e1ac6cb1d-dirty) [commit 8e1ac6cb1d7e33a0594afc7fa1105cbce40f45fe (HEAD -> ppc-for-4.0)]
Guest Kernel: 5.0.0-rc1-g3bd6e94be

1. Define and boot the guest as described in the initial comment (refer to the vm1.txt attachment).
[root@atest-guest ~]# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        2
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0,1
NUMA node1 CPU(s):

2. Hotplug vcpus:
$ virsh setvcpus vm1 8 --live
This triggers a kernel bug on the guest, though the guest continues to be operational.

# [   59.691058] WARNING: workqueue cpumask: online intersect > possible intersect
[   59.733903] root domain span: 0-2 (max cpu_capacity = 1024)
[   59.950896] Built 2 zonelists, mobility grouping on.  Total pages: 130146
[   59.954125] Policy zone: Normal
[   59.955916] root domain span: 0-3 (max cpu_capacity = 1024)
[   60.071389] BUG: Kernel NULL pointer dereference at 0x00000400
[   60.074974] Faulting instruction address: 0xc00000000017966c
[   60.076687] Oops: Kernel access of bad area, sig: 11 [#1]
[   60.078305] LE SMP NR_CPUS=2048 NUMA pSeries
[   60.079598] Modules linked in:
[   60.080516] CPU: 4 PID: 3024 Comm: kworker/4:0 Not tainted 5.0.0-rc1-g3bd6e94be #3
[   60.082836] Workqueue: events cpuset_hotplug_workfn
[   60.084309] NIP:  c00000000017966c LR: c000000000179738 CTR: 0000000000000000
[   60.086455] REGS: c0000001f86a7130 TRAP: 0380   Not tainted  (5.0.0-rc1-g3bd6e94be)
[   60.088756] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22824424  XER: 00000000
[   60.091140] CFAR: c0000000001796b0 IRQMASK: 0 
[   60.091140] GPR00: c000000000179738 c0000001f86a73c0 c0000000014c9a00 c0000001f9ff2c00 
[   60.091140] GPR04: 0000000000000001 0000000000000000 0000000000000008 0000000000000010 
[   60.091140] GPR08: 0000000000000018 ffffffffffffffff 0000000000000400 0000000000000000 
[   60.091140] GPR12: 0000000000008800 c00000003fffb300 c000000001506104 0000000000000800 
[   60.091140] GPR16: c0000001f9d60000 c000000000efc048 000000000000102f ffffffffffffe830 
[   60.091140] GPR20: ffffffffffffec30 000000000000102f c0000001fa4db800 c0000001fa4dd800 
[   60.091140] GPR24: c0000001e6200000 00000001fe4c0000 0000000000000001 c0000000010a8080 
[   60.091140] GPR28: c0000001fa4d3400 c0000001f9ff2c00 c0000001f9ff6bff c0000001f9ff3e00 
[   60.104265] NIP [c00000000017966c] free_sched_groups.part.2+0x5c/0xf0
[   60.105448] LR [c000000000179738] destroy_sched_domain+0x38/0xc0
[   60.106553] Call Trace:
[   60.107004] [c0000001f86a73c0] [0000000000000001] 0x1 (unreliable)
[   60.108140] [c0000001f86a7400] [c000000000179738] destroy_sched_domain+0x38/0xc0
[   60.109502] [c0000001f86a7430] [c000000000179b1c] cpu_attach_domain+0xfc/0x940
[   60.110834] [c0000001f86a7570] [c00000000017b624] build_sched_domains+0x12c4/0x13d0
[   60.112243] [c0000001f86a76b0] [c00000000017c7f4] partition_sched_domains+0x254/0x3d4
[   60.113673] [c0000001f86a7740] [c000000000201db0] rebuild_sched_domains_locked+0x400/0x700
[   60.115144] [c0000001f86a7830] [c000000000206888] rebuild_sched_domains+0x38/0x60
[   60.116477] [c0000001f86a7860] [c000000000206c04] cpuset_hotplug_workfn+0x354/0xde0
[   60.117837] [c0000001f86a7c80] [c000000000138d60] process_one_work+0x2b0/0x560
[   60.119124] [c0000001f86a7d10] [c000000000139098] worker_thread+0x88/0x610
[   60.120350] [c0000001f86a7db0] [c00000000014203c] kthread+0x1ac/0x1c0
[   60.121499] [c0000001f86a7e20] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
[   60.122849] Instruction dump:
[   60.123381] 91810008 2e240000 f8010010 f821ffc1 48000010 7fbee840 7fdff378 419e0074 
[   60.124761] ebdf0000 4192002c e95f0010 7c0004ac <7d205028> 3129ffff 7d20512d 40c2fff4 
[   60.126177] ---[ end trace 1632a73375cd2dbb ]---
[   60.131793] 
[   60.132087] kworker/4:0 (3024) used greatest stack depth: 9024 bytes left

[root@atest-guest ~]# lscpu
Architecture:         ppc64le
Byte Order:           Little Endian
CPU(s):               6
On-line CPU(s) list:  0-4
Off-line CPU(s) list: 5
Thread(s) per core:   1
Core(s) per socket:   5
Socket(s):            1
NUMA node(s):         3
Model:                2.1 (pvr 004b 0201)
Model name:           POWER8 (architected), altivec supported
Hypervisor vendor:    KVM
Virtualization type:  para
L1d cache:            64K
L1i cache:            32K
NUMA node0 CPU(s):    0,1
NUMA node1 CPU(s):    2,3
NUMA node2 CPU(s):    4

An x86_64 (Intel) guest on an x86_64 host with the above configuration also crashes, though with a different call trace. That observation is documented here: https://bugzilla.kernel.org/show_bug.cgi?id=202187