open-power-host-os / qemu

OpenPOWER Host OS qemu repository

VM fails to start with more than 128 vCPU cores on POWER9: "qemu-system-ppc64: kvm_init_vcpu failed: Invalid argument" #38

Closed: sathnaga closed this issue 6 years ago

sathnaga commented 6 years ago
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=166001

The KVM guest (QEMU) crashed while hotplugging a large number of vCPUs. This is reproducible with upstream QEMU as well (https://github.com/qemu/qemu/commit/99728ba3ec9b8795ff7191ea75a2a8c0329c29a5).

Env:
```
Host: Power9 2.2 (pvr 004e 1202)
qemu-2.11.50-3.dev.git59cc362.el7.centos.ppc64le
4.15.0-5.dev.git33f711f.el7.centos.ppc64le
libvirt-4.0.0-2.dev.git5e6f8a1.el7.centos.ppc64le
Guest: 4.15.0-5.dev.git33f711f.el7.centos.ppc64le, rhel7.5(3.10~)
```

Steps to reproduce:
```
1. Boot a VM with 1 current vCPU and 240 max vCPUs
   240

   # lscpu
   Architecture: ppc64le
   Byte Order: Little Endian
   CPU(s): 1
   On-line CPU(s) list: 0
   Thread(s) per core: 1
   Core(s) per socket: 1
   Socket(s): 1
   NUMA node(s): 1
   Model: 2.2 (pvr 004e 1202)
   Model name: POWER9 (architected), altivec supported
   Hypervisor vendor: (null)
   Virtualization type: full
   L1d cache: 32K
   L1i cache: 32K
   NUMA node0 CPU(s): 0

2. virsh setvcpus vm2 240 --live
   --- the guest crashed with the QEMU error `kvm_init_vcpu failed: Invalid argument`
```

QEMU command line and log:
```
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /home/sath/qemu/ppc64-softmmu/qemu-system-ppc64 -name guest=vm2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-vm2/master-key.aes -machine pseries-2.12,accel=kvm,usb=off,dump-guest-core=off -m 32768 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu/5-vm2 -realtime mlock=off -smp 1,maxcpus=240,sockets=1,cores=240,threads=1 -uuid 46114216-4745-452b-b4df-57fd289a10b6 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-5-vm2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/home/sath/images/hostos-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3a:3b:3d,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-5-vm2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
2018-03-23T16:30:19.348016Z qemu-system-ppc64: -chardev pty,id=charserial0: char device redirected to /dev/pts/2 (label charserial0)
2018-03-23T16:30:19.348639Z qemu-system-ppc64: warning: Number of hotpluggable cpus requested (240) exceeds the recommended cpus supported by KVM (128)
2018-03-23T16:30:21.628272Z qemu-system-ppc64: warning: System page size 0x200000 is not enabled in page_size_mask (0x11000). Performance may be slow
kvm_init_vcpu failed: Invalid argument
2018-03-23 16:31:09.529+0000: shutting down, reason=crashed
2018-03-23 16:32:16.238+0000: starting up libvirt version: 4.0.0, package: 2.dev.git5e6f8a1.el7.centos (Unknown, 2018-01-21-15:28:09, baratheon), qemu version: 2.11.50, hostname: ltc-wspoon5
```
sathnaga commented 6 years ago

The VM fails to start with more than 128 vCPU cores on POWER9.

Host Env:

```
# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                160
On-line CPU(s) list:   0-159
Thread(s) per core:    4
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
Model:                 2.2 (pvr 004e 1202)
Model name:            POWER9, altivec supported
CPU max MHz:           3900.0000
CPU min MHz:           2300.0000
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              10240K
NUMA node0 CPU(s):     0-79
NUMA node8 CPU(s):     80-159
```

```
uname -a
Linux x.x.x.x 4.16.0-3.dev.gitfd8742e.el7.centos.p.ppc64le #1 SMP Tue May 15 08:07:09 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

qemu-system-ppc-2.12.0-2.dev.gitd36f3ee.el7.centos.p.ppc64le
libvirt-4.3.0-1.dev.git3096ff1.el7.centos.p.ppc64le
```
```
/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate --name 'virt-tests-vm1' --machine pseries --memory=10240  --import --nographics --serial pty --memballoon model=virtio --controller type=scsi,model=virtio-scsi --disk path=/home/sath/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 --network=bridge=virbr0,model=virtio,mac=52:54:00:49:4a:4b --noautoconsole --vcpu=129,sockets=1,cores=129,threads=1

WARNING  No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.

Starting install...
ERROR    internal error: qemu unexpectedly closed the monitor: stnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:49:4a:4b,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
2018-05-16T08:06:21.518770Z qemu-system-ppc64: kvm_init_vcpu failed: Invalid argument
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
  virsh --connect qemu:///system start virt-tests-vm1
otherwise, please restart your installation.
```
sathnaga commented 6 years ago

The root cause appears to be the host kernel config option CONFIG_NR_CPUS=1024. Changing it to CONFIG_NR_CPUS=2048, then recompiling and booting the new host kernel, solved the issue.

I can now create VMs with up to cores=256.

```
/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate --name 'virt-tests-vm1' --machine pseries --memory=10240  --import --nographics --serial pty --memballoon model=virtio --controller type=scsi,model=virtio-scsi --disk path=/home/sath/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 --network=bridge=virbr0,model=virtio,mac=52:54:00:49:4a:4b --noautoconsole --vcpu=1024,sockets=1,cores=256,threads=4
Domain virt-tests-vm1 destroyed

Domain virt-tests-vm1 has been undefined

WARNING  No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.

Starting install...
Domain creation completed.
```
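The figures in this thread line up in a simple way: with CONFIG_NR_CPUS=1024, QEMU warns that KVM recommends at most 128 CPUs and core 129 fails, while with 2048 a 256-core guest boots. Both ratios are 8, which is consistent with each guest core consuming a block of 8 KVM vCPU IDs. The sketch below models that under loudly stated assumptions not confirmed in this issue: that the pseries machine assigns `vcpu_id = core * vsmt + thread` with a default `vsmt` of max(8, threads), and that the host KVM rejects any vCPU ID >= CONFIG_NR_CPUS. The function names are hypothetical, for illustration only.

```python
# Hypothetical model of the vCPU-ID limit discussed in this issue.
# Assumptions (NOT confirmed here): vcpu_id = core * vsmt + thread,
# vsmt defaults to max(8, threads), and the host KVM rejects any
# vCPU ID >= CONFIG_NR_CPUS.
DEFAULT_VSMT = 8  # assumed per-core vCPU-ID stride


def max_vcpu_id(cores: int, threads: int, vsmt: int = DEFAULT_VSMT) -> int:
    """Highest vCPU ID used by a single-socket guest with this topology."""
    stride = max(vsmt, threads)
    return (cores - 1) * stride + (threads - 1)


def fits(nr_cpus: int, cores: int, threads: int) -> bool:
    """True if every vCPU ID stays below the host's CONFIG_NR_CPUS."""
    return max_vcpu_id(cores, threads) < nr_cpus


def max_cores(nr_cpus: int, threads: int = 1, vsmt: int = DEFAULT_VSMT) -> int:
    """Largest core count whose vCPU IDs all fit under CONFIG_NR_CPUS."""
    return nr_cpus // max(vsmt, threads)


# Topologies taken from the virt-install commands in this issue.
print(fits(1024, cores=129, threads=1))  # False (failing case)
print(fits(2048, cores=256, threads=4))  # True (works after the rebuild)
print(max_cores(1024), max_cores(2048))  # 128 256
```

Under the same assumptions, the 240-core hotplug from the first comment also exceeds the 1024 limit (`fits(1024, cores=240, threads=1)` is False), which would explain why it hit the same `kvm_init_vcpu failed: Invalid argument` error.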
cdeadmin commented 6 years ago

------- Comment From seg@us.ibm.com 2018-05-16 08:46:14 EDT-------
Let's change the kernel config for the hostos builds.

farosas commented 6 years ago

Done. Just merged to the devel branch.

sathnaga commented 6 years ago
```
# virsh dumpxml virt-tests-vm1
<domain type='kvm' id='6'>
  <name>virt-tests-vm1</name>
  <uuid>39c14bc2-7a41-4fa6-a6ec-57fe89d09863</uuid>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>1024</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='ppc64le' machine='pseries-2.12'>hvm</type>
    <boot dev='hd'/>
  </os>
  <cpu>
    <topology sockets='1' cores='256' threads='4'/>
    <numa>
      <cell id='0' cpus='0-511' memory='8388608' unit='KiB'/>
      <cell id='1' cpus='512-1023' memory='8388608' unit='KiB'/>
    </numa>
  </cpu>
....
```

```
# uname -a
Linux localhost.localdomain 4.16.0-4.dev.gitfd8742e.el7.ppc64le #1 SMP Mon May 21 12:19:50 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
[root@localhost ~]# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                1024
On-line CPU(s) list:   0-1023
Thread(s) per core:    4
Core(s) per socket:    256
Socket(s):             1
NUMA node(s):          2
Model:                 2.2 (pvr 004e 1202)
Model name:            POWER9 (architected), altivec supported
Hypervisor vendor:     KVM
Virtualization type:   para
L1d cache:             32K
L1i cache:             32K
NUMA node0 CPU(s):     0-511
NUMA node1 CPU(s):     512-1023
```
sathnaga commented 6 years ago

Fixed in the latest devel builds.