Closed by nasastry 5 years ago
------- Comment From KURZGREG@fr.ibm.com 2018-03-07 05:30:32 EDT-------
(In reply to comment #1)
> Steps to re-produce:
> 1. Start the guest with the following qemu command line options:
>
> /usr/bin/qemu-system-ppc64 --nographic -vga none \
>     -machine pseries-2.11,accel=kvm,kvm-type=HV \
>     -m 2G,slots=32,maxmem=16G \
>     -device virtio-blk-pci,drive=rootdisk \
>     -drive file=/home/nasastry/hostos-3.0-ppc64le.qcow2,if=none,cache=none,id=rootdisk,format=qcow2 \
>     -net nic,model=virtio \
>     -monitor telnet:127.0.0.1:1234,server,nowait
>
This command line doesn't provide a net backend, which isn't a supported setup.
QEMU starts anyway but it prints a warning:
qemu-system-ppc64: warning: vlan 0 is not connected to host network
I could reproduce the problem though: savevm succeeds after first boot but it fails with "Virtqueue size exceeded" after reboot. This seems to indicate that the reboot breaks some assumption in the virtio-net code, so I guess it's worth investigating.
> 2. On host connect to 1234 port and do the memory hotplug, unplug, reboot the guest, savevm and loadvm
> ```
> # telnet localhost 1234
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.11.0 monitor - type 'help' for more information
> (qemu) object_add memory-backend-ram,id=ram0,size=1G
> (qemu) device_add pc-dimm,id=dimm0,memdev=ram0
> (qemu) device_del dimm0
The above steps aren't needed actually. Just system_reset.
> # Before running savevm reboot the guest. After the guest boots
> (qemu) savevm 1
> Virtqueue size exceeded
The above error happens while flushing the TX queue just before snapshot.
In virtqueue_pop():
    if (vq->inuse >= vq->vring.num) {
        virtio_error(vdev, "Virtqueue size exceeded");
        goto done;
    }
(gdb) p/x vq->inuse
$5 = 0xffffffff
This is clearly wrong.
This seems to be the result of a previous call to virtqueue_push() with vq->inuse == 0, which is also wrong.
At this point, the device is broken (vdev->broken is true) and should be reset to work again. This is recorded in the snapshot.
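To illustrate the wraparound, here is a minimal standalone model of the two steps. It is not QEMU code: `struct model_vq`, `model_vq_push()` and `model_vq_pop()` are hypothetical names, and the sketch assumes `inuse` is a 32-bit `unsigned int`, as the gdb output suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for a virtqueue. */
struct model_vq {
    unsigned int num;    /* ring size, e.g. 0x100 */
    unsigned int inuse;  /* elements popped but not yet pushed back */
    bool broken;         /* stands in for vdev->broken */
};

/* Models completing an element without any guard against inuse == 0. */
static void model_vq_push(struct model_vq *vq)
{
    vq->inuse--;  /* unsigned arithmetic: 0 - 1 wraps to UINT_MAX */
}

/* Models the size check quoted above; returns false when it trips. */
static bool model_vq_pop(struct model_vq *vq)
{
    assert(vq->num > 0);
    if (vq->inuse >= vq->num) {
        fprintf(stderr, "Virtqueue size exceeded\n");
        vq->broken = true;  /* device stays unusable until reset */
        return false;
    }
    vq->inuse++;
    return true;
}
```

Calling `model_vq_push()` on a queue whose `inuse` is already 0 leaves `inuse` at the maximum unsigned value, so the next `model_vq_pop()` fails the size check and marks the device broken.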
> (qemu) loadvm 1
> VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
> Failed to load virtio-net:virtio
> error while loading state for instance 0x0 of device 'pci@800000020000000:00.0/virtio-net'
> Error -1 while loading VM state
> ```
Since the device is broken and is supposed to be reset, I'm not sure it is appropriate for loadvm to fail actually.
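The numbers in the loadvm message can be reproduced with a small standalone sketch. It assumes the loader recomputes the in-flight count as the 16-bit difference `last_avail_idx - used_idx` and rejects it when it exceeds the ring size, which is what the message text suggests; `model_load_check()` is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the consistency check behind
 * "VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1".
 * Returns true when the saved indices look sane.
 */
static bool model_load_check(uint16_t ring_size,
                             uint16_t last_avail_idx,
                             uint16_t used_idx,
                             uint16_t *inuse_out)
{
    assert(inuse_out != NULL);

    /* The ring indices are free-running 16-bit counters, so the
     * difference is taken modulo 2^16. */
    uint16_t inuse = (uint16_t)(last_avail_idx - used_idx);

    *inuse_out = inuse;
    return inuse <= ring_size;
}
```

With the values from the log (ring size 0x100, last_avail_idx 0x0, used_idx 0x1) the difference wraps to 0xffff, which is larger than the ring, so the check fails.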
>
> Guest kernel: 4.14.0-3.git68b4afb.el7.centos.ppc64le
> HostKernel: 4.14.0-1.rel.git68b4afb.el7.centos.ppc64le
> Qemu: qemu-2.11.0-1.rel.gite7153e0.el7.centos.ppc64le
------- Comment From nasastry@in.ibm.com 2018-03-07 23:58:53 EDT-------
Tested the patch from qemu-devel with subject: virtio_net: flush uncompleted TX on reset
No errors are seen now.
Without patch:
(qemu) system_reset
(qemu) savevm 1
Virtqueue size exceeded
(qemu) loadvm 1
VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
Failed to load virtio-net:virtio
error while loading state for instance 0x0 of device 'pci@800000020000000:00.0/virtio-net'
Error -1 while loading VM state
With patch:
(qemu) system_reset
(qemu) savevm 1
(qemu) loadvm 1
Thanks!!
------- Comment From KURZGREG@fr.ibm.com 2018-03-30 03:46:55 EDT-------
The fix is now upstream.
https://git.qemu.org/?p=qemu.git;a=commit;h=94b52958b77a2a040564cf7ed716d3a9545d94e5
It will be shipped with QEMU 2.12.