open-power-host-os / qemu

OpenPOWER Host OS qemu repository

[qemu] loadvm fails with "Failed to load virtio-net:virtio" #37

Closed nasastry closed 5 years ago

nasastry commented 6 years ago
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=165414

Description: loadvm fails with the following error:
```
VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
Failed to load virtio-net:virtio
error while loading state for instance 0x0 of device 'pci@800000020000000:00.0/virtio-net'
Error -1 while loading VM state
```

Steps to re-produce:
1. Start the guest with the following qemu command line options:
```
/usr/bin/qemu-system-ppc64 --nographic -vga none -machine pseries-2.11,accel=kvm,kvm-type=HV -m 2G,slots=32,maxmem=16G -device virtio-blk-pci,drive=rootdisk -drive file=/home/nasastry/hostos-3.0-ppc64le.qcow2,if=none,cache=none,id=rootdisk,format=qcow2 -net nic,model=virtio -monitor telnet:127.0.0.1:1234,server,nowait
```
2. On host connect to `1234` port and do the memory hotplug, unplug, reboot the guest, savevm and loadvm
```
# telnet localhost 1234
Connected to localhost.
Escape character is '^]'.
QEMU 2.11.0 monitor - type 'help' for more information
(qemu) object_add memory-backend-ram,id=ram0,size=1G
(qemu) device_add pc-dimm,id=dimm0,memdev=ram0
(qemu) device_del dimm0
# Before running savevm reboot the guest. After the guest boots
(qemu) savevm 1
Virtqueue size exceeded
(qemu) loadvm 1
VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
Failed to load virtio-net:virtio
error while loading state for instance 0x0 of device 'pci@800000020000000:00.0/virtio-net'
Error -1 while loading VM state
```

Guest kernel: 4.14.0-3.git68b4afb.el7.centos.ppc64le
HostKernel: 4.14.0-1.rel.git68b4afb.el7.centos.ppc64le
Qemu: qemu-2.11.0-1.rel.gite7153e0.el7.centos.ppc64le

cdeadmin commented 6 years ago

------- Comment From KURZGREG@fr.ibm.com 2018-03-07 05:30:32 EDT-------

(In reply to comment #1)
> Steps to re-produce:
> 1. Start the guest with the following qemu command line options:
> /usr/bin/qemu-system-ppc64 --nographic -vga none -machine
> pseries-2.11,accel=kvm,kvm-type=HV -m 2G,slots=32,maxmem=16G -device
> virtio-blk-pci,drive=rootdisk -drive
> file=/home/nasastry/hostos-3.0-ppc64le.qcow2,if=none,cache=none,id=rootdisk,
> format=qcow2 -net nic,model=virtio -monitor
> telnet:127.0.0.1:1234,server,nowait

This command line doesn't provide a net backend, which isn't a supported setup.

QEMU starts anyway but it prints a warning:

qemu-system-ppc64: warning: vlan 0 is not connected to host network

I could reproduce the problem though: savevm succeeds after first boot but it fails with "Virtqueue size exceeded" after reboot. This seems to indicate that the reboot breaks some assumption in the virtio-net code, so I guess it's worth investigating.

> 2. On host connect to 1234 port and do the memory hotplug, unplug, reboot
> the guest, savevm and loadvm
> ```
> # telnet localhost 1234
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.11.0 monitor - type 'help' for more information
> (qemu) object_add memory-backend-ram,id=ram0,size=1G
> (qemu) device_add pc-dimm,id=dimm0,memdev=ram0
> (qemu) device_del dimm0

The above steps aren't actually needed. A plain system_reset is enough to trigger the problem.

> # Before running savevm reboot the guest. After the guest boots
> (qemu) savevm 1
> Virtqueue size exceeded

The above error happens while flushing the TX queue just before snapshot.

In virtqueue_pop():

if (vq->inuse >= vq->vring.num) {
    virtio_error(vdev, "Virtqueue size exceeded");
    goto done;
}

(gdb) p/x vq->inuse
$5 = 0xffffffff

This is clearly wrong.

This seems to be the result of a previous call to virtqueue_push() with vq->inuse == 0, which is also wrong.
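To illustrate how a push with vq->inuse == 0 can produce the value seen in gdb, here is a minimal sketch. It assumes that inuse is a 32-bit unsigned counter (as the gdb output suggests) and that each push decrements it by one. `toy_vq`, `toy_push` and `toy_pop_allowed` are hypothetical names made up for this illustration; this is not QEMU code.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a virtqueue: only the two
 * fields that matter for this bug are modelled. */
struct toy_vq {
    unsigned int inuse; /* elements popped but not yet pushed back */
    unsigned int num;   /* ring size (vq->vring.num in the quoted check) */
};

/* Models only the accounting side effect of a push: one element is
 * handed back, so the in-flight counter drops by one.
 * If inuse is already 0, unsigned arithmetic wraps to UINT_MAX. */
void toy_push(struct toy_vq *vq)
{
    vq->inuse--;
}

/* Models the guard quoted above from virtqueue_pop(): a pop is refused
 * once inuse >= ring size. */
bool toy_pop_allowed(const struct toy_vq *vq)
{
    return vq->inuse < vq->num;
}
```

Starting from inuse == 0 with a ring of 0x100 entries, a single toy_push() leaves inuse at 0xffffffff (with a 32-bit unsigned int), after which toy_pop_allowed() returns false. That matches the "Virtqueue size exceeded" path.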

At this point, the device is broken (vdev->broken is true) and should be reset to work again. This is recorded in the snapshot.

> (qemu) loadvm 1
> VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
> Failed to load virtio-net:virtio
> error while loading state for instance 0x0 of device
> 'pci@800000020000000:00.0/virtio-net'
> Error -1 while loading VM state
> ```

Since the device is broken and is supposed to be reset, I'm not sure it is appropriate for loadvm to fail actually.
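The numbers in the loadvm error are consistent with a 16-bit wraparound. The sketch below assumes that the load path recomputes the number of in-flight elements as the 16-bit difference last_avail_idx - used_idx and rejects the state when that exceeds the ring size. This check is inferred from the error message, not copied from QEMU, and `toy_indices_valid` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: returns true when the saved ring indices describe
 * a plausible number of in-flight elements for a ring of ring_size.
 * The computed count is also stored in *inuse_out when it is non-NULL. */
bool toy_indices_valid(uint16_t last_avail_idx, uint16_t used_idx,
                       uint16_t ring_size, uint16_t *inuse_out)
{
    /* Ring indices are free-running 16-bit counters, so the difference
     * is taken modulo 2^16. */
    uint16_t inuse = (uint16_t)(last_avail_idx - used_idx);

    if (inuse_out != NULL) {
        *inuse_out = inuse;
    }
    return inuse <= ring_size;
}
```

With the values from the error message (last_avail_idx 0x0, used_idx 0x1, size 0x100), the difference wraps to 0xffff, which is far larger than 0x100, so the state is rejected.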

> Guest kernel: 4.14.0-3.git68b4afb.el7.centos.ppc64le
> HostKernel: 4.14.0-1.rel.git68b4afb.el7.centos.ppc64le
> Qemu: qemu-2.11.0-1.rel.gite7153e0.el7.centos.ppc64le

cdeadmin commented 6 years ago

------- Comment From nasastry@in.ibm.com 2018-03-07 23:58:53 EDT-------

Tested the patch from qemu-devel with subject: virtio_net: flush uncompleted TX on reset

No errors are seen now.

Without patch:
(qemu) system_reset
(qemu) savevm 1
Virtqueue size exceeded
(qemu) loadvm 1
VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
Failed to load virtio-net:virtio
error while loading state for instance 0x0 of device 'pci@800000020000000:00.0/virtio-net'
Error -1 while loading VM state

With patch:
(qemu) system_reset
(qemu) savevm 1
(qemu) loadvm 1

Thanks!!

cdeadmin commented 6 years ago

------- Comment From KURZGREG@fr.ibm.com 2018-03-30 03:46:55 EDT-------

The fix is now upstream.

https://git.qemu.org/?p=qemu.git;a=commit;h=94b52958b77a2a040564cf7ed716d3a9545d94e5

It will be shipped with QEMU 2.12.