virtio-win / kvm-guest-drivers-windows

Windows paravirtualized drivers for QEMU/KVM
https://www.linux-kvm.org/page/WindowsGuestDrivers
BSD 3-Clause "New" or "Revised" License

windows hang in poweroff using virtio-blk when san storage is offline #135

Closed: zyb521 closed this issue 7 years ago

zyb521 commented 7 years ago

Hi everyone, I have hit a problem using the virtio-blk driver. A Windows VM takes a very long time (more than 30 min) to power off, and the hang only occurs in VMs using virtio-blk; VMs using virtio-scsi power off normally.

I found some errors: 1. the VM's qemu log shows BLOCK_IO_ERROR; 2. the SAN storage is disconnected at that time.

I think this problem is caused by disk IO timeouts. In vioscsi, the driver implements the HwResetBus callback function, but viostor does not implement it and just returns TRUE.

Why do these two disk drivers have different HwResetBus callback implementations?

I want to implement viostor's HwResetBus callback. Do you have any suggestions?
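For illustration, the asymmetry described above can be sketched in plain C. This is a hedged model with stubbed types, not the actual driver source: `MiniportState`, the field names, and both function names are illustrative stand-ins for the storport miniport structures.

```c
/* Hedged sketch (not real driver code) of the two HwResetBus styles
 * the reporter describes. All types and names here are illustrative. */
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int  pending;       /* requests still sitting in the virtqueue */
    bool device_reset;  /* whether a reset was propagated to the host */
} MiniportState;

/* viostor-style: report success without touching outstanding IO. */
static bool hw_reset_bus_blk_style(MiniportState *m)
{
    (void)m;            /* pending requests stay queued untouched */
    return true;
}

/* vioscsi-style: propagate the reset and complete outstanding
 * requests so the port driver can make forward progress. */
static bool hw_reset_bus_scsi_style(MiniportState *m)
{
    m->device_reset = true;  /* e.g. a reset request passed to qemu */
    m->pending = 0;          /* outstanding SRBs completed with an error */
    return true;
}
```

In the blk-style variant the pending requests are left in the queue, which matches the reporter's observation that IOs never drain; the scsi-style variant clears them as part of the reset.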

Thank you all.

Some info: Windows Server 2012 R2 with 2 disks using virtio-blk, virtio-win 0.1.130, qemu-kvm 2.6.

vrozenfe commented 7 years ago

Thank you for reporting this issue. Could you please provide the exact qemu command line?

Answering your question regarding the differences between the blk and scsi drivers' HwResetBus routines: they are different drivers, and scsi, unlike blk, has to pass the reset request on to qemu. Do you ever see the HwResetBus routine called? What is your storage configuration? Do I understand correctly that it is located on a SAN? If so, can you please confirm whether the problem is reproducible with local storage? Could you also try more recent virtio-win drivers and a more recent qemu?

Thanks, Vadim.

zyb521 commented 7 years ago

@vrozenfe Yes, you are right. It's SAN located.

Before shutting down the VM, I copy a large file (2 GB) from one disk to another. During the hang I don't see the HwResetBus routine called in the viostor driver's debug output, but it constantly prints "SRB_STATUS_ERROR".

It seems that:

1. Copying the big file causes a lot of IO requests to be issued to viostor.
2. The SAN disconnect causes qemu (virtio-blk) to report BLOCK_IO_ERROR to the viostor driver.
3. viostor reports SRB_STATUS_ERROR to the upper-layer (storport) driver, but IOs continue to be issued to viostor, even after I press the shutdown button.

So this is a loop, and the VM hangs during shutdown.
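The loop described above can be modelled with a few lines of C. This is a hedged simulation, not driver code: `san_online`, `issue_io`, and the retry cap are assumptions standing in for the real storage stack, which (as discussed below) retries errored requests without such a terminal condition.

```c
/* Hedged simulation of the feedback loop: while the SAN is offline,
 * every request completes with an error, the upper layer retries,
 * and shutdown never drains the queue. All names are illustrative. */
#include <assert.h>
#include <stdbool.h>

enum { SRB_OK = 0, SRB_ERROR = 1 };   /* illustrative status codes */

static bool san_online = false;       /* assumption: storage stays offline */

static int issue_io(void)
{
    return san_online ? SRB_OK : SRB_ERROR;
}

/* Returns the number of attempts needed to drain one IO, or -1 if
 * the retry budget is exhausted (the real stack effectively spins). */
static int drain_with_retries(int max_retries)
{
    for (int attempts = 1; attempts <= max_retries; attempts++) {
        if (issue_io() == SRB_OK)
            return attempts;          /* IO drained, shutdown proceeds */
    }
    return -1;                        /* still failing: shutdown hangs */
}
```

With the SAN offline the drain never succeeds no matter the budget; once the storage comes back, a single attempt suffices, which matches the observation that only the disconnected-SAN case hangs.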

It is difficult to understand why, during the shutdown phase, there is still so much IO being sent to the viostor driver. Does the driver need to guard against the failures caused by a large number of errored IOs?

Thanks.

viostor COM debug output:

<--->VirtIoMSInterruptRoutine : MessageID 0x1 SRB_STATUS_ERROR
<--->VirtIoMSInterruptRoutine : MessageID 0x1 SRB_STATUS_ERROR
<--->VirtIoMSInterruptRoutine : MessageID 0x1 SRB_STATUS_ERROR

vm qemu log:

2017-05-15T15:53:58.630721+08:00|info|qemu[9675]|[9675]|monitor_qapi_event_emit[479]|: {"timestamp": {"seconds": 1494834837, "microseconds": 238206}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "reason": "Input/output error", "operation": "write", "action": "report"}}

qemu command:

/usr/bin/qemu-kvm -name raw_example,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-97-raw_example/master-key.aes -machine pc-i440fx-2.6,accel=kvm,usb=off -cpu qemu64,hv_relaxed -m 2143 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid a85ac3c0-93c9-40ef-b0a9-7cdeab2befd2 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-97-raw_example/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/Images/TestImg/zyb/disk/pvc_win2012_r2_standard_64.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/Images/TestImg/zyb/disk/test.img,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive if=none,id=drive-ide0-0-1,readonly=on,cache=none,aio=native -device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev tap,fd=37,id=hostnet0,vhost=on,vhostfd=39 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:13:2c:e1,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/run/libvirt/qemu/raw_example.extend,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1 -chardev socket,id=charchannel1,path=/var/run/libvirt/qemu/raw_example.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/run/libvirt/qemu/raw_example.hostd,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.qemu.guest_agent.2 -chardev socket,id=charchannel3,path=/var/run/libvirt/qemu/raw_example.upgraded,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=4,chardev=charchannel3,id=channel3,name=org.qemu.guest_agent.3 -device usb-tablet,id=input0 -vnc 0.0.0.0:3 -device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -device pvpanic -msg timestamp=on

vrozenfe commented 7 years ago

I see. SRB_STATUS_ERROR is a "normal" status: it does not mean that the entire storage stack has collapsed and that the port has to stop sending requests. There is no logic in the viostor driver to detect and properly handle the event that the media itself has been disconnected; in fact, viostor does not expect the media to be removable at all. Another problem is that some requests can get "stuck" in the virtio queue with no chance of being completed later on. Proper handling of a media-removal event is a feature that could certainly be added to viostor at some point. You can file a new bug report at http://bugzilla.redhat.com to bring more attention to this problem.

Best, Vadim.
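The missing media-removal handling described in the comment above could, in rough outline, fail in-flight requests with a terminal status instead of a retryable one, so the port driver stops resubmitting them. The sketch below is a hedged illustration with stand-in types (`PendingQueue`, `fail_all_pending` are invented for this example); the `SRB_STATUS_*` values mirror the constants defined in Windows' srb.h.

```c
/* Hedged sketch of a media-removal path: on a surprise disconnect,
 * complete every queued request with a terminal status rather than
 * the retryable SRB_STATUS_ERROR. Types here are illustrative. */
#include <assert.h>
#include <stddef.h>

/* Status values as defined in srb.h. */
enum { SRB_STATUS_ERROR = 0x04, SRB_STATUS_NO_DEVICE = 0x08 };

typedef struct {
    int    statuses[16];  /* completion status per in-flight request */
    size_t count;         /* requests still waiting in the virtqueue */
} PendingQueue;

/* Complete every in-flight request with a terminal status so the
 * port driver stops resubmitting; returns how many were completed. */
static size_t fail_all_pending(PendingQueue *q)
{
    size_t failed = q->count;
    for (size_t i = 0; i < q->count; i++)
        q->statuses[i] = SRB_STATUS_NO_DEVICE;  /* terminal, not retried */
    q->count = 0;                               /* queue fully drained */
    return failed;
}
```

Draining the queue this way would also address the "stuck in the virtio queue" requests mentioned above, at the cost of reporting the disk as gone to the guest.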