virtio-win / kvm-guest-drivers-windows

Windows paravirtualized drivers for QEMU\KVM
https://www.linux-kvm.org/page/WindowsGuestDrivers
BSD 3-Clause "New" or "Revised" License

viostor: cause qemu virtio_error when an uncompleted request times out #907

Open wangyan0507 opened 1 year ago

wangyan0507 commented 1 year ago

Describe the bug

I found a virtio_error() in QEMU when handling a virtio block request with a Windows 11 guest.

Two uncompleted block requests (req1 and reqN) have the same indirect descriptor table address. The length of the indirect descriptor table in req1 is 48, but the length in reqN is 160. As a result, QEMU raises an error while handling req1.

The stack trace:

(gdb) bt
#0  virtio_error (vdev=0xaaaaad604420, fmt=0xaaaaabb54ad0 "Desc next is %u") at ../hw/virtio/virtio.c:3568
#1  0x0000aaaaab642868 in virtqueue_split_read_next_desc (vdev=0xaaaaad604420, desc=0xffffefff7d00, desc_cache=0xffffefff7d10, max=3, next=0xffffefff7cb4) at ../hw/virtio/virtio.c:1057
#2  0x0000aaaaab643c8c in virtqueue_split_pop (vq=0xaaaaad60f740, sz=240) at ../hw/virtio/virtio.c:1591
#3  0x0000aaaaab644584 in virtqueue_pop (vq=0xaaaaad60f740, sz=240) at ../hw/virtio/virtio.c:1768
#4  0x0000aaaaab5d8d3c in virtio_blk_get_request (s=0xaaaaad604420, vq=0xaaaaad60f740) at ../hw/block/virtio-blk.c:238
#5  0x0000aaaaab5da1fc in virtio_blk_handle_vq (s=0xaaaaad604420, vq=0xaaaaad60f740) at ../hw/block/virtio-blk.c:772
#6  0x0000aaaaab5da304 in virtio_blk_handle_output (vdev=0xaaaaad604420, vq=0xaaaaad60f740) at ../hw/block/virtio-blk.c:806
#7  0x0000aaaaab645b00 in virtio_queue_notify_vq (vq=0xaaaaad60f740) at ../hw/virtio/virtio.c:2249
#8  0x0000aaaaab648fb4 in virtio_queue_host_notifier_read (n=0xaaaaad60f7b4) at ../hw/virtio/virtio.c:3529
#9  0x0000aaaaab951690 in aio_dispatch_handler (ctx=0xaaaaac82f9a0, node=0xfffdb0002ea0) at ../util/aio-posix.c:369
#10 0x0000aaaaab951794 in aio_dispatch_ready_handlers (ctx=0xaaaaac82f9a0, ready_list=0xffffefffe040) at ../util/aio-posix.c:399
#11 0x0000aaaaab9522d0 in aio_poll (ctx=0xaaaaac82f9a0, blocking=true) at ../util/aio-posix.c:722
#12 0x0000aaaaab7a158c in iothread_run (opaque=0xaaaaac495e30) at ../iothread.c:67
#13 0x0000aaaaab956ae0 in qemu_thread_start (args=0xaaaaac82ff40) at ../util/qemu-thread-posix.c:505
#14 0x0000fffff654ef60 in  () at /usr/lib64/libc.so.6
#15 0x0000fffff65b489c in  () at /usr/lib64/libc.so.6
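
For reference, the error is raised by the descriptor-chain validation in QEMU's split virtqueue code. A paraphrased sketch of the failing check (from hw/virtio/virtio.c; the exact code differs between QEMU versions):

static int virtqueue_split_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
                                          MemoryRegionCache *desc_cache,
                                          unsigned int max, unsigned int *next)
{
    /* If this descriptor does not chain, the request is complete. */
    if (!(desc->flags & VRING_DESC_F_NEXT)) {
        return VIRTQUEUE_READ_DESC_DONE;
    }

    /* Reject a 'next' index that points past the descriptor table. */
    *next = desc->next;
    if (*next >= max) {
        virtio_error(vdev, "Desc next is %u", *next);
        return VIRTQUEUE_READ_DESC_ERROR;
    }

    vring_split_desc_read(vdev, desc, desc_cache, *next);
    return VIRTQUEUE_READ_DESC_GOOD;
}

With max = 3 here (48 bytes / 16 bytes per descriptor, i.e. the indirect table length recorded for req1), a chain laid out for the longer 160-byte table of reqN can reference a next index >= 3 and trip this check.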

To Reproduce

I attach gdb to the QEMU process and step through virtqueue_pop() for a virtio block request. Stepping with the gdb commands 'next/continue/print' stalls request processing long enough that the request times out inside the Windows guest.

Expected behavior

QEMU should not hit virtio_error().

Screenshots

Host: (screenshot)

VM: (screenshot)

Additional context

I used DbgView to capture the driver debug log and found the following RESET entry before virtqueue_get_buf_split() is called:

00004734    102.45079803     RESET (FFFFA60432D88740) Function 20 Cnt 0 InQueue 1

The Windows driver documentation "Handling SRB_FUNCTION_RESET_DEVICE" says "the port driver requests a device reset when an uncompleted request times out": https://learn.microsoft.com/en-us/windows-hardware/drivers/storage/handling-srb-function-reset-device
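
For context, that RESET entry corresponds to the reset handling in viostor's VirtIoStartIo(). Abridged from viostor/virtio_stor.c (see also the diff proposed later in this thread), the relevant cases look roughly like this:

        case SRB_FUNCTION_RESET_BUS:
        case SRB_FUNCTION_RESET_DEVICE:
        case SRB_FUNCTION_RESET_LOGICAL_UNIT: {
            /* Complete everything still in flight, then complete the reset SRB itself. */
            CompletePendingRequests(DeviceExtension);
            CompleteRequestWithStatus(DeviceExtension, (PSRB_TYPE)Srb, SRB_STATUS_SUCCESS);
#ifdef DBG
            /* This is the "RESET (...) Function 20 Cnt 0 InQueue 1" line seen in DbgView. */
            RhelDbgPrint(TRACE_LEVEL_INFORMATION, " RESET (%p) Function %x Cnt %d InQueue %d\n",
                        Srb, SRB_FUNCTION(Srb), adaptExt->srb_cnt, adaptExt->inqueue_cnt);
#endif
            /* ... rest of the case omitted ... */
        }

(Function 0x20 in that log line should be SRB_FUNCTION_RESET_LOGICAL_UNIT, if I read srb.h correctly.)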

vrozenfe commented 1 year ago

@wangyan0507

Thanks for reporting this issue. By any chance, do you know if the problem is reproducible on x64 platforms? What is the Windows 11 origin and build version? Would you be able to share the Windows system log, or at least the disk and file system related events?

Thanks, Vadim.

wangyan0507 commented 1 year ago

@vrozenfe

I'm sorry, the Windows system log and disk/file system related events cannot be uploaded, but I can try on an x64 platform. The guest affected by this bug is Windows 11 Enterprise, version 21H2.

Thanks, Yan Wang.

wangyan0507 commented 1 year ago

@asadpanda

Is the "Complete SRB1" operation correct while req1 is still being processed? (see attached image)

Thanks, Yan Wang.

vrozenfe commented 1 year ago

Technically, the following scenario is possible:

If that is the case, try increasing the IoTimeOutValue to see if it makes any difference: https://learn.microsoft.com/en-us/windows-hardware/drivers/storage/registry-entries-for-storport-miniport-drivers

wangyan0507 commented 1 year ago

@vrozenfe

The original test with the Windows 11 guest had neither TimeoutValue nor IoTimeOutValue set. New test results:

  1. TimeoutValue and IoTimeOutValue not set -> virtio_error
  2. TimeoutValue set (60s), IoTimeOutValue not set -> virtio_error
  3. TimeoutValue (60s) and IoTimeOutValue (60s) both set -> virtio_error is hard to trigger

So, does TimeoutValue only work for virtio-scsi and not for virtio-blk, while IoTimeOutValue applies to virtio-blk?

And even with IoTimeOutValue set, the virtio_error problem can technically still occur. What should we do to solve the underlying problem thoroughly? For example, not complete SRB1 and just return an error.

Thanks, Yan Wang.

vrozenfe commented 1 year ago

If I'm not mistaken, TimeOutValue is for SCSI Port miniport drivers (not relevant to viostor and vioscsi, which are both Storport miniport drivers). So we need to use IoTimeOutValue for both viostor and vioscsi.

We need to understand that 60 seconds is a huge time period, and we can expect a request to complete much sooner, even for a VM working with network storage. If a request takes that long, we can also suspect that something is wrong with vCPU thread scheduling or with the storage back end.

Regarding the solution, I think the most appropriate way to make this work properly is to implement proper virtio queue and back-end reset logic. In that case the miniport driver would not only complete all outstanding SRBs, but also request that QEMU reset the LUN/HBA on the RESET request. We have discussed this possibility internally, but I'm not sure whether anything has been done in this direction recently.

Best, Vadim.

wangyan0507 commented 11 months ago

@vrozenfe

Do you have a plan to solve the underlying problem thoroughly in the future?

Thanks, Yan Wang.

vrozenfe commented 11 months ago

@wangyan0507

Yes, even though the LUN reset logic implemented in #683 and #684 helps deal with lost/uncompleted request issues, proper virtio-queue reset logic needs to be implemented in the Windows drivers sooner or later. Honestly, I don't have a specific timeline for implementing this feature at the moment, but it is definitely in our future plans.

Best regards, Vadim.

wangyan0507 commented 11 months ago

@vrozenfe

Thanks, I look forward to this feature being implemented soon.

I have another question: for the viostor "IoTimeOutValue", is the registry entry path "HKLM\System\CurrentControlSet\Services\viostor\Parameters"?

Thanks, Yan Wang.

vrozenfe commented 11 months ago

Yes, [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\viostor\Parameters] is the right path for the "IoTimeOutValue" parameter.
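
For convenience, a minimal sketch of setting it from an elevated command prompt in the guest (assuming the value is a DWORD number of seconds, as described in the Storport registry documentation linked above; a reboot is usually needed for the change to take effect):

reg add "HKLM\SYSTEM\CurrentControlSet\Services\viostor\Parameters" /v IoTimeOutValue /t REG_DWORD /d 60 /f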

Best, Vadim.

wangyan0507 commented 11 months ago

@vrozenfe

I set IoTimeOutValue to 200s in the Windows guest and added a 60s sleep in QEMU before popping available IO from the vring. After each 60s sleep, QEMU pops all the available IO in the vring. The sleep executed about 6 times, after which QEMU still hit virtio_error, for the same reason as in this issue. So I want to know whether 200s is the effective IO timeout value for Windows; in this test, 200s was never reached.
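
For reference, the 60s delay can be injected with a hypothetical patch along these lines (against hw/block/virtio-blk.c; the context line and surrounding code vary between QEMU versions):

--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ (inside virtio_blk_handle_vq, before the request-pop loop)
+        /* Test-only delay: stall the pop path so the guest I/O timeout fires
+         * while requests are still sitting in the avail ring. */
+        g_usleep(60 * G_USEC_PER_SEC);
         while ((req = virtio_blk_get_request(s, vq))) {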

Thanks, Yan Wang.

benyamin-codez commented 1 month ago

@vrozenfe and @wangyan0507,

I have backported the fix for vioscsi into viostor. I think the removal of the unnecessary VQ unlocks will fix this issue.

There also appears to be a slight performance increase. I was able to get 4.4 GB/s on the boot+system disk and 6.1-6.3 GB/s on additional disks.

Perhaps you could test this for us, Yan Wang? You will need to build against master and enable TESTSIGNING.

--- ./a/viostor/virtio_stor_hw_helper.c 2024-08-06 05:42:24.000000000 +1000
+++ ./b/viostor/virtio_stor_hw_helper.c 2024-08-15 21:33:31.283522952 +1000
@@ -160,6 +160,7 @@

     ULONG               QueueNumber = 0;
     ULONG               MessageId = 1;
+    int                 res = 0;
     ULONG               OldIrql = 0;
     BOOLEAN             result = FALSE;
     bool                notify = FALSE;
@@ -194,30 +195,29 @@
     vq = adaptExt->vq[QueueNumber];
     RhelDbgPrint(TRACE_LEVEL_VERBOSE, " QueueNumber 0x%x vq = %p\n", QueueNumber, vq);

-    element = &adaptExt->processing_srbs[QueueNumber];
     VioStorVQLock(DeviceExtension, MessageId, &LockHandle, FALSE);
-    if (virtqueue_add_buf(vq,
-                     &srbExt->sg[0],
-                     srbExt->out, srbExt->in,
-                     &srbExt->vbr, va, pa) >= 0) {
+    res = virtqueue_add_buf(vq,
+        &srbExt->sg[0],
+        srbExt->out, srbExt->in,
+        &srbExt->vbr, va, pa);
+
+    if (res >= 0) {
         notify = virtqueue_kick_prepare(vq);
+        element = &adaptExt->processing_srbs[QueueNumber];
         InsertTailList(&element->srb_list, &srbExt->vbr.list_entry);
         element->srb_cnt++;
-        VioStorVQUnlock(DeviceExtension, MessageId, &LockHandle, FALSE);
 #ifdef DBG
         InterlockedIncrement((LONG volatile*)&adaptExt->inqueue_cnt);
 #endif
         result = TRUE;
-    }
-    else {
-        VioStorVQUnlock(DeviceExtension, MessageId, &LockHandle, FALSE);
+    } else {
         RhelDbgPrint(TRACE_LEVEL_ERROR, " Can not add packet to queue %d.\n", QueueNumber);
         StorPortBusy(DeviceExtension, 2);
     }
-    if (notify) {
+    VioStorVQUnlock(DeviceExtension, MessageId, &LockHandle, FALSE);
+    if (notify){
         virtqueue_notify(vq);
     }
-
     if (adaptExt->num_queues > 1) {
         if (CHECKFLAG(adaptExt->perfFlags, STOR_PERF_OPTIMIZE_FOR_COMPLETION_DURING_STARTIO)) {
            VioStorCompleteRequest(DeviceExtension, MessageId, FALSE);

Atchess commented 2 weeks ago

@vrozenfe If we don't plan to implement SRB_FUNCTION_RESET_LOGICAL_UNIT in the short term, can we return SRB_STATUS_INVALID_REQUEST to indicate that we don't support the SRB, instead of completing all pending requests? That would prevent abnormal IO errors and only cause Windows to keep waiting for the IO.

vrozenfe commented 1 week ago

@Atchess

Do you want to add an extra case to ACTION_ON_RESET https://github.com/virtio-win/kvm-guest-drivers-windows/blob/master/vioscsi/vioscsi.h#L281C3-L281C17 and handle it accordingly?

Thanks, Vadim.

Atchess commented 1 week ago

@vrozenfe No, I only want to remove CompletePendingRequests from viostor and will not modify the code in vioscsi.

--- a/viostor/virtio_stor.c
+++ b/viostor/virtio_stor.c
@@ -958,8 +958,8 @@ VirtIoStartIo(
         case SRB_FUNCTION_RESET_BUS:
         case SRB_FUNCTION_RESET_DEVICE:
         case SRB_FUNCTION_RESET_LOGICAL_UNIT: {
-            CompletePendingRequests(DeviceExtension);
-            CompleteRequestWithStatus(DeviceExtension, (PSRB_TYPE)Srb, SRB_STATUS_SUCCESS);
+            CompleteRequestWithStatus(DeviceExtension, (PSRB_TYPE)Srb, SRB_STATUS_INVALID_REQUEST);
 #ifdef DBG
             RhelDbgPrint(TRACE_LEVEL_INFORMATION, " RESET (%p) Function %x Cnt %d InQueue %d\n",
                         Srb, SRB_FUNCTION(Srb), adaptExt->srb_cnt, adaptExt->inqueue_cnt);

vrozenfe commented 1 week ago

@Atchess

Ah, sorry. I don't know why, but I thought you were asking about the virtio-scsi driver. Who is going to complete the pending requests then? Does your HBA have some sort of logic to deal with uncompleted/abandoned requests?

Best, Vadim.

Atchess commented 1 week ago

@vrozenfe

We will not complete the pending request, so Windows will not recycle it. If the driver completes a pending request without stopping the device or resetting the virtio queues, Windows will recycle the memory associated with that request. However, the device may still access this memory because the request remains in the virtio queues, potentially leading to file system corruption. If the driver does not complete the pending request, Windows simply continues to wait for its completion, resulting in an IO hang, similar to an IO hang on Linux.

Best, Qi.

vrozenfe commented 1 week ago

@Atchess

Yes, the class driver will not reuse an abandoned SRB and, IIRC, can allocate a new block if needed (balancing between a minimal and maximal working set of packets). That will probably work fine as long as we don't have a massive packet-loss condition, which can happen with network storage (at least such a scenario can easily be emulated). Apart from that, if I am not mistaken, a Storport miniport driver is required to complete any uncompleted requests when it receives SRB_FUNCTION_RESET_LOGICAL_UNIT or SRB_FUNCTION_RESET_DEVICE.

In any case, if you really need to complete SRB_FUNCTION_RESET_* requests with SRB_STATUS_INVALID_REQUEST and you are going to submit your patch, I would suggest introducing a new registry key to change the current behaviour and fulfil your needs (just as is done in vioscsi), while keeping the default execution flow as is.
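
A minimal sketch of what such a registry-controlled switch could look like in viostor, loosely modeled on vioscsi's ACTION_ON_RESET idea; all identifiers below (VIOSTOR_ACTION_ON_RESET, resetAction, etc.) are illustrative, not the actual vioscsi or viostor names:

typedef enum _VIOSTOR_ACTION_ON_RESET {       /* hypothetical enum, mirrors vioscsi's approach */
    ResetCompletePendingRequests = 0,         /* current default behaviour */
    ResetReturnInvalidRequest                 /* proposed alternative */
} VIOSTOR_ACTION_ON_RESET;

        case SRB_FUNCTION_RESET_BUS:
        case SRB_FUNCTION_RESET_DEVICE:
        case SRB_FUNCTION_RESET_LOGICAL_UNIT: {
            /* adaptExt->resetAction would be read from the new registry value during init. */
            if (adaptExt->resetAction == ResetReturnInvalidRequest) {
                CompleteRequestWithStatus(DeviceExtension, (PSRB_TYPE)Srb, SRB_STATUS_INVALID_REQUEST);
            } else {
                CompletePendingRequests(DeviceExtension);
                CompleteRequestWithStatus(DeviceExtension, (PSRB_TYPE)Srb, SRB_STATUS_SUCCESS);
            }
            break;
        }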

But as a long-term target, a proper implementation of the multi-tier reset support provided by Storport will definitely be the best solution.

Best, Vadim.