mcuee / libusb-win32

libusb-win32 project official github repo
172 stars, 45 forks

BSOD when transferring more than 64MB in one bulk transfer #51

Closed dontech closed 7 months ago

dontech commented 8 months ago

Found during the investigation of: https://github.com/mcuee/libusb-win32/issues/45

@sonatique mentioned this:

@dontech: I just tested with the simple loopback test I provided on December 7, and I no longer see any "data corruption", which is great, thanks a lot! I tested with my desired buffer size of 2 MBytes and it was fine. EDIT: I also tested with my real application (3 buffers of 2 MBytes each) and I no longer see corruption!

However: I then tried to increase the buffer size even more, just for fun (since you claimed we could go up to 4 GBytes ;-) ). 4, 8, 16 and 32 MBytes were OK. But at 64 MB or more, I immediately get a BSOD, every time. Sometimes the stop code is BAD_POOL_HEADER, sometimes KERNEL_MODE_HEAP_CORRUPTION.

I have no idea where this comes from. Did you try on your side with 64 MBytes or more?

sonatique commented 8 months ago

@dontech Thanks a lot for opening this. Note that I didn't test anywhere between 32 MB (the last size that worked) and 64 MB (the first that failed); however, I did test 256 MB, which fails as well. Let me know if I can do something.

dontech commented 8 months ago

I found out that the crash appears to come from the core USBD drivers.

It would seem that they do not like it when the user submits requests much bigger than the maximum transfer size that they advertise. Probably a bug in USBD.

We can easily mitigate this by never requesting more than the maximum transfer size allowed. USBD never transfers more than this anyway, so there is no performance impact.
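The mitigation described here (capping each request at the advertised maximum and walking the buffer in pieces) amounts to a simple chunking loop. Below is a minimal, portable user-space sketch of the idea; the names `count_chunks`, `submit_chunked` and the counting callback are ours for illustration, not actual libusb0.sys code:

```c
#include <assert.h>
#include <stddef.h>

/* Number of sub-requests needed to cover `total` bytes when no single
 * request may exceed `max_transfer_size` (round up). */
static size_t count_chunks(size_t total, size_t max_transfer_size)
{
    return (total + max_transfer_size - 1) / max_transfer_size;
}

/* Hypothetical stand-in for submitting one URB to the bus driver. */
typedef void (*submit_fn)(const unsigned char *buf, size_t len);

/* Walk the buffer chunk by chunk, never handing the stack more than
 * max_transfer_size bytes at a time. */
static void submit_chunked(const unsigned char *buf, size_t total,
                           size_t max_transfer_size, submit_fn submit_chunk)
{
    size_t offset = 0;
    while (offset < total) {
        size_t len = total - offset;
        if (len > max_transfer_size)
            len = max_transfer_size;
        submit_chunk(buf + offset, len);
        offset += len;
    }
}

/* Tiny demo callback that just counts submissions. */
static size_t demo_chunks_submitted = 0;
static void count_chunk(const unsigned char *buf, size_t len)
{
    (void)buf;
    (void)len;
    demo_chunks_submitted++;
}
```

With a 64 MB buffer and a 64K advertised maximum this yields 1024 sub-requests, which is also why the cap costs nothing: the stack never moves more than the maximum per request anyway.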

sonatique commented 8 months ago

Thanks a lot @dontech! Is there a way to retrieve this advertised maximum transfer size, and where is it advertised? Or is it defined in a header somewhere? Thanks.

dontech commented 8 months ago

> Thanks a lot @dontech ! Is there a way to retrieve this advertised maximum transfer size, where is it advertised? Or is it defined somewhere in a header or something? Thanks.

This originates inside the driver from USBD_INTERFACE_INFORMATION, which is something USBD returns for each endpoint.

dontech commented 8 months ago

This should be fixed on this branch: https://github.com/mcuee/libusb-win32/tree/libusb-max-transfer

1) Removed the transfer limitation inside the DLL, as it was likely a work-around for the same problem you have discovered. As you noticed, though, it did not cover all cases.

2) The driver now chunks all transfers into the max_transfer_size mandated by USBD, for both INT/BULK reads and writes.

dontech commented 8 months ago

@sonatique There is a new test release here: https://github.com/mcuee/libusb-win32/releases/tag/test_release_max_transfer

Please test if this fixes your problem. I have tested all the relevant cases here, and to me this seems fixed.

Might want to raise a bug with Microsoft, as I am fairly sure the driver is not supposed to crash when the client requests too much data. But for libusb0, this fixes the problem.

As usual, please disable driver signing enforcement or allow test signing.

mcuee commented 8 months ago

Just wondering if this is related to the transfer size limits imposed by Microsoft: https://learn.microsoft.com/en-us/windows-hardware/drivers/usbcon/usb-bandwidth-allocation

These are the Windows 8/8.1 limits. The Microsoft inbox USB driver does not support xHCI on Windows 7, so there it is up to the 3rd-party driver.

Control transfer:
- 64K for SuperSpeed and high speed (xHCI)
- 4K for full and low speed (xHCI, EHCI, UHCI, OHCI)
- For UHCI, 4K on the default endpoint; 64K on non-default control pipes

Interrupt transfer:
- 4MB for SuperSpeed, high, full, and low speed (xHCI, EHCI, UHCI, OHCI)

Bulk transfer:
- 32MB for SuperSpeed (xHCI)
- 4MB for high and full speed (xHCI)
- 4MB for high and full speed (EHCI and UHCI)
- 256K for full speed (OHCI)

Isochronous transfer:
- 1024 * wBytesPerInterval for SuperSpeed (xHCI) (see USB_SUPERSPEED_ENDPOINT_COMPANION_DESCRIPTOR)
- 1024 * MaximumPacketSize for high speed (xHCI, EHCI)
- 256 * MaximumPacketSize for full speed (xHCI, EHCI)
- 64K for full speed (UHCI, OHCI)
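For quick reference, the bulk limits quoted above can be written out as constants, together with a clamp helper in the spirit of "never request more than the maximum the stack supports". The constant names and the helper are ours, not from any Microsoft header:

```c
#include <assert.h>

/* The Windows 8/8.1 bulk transfer limits quoted from the Microsoft docs
 * above, expressed as byte counts. */
#define KB(x) ((unsigned)(x) * 1024u)
#define MB(x) ((unsigned)(x) * 1024u * 1024u)

enum {
    BULK_MAX_SUPERSPEED_XHCI = MB(32),  /* SuperSpeed (xHCI) */
    BULK_MAX_HIGH_FULL_SPEED = MB(4),   /* high/full speed (xHCI, EHCI, UHCI) */
    BULK_MAX_FULL_SPEED_OHCI = KB(256)  /* full speed (OHCI) */
};

/* Cap a requested bulk transfer at the limit applicable to the bus. */
static unsigned clamp_bulk(unsigned requested, unsigned limit)
{
    return requested < limit ? requested : limit;
}
```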

mcuee commented 8 months ago

Shall we change the following to the limits mentioned here? https://learn.microsoft.com/en-us/windows-hardware/drivers/usbcon/usb-bandwidth-allocation

/* 64k */
#define LIBUSB_MAX_READ_WRITE 0x10000

BTW, I do not see that we limit the control transfer to 4kB.

mcuee commented 8 months ago

> BTW, I do not see that we limit the control transfer to 4kB.

Reference:

dontech commented 8 months ago

@mcuee you are right. According to docs: "The MaximumTransferSize member of the USBD_PIPE_INFORMATION structure is obsolete."

We will need to address this, but let's do it in a separate bug.

https://github.com/mcuee/libusb-win32/issues/52

I still think the current patch is correct, but the use of the pipe information is of course not 100% correct. However, I think this currently has no impact, as USBD today just answers 64K for all endpoint types, and this seems to work.

@sonatique Can you please verify that the current patch fixes your problem?

sonatique commented 8 months ago

@dontech: of course I will verify the latest patch ASAP. Unfortunately, I cannot do it before January 22; sorry for the delay.

dontech commented 7 months ago

https://community.osr.com/discussion/294230/usbd-bsod-when-requesting-too-big-urb-function-bulk-or-interrupt-transfer/p1?new=1

sonatique commented 7 months ago

@dontech : I have been able to test your latest build. I have one good news and one bad news:

Good news: I tested with my "loopback test" program for transfer sizes ranging from 16 KBytes to 1 GByte: no problems, no corruption, no BSOD, perfect! Thank you very much for this!

Bad news: I then tested with my full-stack real-world application. As opposed to my simple "loopback test", this application uses more than one concurrent transfer; by default it uses 16 transfers, each of 16 KBytes. As you remember, with the last official libusb0.sys release I had no problem with this configuration, but I started to have trouble once the transfer size went above 64 KBytes. That is what triggered https://github.com/mcuee/libusb-win32/issues/45 . I then wrote my loopback test program to reproduce the issue using one single transfer at a time, since that was enough to show the >64KB issue.

But now, what I observe is this: using the libusb0.sys you made for this issue, or the one you made for issue #45, I get data corruption as soon as I use more than one concurrent transfer.

To summarize:

- Last official release: OK for 1 to N concurrent transfers as long as each transfer size is <=64 KByte. NOT OK for transfer sizes >64 KByte.

- Version for #45: OK for transfer sizes <32 MBytes as long as there is only one transfer at a time. NOT OK (BSOD) for transfer sizes >32 MBytes. NOT OK with more than one concurrent transfer: "data corruption" (might be missing data, I am not sure yet).

- Version for this issue (#51): OK for any transfer size as long as there is only one transfer at a time, no more BSOD. NOT OK with more than one concurrent transfer: "data corruption" (might be missing data, I am not sure yet).

This is 100% reproducible. I am sorry I didn't properly test with more than one concurrent transfer when you made the changes in #45 .

My conclusion is that something broke with the changes made in #45 . Therefore I would advise you not to officially publish the current state of the code.

I will modify my "loopback test" program to use more than one concurrent transfer and push it ASAP, such that you can experience this by yourself.

Thanks a lot.

sonatique commented 7 months ago

@dontech Well, I quickly modified my loopback test program for handling multiple transfers at once, but it doesn't show the issue...

The difference between this simple version and my full application is that in the loopback test each transfer has a fixed, constant buffer pointer, and buffer contents are verified sequentially, since all transfer callbacks happen on the same "handling thread". Given that verifying buffers is a "long" process, I suspect that all transfers complete while I am verifying the first one; I then resubmit it, verify the second, etc., so de facto there might be only one active transfer at a time.

In my full application what I do is that, in callback function, I put the transfer buffer pointer into a list then quickly allocate a new buffer, set the transfer buffer pointer with it and submit the transfer again.

I will try to improve loopback test for simulating this, but it might not be easy.

In the meantime: do you have any idea how your changes could create the behavior I am observing? Thanks.

dontech commented 7 months ago

Hi @sonatique

OK, I think I know what is going on. The problem here is that you rack up 10 transfers which are all submitted directly to the underlying driver (USBD).

Since the last patch, each transfer is broken into several USBD requests (like libusbK does). When you submit multiple requests at the same time, there is no guarantee that the reads will complete in order.

So the first transfer might do:

1: packet 1, packet 2, packet 6

And second will do:

2: packet 3, packet 4, packet 5

Are you sure you are missing data, and that it is not just out of order?
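The packet example above can be modeled in a few lines. This is a toy simulation of round-robin servicing of two chunked transfers (our illustration, not driver code); it just shows that the completed byte stream comes out interleaved rather than in submit order:

```c
#include <assert.h>

/* Two transfers, A and B, each split into CHUNKS sub-requests. A's chunks
 * are numbered 0..CHUNKS-1, B's are CHUNKS..2*CHUNKS-1. */
enum { CHUNKS = 3 };

/* Record the order chunks hit the wire when the bus driver services the
 * two transfers' sub-requests alternately. */
static void round_robin_order(int order[2 * CHUNKS])
{
    int pos = 0;
    for (int i = 0; i < CHUNKS; i++) {
        order[pos++] = i;          /* next sub-request of transfer A */
        order[pos++] = CHUNKS + i; /* next sub-request of transfer B */
    }
}

/* True only if the stream matches the submit order: all of A, then all of B. */
static int in_submit_order(const int *order, int n)
{
    for (int i = 1; i < n; i++)
        if (order[i] < order[i - 1])
            return 0;
    return 1;
}
```

The stream comes out as A0, B0, A1, B1, ..., so an application that concatenates the completed buffers in submit order sees "corrupted" (reordered) data even though no byte was lost.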

Thanks,

/pedro

sonatique commented 7 months ago

@dontech: thanks for the explanation. No, I am not at all sure that data is missing rather than disordered. I didn't investigate further, as I wanted to reproduce it with the simpler "loopback test", but to no avail so far.

I guess your explanation is absolutely correct, but then this is a big problem, at least for me (and for other people using more than one concurrent transfer)... :-(

I tested with libusbK.sys and I have no issues: I can have multiple concurrent transfers, each of 2MBytes or so...

Thanks.

Do you think that there is any chance that you could fix this while retaining the possibility of having transfer >64K?

dontech commented 7 months ago

Just verified it. This is exactly what is happening.

libusb0-sys:[transfer_complete] sequence 7: 64 bytes transmitted, maximum_packet_size=64, totalLength=128
libusb0-sys:[transfer_complete] sequence 8: 64 bytes transmitted, maximum_packet_size=64, totalLength=64
libusb0-sys:[transfer_complete] sequence 7: 0 bytes transmitted, maximum_packet_size=64, totalLength=64
libusb0-sys:[transfer_complete] sequence 7: 64 bytes transmitted, maximum_packet_size=64, totalLength=64
libusb0-sys:[transfer_complete] sequence 9: 8000 bytes transmitted, maximum_packet_size=64, totalLength=8000

We cannot fix this without serializing what goes into the driver. We could put a lock on it or something.
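One way to picture the lock idea: hold a single lock across all chunk submissions of one transfer, so chunks of concurrent transfers cannot interleave on their way into USBD. This is our illustration using POSIX threads, not libusb0.sys code, and it serializes at whole-transfer granularity purely for demonstration:

```c
#include <pthread.h>

#define CHUNKS_PER_TRANSFER 4
#define NUM_TRANSFERS 3

static pthread_mutex_t submit_lock = PTHREAD_MUTEX_INITIALIZER;
static int submit_log[NUM_TRANSFERS * CHUNKS_PER_TRANSFER];
static int log_pos = 0;

/* One "transfer": the lock is held while all of its chunks go down, so a
 * concurrent transfer cannot slot its chunks in between. */
static void *submit_transfer(void *arg)
{
    int id = *(int *)arg;
    pthread_mutex_lock(&submit_lock);
    for (int c = 0; c < CHUNKS_PER_TRANSFER; c++)
        submit_log[log_pos++] = id;  /* stand-in for one URB submission */
    pthread_mutex_unlock(&submit_lock);
    return NULL;
}

/* Run NUM_TRANSFERS concurrent "transfers" and report whether every
 * transfer's chunks stayed contiguous in the submission log. */
static int chunks_stayed_contiguous(void)
{
    pthread_t th[NUM_TRANSFERS];
    int ids[NUM_TRANSFERS];
    for (int i = 0; i < NUM_TRANSFERS; i++) {
        ids[i] = i;
        pthread_create(&th[i], NULL, submit_transfer, &ids[i]);
    }
    for (int i = 0; i < NUM_TRANSFERS; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < NUM_TRANSFERS * CHUNKS_PER_TRANSFER; i += CHUNKS_PER_TRANSFER)
        for (int c = 1; c < CHUNKS_PER_TRANSFER; c++)
            if (submit_log[i + c] != submit_log[i])
                return 0;
    return 1;
}
```

Note the trade-off this hints at: holding a lock across an entire multi-megabyte transfer would re-serialize exactly the concurrency that queuing multiple transfers was meant to provide, which is part of why this is not a trivial fix.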

dontech commented 7 months ago

The basic problem is that you expect the data to arrive in the same order as your read requests.

dontech commented 7 months ago

This must mean that libusbk has a lock on the transfers, which only unlocks when the transfer is complete.

sonatique commented 7 months ago

> The basic problem is that you expect the data to arrive in the same order as your read requests.

Yes, indeed. There have been several discussions on the libusb-1.0 GitHub regarding whether this is guaranteed to happen in all cases, and the conclusion was: yes.

And in fact it is the case with latest official libusb-win32 release.

Looks like I'll have to choose between more than one transfer and transfer sizes >64K... Damned.

dontech commented 7 months ago

> Looks like I'll have to choose in between more than one transfer or transfer size >64K... Damned.

1) It's not related to 64K directly; it is related to the endpoint size. In my case I reproduced it with an endpoint size of just 64 bytes.

2) Why do you need all those queued requests? Do they actually help performance?

sonatique commented 7 months ago

@dontech

  1. What I meant is that I have to choose either to stay with the current official release and have more than one buffer, but none larger than 64 KBytes, or to use the latest patch and an arbitrarily large buffer size, but only one transfer at a time...

  2. Well, I technically do not need 16 buffers; it's an old number chosen to be safely large. My recent tests with a USB3 device producing a sustained 180 MBytes/s indicate that my sweet spot is 3 buffers of 2 MBytes each. With fewer buffers I cannot cope with 180 MBytes/s, and smaller buffers increase CPU consumption. That is why I really hope to have libusb0.sys support my ideal configuration.
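A quick sanity check of those numbers (the figures are from the comment above; the arithmetic is ours): at a sustained 180 MBytes/s, one 2 MByte buffer fills in about 11 ms, so with 3 buffers queued the application has roughly two fill times, about 22 ms, to verify and resubmit a buffer before the device risks stalling.

```c
#include <assert.h>

/* Sustained device rate and buffer size, as stated in the comment above. */
enum { RATE_MB_PER_S = 180, BUF_MB = 2 };

/* Microseconds needed to fill one buffer at the sustained rate. */
static long fill_time_us(void)
{
    return (long)BUF_MB * 1000000 / RATE_MB_PER_S;  /* ~11.1 ms */
}
```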

dontech commented 7 months ago

Yeah, this is not related to my latest patch, but I fully understand the problem. I am going to carry this over to a new bug.