vpelletier / python-libusb1

Python ctype-based wrapper around libusb1
GNU Lesser General Public License v2.1
168 stars 65 forks source link

RuntimeError: Set changed size during iteration #51

Closed whitequark closed 2 years ago

whitequark commented 4 years ago

I can reliably reproduce this crash in my project when I'm calling usb_handle.close():

  File "/home/whitequark/Projects/Glasgow/software/glasgow/device/hardware.py", line 142, in close
    self.usb_handle.close()
  File "/home/whitequark/Projects/python-libusb1/build/lib/usb1/__init__.py", line 1234, in close
    for transfer in inflight:
RuntimeError: Set changed size during iteration

The following patch fixes the error for me:

--- a/python-libusb1/usb1/__init__.py Thu Oct  3 19:16:01 2019
+++ b/python-libusb1/usb1/__init__.py Thu Oct  3 19:16:04 2019
@@ -1231,7 +1231,7 @@
             transfer_set.add(transfer)
             transfer.doom()
         inflight = self.__inflight
-        for transfer in inflight:
+        for transfer in self.__set(inflight):
             try:
                 transfer.cancel()
             except (self.__USBErrorNotFound, self.__USBErrorNoDevice):

However I'm not sure if this is the correct way to do it.

vpelletier commented 4 years ago

This error means that either some transfers were submitted while insideclose, or that they completed.

Submission should not normally happen: partly because transfers were "doomed" just before during device handle closure, making them raise on submission (but that is not bullet-proof to avoid using locks), and more importantly because application should really not try to submit transfers on a device being closed.

So it seems most likely that what happened here is a transfer completion. And that would make your proposal fine, as then cancel will have nothing to do for that transfer (and likely just raise USBErrorNotFound).

Could you confirm this ? I think this requires adding a print in USBTransfer.__callbackWrapper which would emit self.__handle, and another in USBDeviceHandle.close, also emitting self.__handle for comparison.

whitequark commented 4 years ago

I get this:

__callbackWrapper <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60> 0
I: g.device.hardware: device already has bitstream ID 2372957e4b8b61357737fbc2c14d0a9a
I: g.cli: running handler for applet 'audio-dac'
__callbackWrapper <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60> 0
__callbackWrapper <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60> 0
I: g.applet.audio.dac: port(s) A, B voltage set to 5.0 V
__callbackWrapper <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60> 0
__callbackWrapper <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60> 0
^Cclose <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60>
__callbackWrapper <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60> 3
Traceback (most recent call last):
  File "/home/whitequark/.local/bin/glasgow", line 11, in <module>
__callbackWrapper <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60> 3
    load_entry_point('glasgow', 'console_scripts', 'glasgow')()
__callbackWrapper <usb1.libusb1.LP_libusb_device_handle object at 0x7fd5824e3a60> 3
  File "/home/whitequark/Projects/glasgow/software/glasgow/cli.py", line 795, in main
    exit(loop.run_until_complete(_main()))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "/home/whitequark/Projects/glasgow/software/glasgow/cli.py", line 787, in _main
    device.close()
  File "/home/whitequark/Projects/glasgow/software/glasgow/device/hardware.py", line 142, in close
    self.usb_handle.close()
  File "/home/whitequark/Projects/python-libusb1/build/lib/usb1/__init__.py", line 1236, in close
    for transfer in inflight:
RuntimeError: Set changed size during iteration

with this patch:

--- /home/whitequark/Projects/python-libusb1/build/lib/usb1/__init__.py Mon Oct  7 13:39:14 2019
+++ /home/whitequark/Projects/python-libusb1/build/lib/usb1/__init__.py Mon Oct  7 13:39:16 2019
@@ -400,6 +400,7 @@
         fired (ie, mark transfer as not submitted upon call).
         """
         self = cls.__submitted_dict.pop(addressof(transfer_p.contents))
+        print("__callbackWrapper", self.__handle, self.getStatus())
         self.__after_completion(self)
         callback = self.__callback
         if callback is not None:
@@ -1212,6 +1213,7 @@
         do not close nor let GC collect a USBDeviceHandle which has in-flight
         transfers.
         """
+        print("close", self.__handle)
         handle = self.__handle
         if handle is None:
             return

Note to self: requires glasgowembedded/glasgow@f45cd5f to reproduce.

vpelletier commented 4 years ago

Thanks.

3 is LIBUSB_TRANSFER_CANCELLED, so it confirms my understanding. And seeing this trace, I guess the transfer completing here is actually one which was just cancelled during close's loop - so basically it trips on its own feet, via the threaded event handler.

Note to self: requires GlasgowEmbedded/glasgow@f45cd5f to reproduce.

I had a look at how this project uses python-libusb1. I'm reading on asyncio in python3 to try to give some relevant advice, as so far the only thing I can say is that it would be better to join event handling thread before closing the devices - but that does not say how handleEvents gets awoken. And I do not want to push you down the libusb1 threaded event handling rabbit hole unless I get very sure it leads to a solution.

vpelletier commented 4 years ago

Before going into asyncio considerations, I think I've come up with an even simpler close implementation:

diff --git a/usb1/__init__.py b/usb1/__init__.py
index ef79038..60f40cc 100644
--- a/usb1/__init__.py
+++ b/usb1/__init__.py
@@ -1221,13 +1221,14 @@ class USBDeviceHandle(object):
                 break
             transfer_set.add(transfer)
             transfer.doom()
-        inflight = self.__inflight
-        for transfer in inflight:
+            # It is tempting to only cancel in-flight transfers. But if there
+            # is an event handling thread alive, it will modify that set and
+            # cause iteration to fail. So just attempt to cancel all transfers.
             try:
                 transfer.cancel()
             except (self.__USBErrorNotFound, self.__USBErrorNoDevice):
                 pass
-        while inflight:
+        while self.__inflight:
             try:
                 self.__context.handleEvents()
             except self.__USBErrorInterrupted:

Then, the next issue we will have is either or our handleEvents calls randomly waiting "forever" when the test-then-acquire race-condition goes wrong. I could not yet figure out a safe way to handle this while keeping close complexity complexity at a sane level.

In my asyncio reads I now get an understanding of what it provides. It looks very tempting to integrate libusb1 file descriptors with the loop. But then I do not yet see a way of implementing the thread-safe event handler described in the page I linked above. Which is not a blocker in itself, but it means that if your code is imported into a larger program which would try to use libusb1 blocking or event-handling APIs then you will get bug reports about weird occasional long delays.

whitequark commented 4 years ago

It looks very tempting to integrate libusb1 file descriptors with the loop.

Actually, the reason I am not doing it is because I need WinUSB, which does not let me use file descriptors.

vpelletier commented 4 years ago

Good point.

About waking handleEvents, there is one idea I'm not too happy about but which would allow joining that thread during close: track transfers at your level (just which one exists, no need to track their submission state), and right after setting done = True you cancel them all (catching USBErrorNotFound and USBErrorNoDevice). One cancellation will complete, waking up handleEvents, let the thread notice it should terminate, and once joined you can close the device. The other cancellations will complete during close.

What makes me unhappy about this solution is that this makes you duplicate what USBDeviceHandle.close already does. But as I still cannot come up with a more satisfying answer, I thought I should just suggest it in the meantime. Also, if nothing is submitted (which I suspect may happen if the device is unplugged while the application is still running: transfer cancellation will come from the kernel maybe before your code can stop the event handling thread), the thread will not wake up.

Another idea would be to provide a timeout to handleEvents at the cost of a bit of CPU time waste from unnecessary thread wake-ups, and joining thread will wait for that timeout. Which could be a fallback for the case where no transfer is submitted.

Tangential note: there is a race-condition between getTransfer and close: if a multithreaded program keeps requesting new transfers while trying to close the handle, some transfers will not be cleaned up. The cost of closing that race is adding a lock in both, which I originally saw as a useless penalty to well-behaved applications. OTOH, getTransfer should in reality be rarely used: transfers should ideally be kept around and resubmitted rather than thrown away & replaced with new ones. Which should make a getTransfer overhead negligible in face of all the existing device setup costs (enumerate, open, set configuration, claim interface, ...). So I am tempted to go in that direction, and that would be another reason for tracking transfers at your level.

whitequark commented 4 years ago

I think I've permuted my libusb-using code so that it does what you suggest. (Turns out I already wrote the transfer-tracking code for other reasons.) So this should be no longer an issue for me. Feel free to handle in any way you consider appropriate.

vpelletier commented 2 years ago

This issue should be fixed by efcf37080182b674caee1d31afa9bd8c9ec06cd9 .

whitequark commented 2 years ago

Thank you! I've been hitting that error on Windows, so this will help a lot.