vpelletier / python-libusb1

Python ctype-based wrapper around libusb1
GNU Lesser General Public License v2.1
168 stars 65 forks source link

Occasional segfault if `USBTransfer.close` is called in callback #47

Closed forrestv closed 5 years ago

forrestv commented 5 years ago

During the entirety of the time that libusb is calling the callback that python-libusb1 provides, the thunk (stored in USBTransfer.__ctypesCallbackWrapper) needs to be kept alive. Calling USBTransfer.close within the callback destroys the (still-running) thunk, causing occasional segfaults in libffi when returning back to libusb.

Currently, the __submitted flag and before_submit/after_completion callbacks are used to keep the USBTransfer alive, and that's almost sufficient. The __ctypesCallbackWrapper is special, though; it needs to be guaranteed to live a little bit longer - until after __callbackWrapper has returned.

I think that the easiest fix here is to:

  1. Make __callbackWrapper a @staticmethod
  2. Create __ctypesCallbackWrapper in the class body rather than in __init__ so it's shared between all USBTransfers
  3. Use a dict of currently submitted transfers to get the USBTransfer object corresponding to the libusb transfer pointer that is provided to the callback
vpelletier commented 5 years ago

Thanks for this excellent report. The investigation must have been painful, I am very sorry about this.

I pushed a tentative fix to the new (and temporary) keep_callback_alive branch. Would you mind giving it a try, and reporting whether it fixes this issue ?

This patch is not as clean as I would like it to be, I may be missing something.

forrestv commented 5 years ago

(moved below comment from commit afcc79f1f3 to this issue)

When I ran it with the branch at first, I got this error:

Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 315, in 'calling callback function'
  File "/home/forrest/sonar/software/usb1/__init__.py", line 408, in __callbackWrapper
    self.close()
  File "/home/forrest/sonar/software/usb1/__init__.py", line 353, in close
    if self.isSubmitted():
  File "/home/forrest/sonar/software/usb1/__init__.py", line 799, in isSubmitted
    return addressof(self.__transfer.contents) in self.__submitted_dict
AttributeError: 'NoneType' object has no attribute 'contents'

That was easily fixed with this diff:

diff --git a/usb1/__init__.py b/usb1/__init__.py
index aec6627..1133403 100644
--- a/usb1/__init__.py
+++ b/usb1/__init__.py
@@ -796,7 +796,7 @@ class USBTransfer(object):
         Tells if this transfer is submitted and still pending.
         """
         # Relies on None never being in __submitted_dict.
-        return addressof(self.__transfer.contents) in self.__submitted_dict
+        return self.__transfer is not None and addressof(self.__transfer.contents) in self.__submitted_dict

     def submit(self):
         """

With that change, it runs. The segfault previously took an average of around 12 hours to happen; I'll run it for a few days and see if it happens.

From reading the diff, I agree that it should fix the problem.

forrestv commented 5 years ago

My code that previously segfaulted after about 12 hours on average has been running for a few days without problems. I think that that's good evidence that the problem is fixed.

vpelletier commented 5 years ago

When I ran it with the branch at first, I got this error:

Good catch, and the comment above that line is also a mistake (which shows that I initially did consider None and then forgot about it): that line was originally self.__transfer in self.__submitted_dict, except transfer is not a hashable type.

My code that previously segfaulted after about 12 hours on average has been running for a few days without problems. I think that that's good evidence that the problem is fixed.

Fantastic !

I pushed it to master and released 1.7.1 with it.

Thank you for the very efficient debugging & bug reporting, and sorry for my slow responses.