Closed jamesobutler closed 2 years ago
I've attempted to modify some recent changes (https://github.com/IGSIO/OpenIGTLinkIO/commit/207f5cd7c7234d83cf53e674de8753243221d825 and https://github.com/openigtlink/SlicerOpenIGTLink/commit/0df19d0cb97a3aeb5ec9bea5d7848b99f1a6f23c) to use std::recursive_mutex instead of std::mutex, but I'm still running into this issue.
I followed the steps on the latest nightly versions of Plus (connected to an Arduino) + Slicer, but the application error has not come up yet.
hmmm I can try with the latest nightly versions of Plus instead of my wrapper application. My wrapper application does not have the latest OpenIGTLink/OpenIGTLinkIO. Would you expect that the version of OpenIGTLinkIO used by Plus needs to match with the OpenIGTLinkIO version used by SlicerOpenIGTLink?
Plus + Slicer shouldn't need to run the same version of OpenIGTLinkIO.
Ok. I confirmed with the latest Plus download from the web that I couldn't replicate with my instructions above. I'll try narrowing it down further based on different versions of toolkits in my wrapper version of Plus to see what might be causing the incompatibility.
@Sunderlandkyl I was able to replicate with the latest Plus downloaded from the web. I think I had changed some code elsewhere to make blocking=True the default. I'm going to update my instructions in step 3 from self.sendSetRTSCommand("True") to self.sendSetRTSCommand("True", blocking=True)
Below is an image of me at step 8 observing the Application error.
Yup. that did it, I can replicate the issue now.
Found the issue. Fix committed in OpenIGTLinkIO: https://github.com/IGSIO/OpenIGTLinkIO/commit/e38a0e29c0e8cbc1ef860cb3b0444c7b177da234.
The Connection event was invoked during the PeriodicProcess function on the Main thread. While events are being handled, the EventQueueMutex is locked.
If a blocking command is sent in response to some event that was invoked during PeriodicProcess, then the SendCommand function will also call PeriodicProcess in an attempt to resolve the command before returning control of the thread.
This second PeriodicProcess will again attempt to handle the events and lock EventQueueMutex. Since it is already locked, this results in a deadlock.
Changing EventQueueMutex from std::mutex to std::recursive_mutex resolves the issue, and it seems safe enough to be able to access EventQueue multiple times on the same thread.
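For anyone following along, here is a minimal sketch of the re-entrancy described above. It is not the OpenIGTLinkIO code: `DemoConnector`, `OnEvent`, `SendCommandBlocking` and `RunReentrantDemo` are hypothetical names used only for illustration, and the "blocking command" is reduced to a nested call to `PeriodicProcess`. With `std::recursive_mutex` the nested call on the same thread simply re-acquires the lock. If `EventQueueMutex` were a plain `std::mutex`, the nested `lock_guard` would be undefined behavior, which in practice usually shows up as the thread hanging.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>

// Hypothetical stand-in for the connector discussed above (illustration only).
class DemoConnector {
public:
  std::function<void()> OnEvent;  // observer invoked while events are being handled
  int HandledEvents = 0;
  int MaxNesting = 0;             // deepest PeriodicProcess nesting observed

  void PushEvent(int id) {
    std::lock_guard<std::recursive_mutex> lock(EventQueueMutex);
    EventQueue.push(id);
  }

  // Handles queued events while holding EventQueueMutex.
  void PeriodicProcess() {
    std::lock_guard<std::recursive_mutex> lock(EventQueueMutex);
    ++Nesting;
    if (Nesting > MaxNesting) MaxNesting = Nesting;
    while (!EventQueue.empty()) {
      EventQueue.pop();
      ++HandledEvents;
      if (OnEvent) OnEvent();  // may re-enter PeriodicProcess on this thread
    }
    --Nesting;
  }

  // Assumption: a blocking send pumps PeriodicProcess before returning.
  void SendCommandBlocking() { PeriodicProcess(); }

private:
  std::recursive_mutex EventQueueMutex;  // std::mutex here would hang on re-entry
  std::queue<int> EventQueue;
  int Nesting = 0;
};

// Sends a "blocking command" from inside an event handler and
// returns how deeply PeriodicProcess was nested.
int RunReentrantDemo() {
  DemoConnector connector;
  connector.OnEvent = [&connector]() { connector.SendCommandBlocking(); };
  connector.PushEvent(1);
  connector.PeriodicProcess();
  assert(connector.HandledEvents == 1);
  return connector.MaxNesting;
}
```

Calling `RunReentrantDemo()` returns 2: the outer `PeriodicProcess` from the main loop, plus the nested one triggered from the event handler while the lock was still held.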
> Changing EventQueueMutex from std::mutex to std::recursive_mutex resolves the issue
We should use recursive mutex everywhere, as we want to allow a thread (once it acquired that mutex) to freely call any number of mutex-protected methods in any order at any level. Non-recursive std::mutex should practically never be used (only very carefully in very special cases where the locking scope is very simple and limited and performance is critical).
@Sunderlandkyl @jamesobutler this was a very nice collaborative effort. Multithreading issues are notoriously hard to reproduce and fix.
> We should use recursive mutex everywhere, as we want to allow a thread (once it acquired that mutex) to freely call any number of mutex-protected methods in any order at any level. Non-recursive std::mutex should practically never be used (only very carefully in very special cases where the locking scope is very simple and limited and performance is critical).
Makes sense, change made in https://github.com/IGSIO/OpenIGTLinkIO/commit/46975d197796063b956573f1b1022ac2e3643fe4.
Thanks for the pushed updates @Sunderlandkyl! Unfortunately our office building and surrounding area lost power, so I'm going to test this tomorrow morning.
I am no longer facing this issue now 👍🏻
Background
Originally mentioned in https://github.com/openigtlink/SlicerOpenIGTLink/pull/125#pullrequestreview-1162159907, but I have now found a set of steps for someone to easily replicate. cc: @Sunderlandkyl @lassoan
Steps to reproduce:
onConnected signal calls self.sendSetRTSCommand("True", blocking=True), https://github.com/openigtlink/SlicerOpenIGTLink/blob/af9659f2605bafff285e1af7d1b017d717a35565/GenericSerialDeviceRemoteControl/GenericSerialDeviceRemoteControl.py#L341-L347
Video recreation of steps to reproduce:
https://user-images.githubusercontent.com/15837524/199614588-628cdb4c-9b62-4374-8f13-0587184f3337.mp4
Environment