openigtlink / SlicerOpenIGTLink

OpenIGTLinkIF module as an Slicer Extension
Other
17 stars 28 forks source link

Slicer Application Error on connected event #127

Closed jamesobutler closed 2 years ago

jamesobutler commented 2 years ago

Background

Originally mentioned in https://github.com/openigtlink/SlicerOpenIGTLink/pull/125#pullrequestreview-1162159907, but I have now found a set of steps for someone to easily replicate. cc: @Sunderlandkyl @lassoan

image

Steps to reproduce:

  1. Using Windows platform, Download and install latest Slicer preview from 2022-11-02 https://slicer.cdash.org/viewFiles.php?buildid=2844075
  2. Install SlicerOpenIGTLink extension (as of https://github.com/openigtlink/SlicerOpenIGTLink/commit/af9659f2605bafff285e1af7d1b017d717a35565).
  3. Modify the GenericSerialDeviceRemoteControl.py file so that the onConnected signal calls self.sendSetRTSCommand("True", blocking=True), https://github.com/openigtlink/SlicerOpenIGTLink/blob/af9659f2605bafff285e1af7d1b017d717a35565/GenericSerialDeviceRemoteControl/GenericSerialDeviceRemoteControl.py#L341-L347
  4. Launch a PlusServer instance for a generic serial device command communication. (see example config xml below). Note that I have a wrapper version of PlusServer that has capabilities to accept custom commands including "quit" to end the PlusServer as you can see in the video recreation at the bottom of this list)
  5. From 3D Slicer, go to "OpenIGTLinkIF" module and add connector at the 18961 port and set it active
  6. go to "Serial Device Remote Control" module and select device ID "MySerial"
  7. End PlusServer instance and Launch PlusServer instance again
  8. Observe the Slicer Application Error pop-up
    <PlusConfiguration>
    <DataCollection StartupDelaySec="1.0">
    <Device
      BaudRate="9600"
      Id="MySerial"
      LineEnding="0d 0a"
      MaximumReplyDelaySec="0.1"
      MaximumReplyDurationSec="0.1"
      SerialPort="12"
      Type="GenericSerialDevice"
      >
      <DataSources>
        <DataSource Id="SerialData" Type="FieldData"/>
      </DataSources>
      <OutputChannels>
        <OutputChannel Id="FieldChannel">
          <DataSource Id="SerialData"/>
        </OutputChannel>
      </OutputChannels>
    </Device>
    </DataCollection>
    <PlusOpenIGTLinkServer
    MaxNumberOfIgtlMessagesToSend="1"
    MaxTimeSpentWithProcessingMs="50"
    ListeningPort="18961"
    LogWarningOnNoDataAvailable="false"
    OutputChannelId="FieldChannel" >
    <DefaultClientInfo>
      <MessageTypes>
        <Message Type="STRING" />
      </MessageTypes>
      <StringNames>
        <String Name="SerialData" />
      </StringNames>
    </DefaultClientInfo>
    </PlusOpenIGTLinkServer>
    </PlusConfiguration>

Video recreation of steps to reproduce:

https://user-images.githubusercontent.com/15837524/199614588-628cdb4c-9b62-4374-8f13-0587184f3337.mp4

Environment

jamesobutler commented 2 years ago

I've attempted to modify some recent changes (https://github.com/IGSIO/OpenIGTLinkIO/commit/207f5cd7c7234d83cf53e674de8753243221d825 and https://github.com/openigtlink/SlicerOpenIGTLink/commit/0df19d0cb97a3aeb5ec9bea5d7848b99f1a6f23c) to use std::recursive_mutex instead of std::mutex, but I'm still running into this issue.

Sunderlandkyl commented 2 years ago

I followed the steps on the latest nightly versions of Plus (connected to an Arduino) + Slicer, but the application error has not come up yet.

jamesobutler commented 2 years ago

hmmm I can try with the latest nightly versions of Plus instead of my wrapper application. My wrapper application does not have the latest OpenIGTLink/OpenIGTLinkIO. Would you expect that the version of OpenIGTLinkIO used by Plus needs to match with the OpenIGTLinkIO version used by SlicerOpenIGTLink?

Sunderlandkyl commented 2 years ago

Plus + Slicer shouldn't need to run the same version of OpenIGTLinkIO.

jamesobutler commented 2 years ago

Ok. I confirmed with the latest Plus download from the web that I couldn't replicate with my instructions above. I'll try narrowing it down further based on different versions of toolkits in my wrapper version of Plus to see what might be causing the incompatibility.

jamesobutler commented 2 years ago

@Sunderlandkyl I was able to replicate with the latest Plus downloaded from the web. I think I had changed some code elsewhere to making blocking=True the default. I'm going to update my instructions in step 3 from self.sendSetRTSCommand("True") to self.sendSetRTSCommand("True", blocking=True)

Below is an image of me at step 8 observing the Application error. image

Sunderlandkyl commented 2 years ago

Yup. that did it, I can replicate the issue now.

Sunderlandkyl commented 2 years ago

Found the issue. Fix commited in OpenIGTLinkIO: https://github.com/IGSIO/OpenIGTLinkIO/commit/e38a0e29c0e8cbc1ef860cb3b0444c7b177da234.

The Connection event was invoked during the PeriodicProcess function on the Main thread. While events are being handled, the EventQueueMutex is locked.

If a blocking command is sent in response to some event that was invoked during PeriodicProcess, then the SendCommand function will call also PeriodicProcess in an attempt to resolve the command before returning control of the thread.

This second PeriodicProcess will again attempt to handle the events and lock EventQueueMutex. Since it is already locked, this results in a deadlock.

Changing EventQueueMutex from std::mutex to std::recursive_mutex resolves the issue, and it seems safe enough to be able to access EventQueue multiple times on the same thread.

Stack trace ``` > igtlioLogic.dll!igtlioConnector::ImportEventsFromEventBuffer() Line 781 C++ igtlioLogic.dll!igtlioConnector::PeriodicProcess() Line 826 C++ igtlioLogic.dll!igtlioConnector::SendCommand(vtkSmartPointer command) Line 902 C++ vtkSlicerOpenIGTLinkIFModuleMRML.dll!vtkMRMLIGTLConnectorNode::vtkInternal::SendCommand(vtkSmartPointer command) Line 1095 C++ vtkSlicerOpenIGTLinkIFModuleMRML.dll!vtkMRMLIGTLConnectorNode::SendCommand(vtkSmartPointer command) Line 2433 C++ vtkSlicerOpenIGTLinkIFModuleMRML.dll!vtkMRMLIGTLConnectorNode::SendCommand(vtkSlicerOpenIGTLinkCommand * command) Line 2440 C++ vtkSlicerOpenIGTLinkIFModuleMRMLPython.pyd!PyvtkMRMLIGTLConnectorNode_SendCommand(_object * self, _object * args) Line 1548 C++ [External Code] vtkWrappingPythonCore3.9-9.1.dll!vtkPythonCommand::Execute(vtkObject * ptr, unsigned long eventtype, void * callData) Line 237 C++ vtkCommon-9.1.dll!vtkSubjectHelper::InvokeEvent(unsigned long event, void * callData, vtkObject * self) Line 651 C++ vtkSlicerOpenIGTLinkIFModuleMRML.dll!vtkMRMLIGTLConnectorNode::ProcessIOConnectorEvents(vtkObject * caller, unsigned long event, void * callData) Line 1407 C++ vtkSlicerOpenIGTLinkIFModuleMRML.dll!vtkObject::vtkClassMemberCallback::operator()(vtkObject * caller, unsigned long event, void * calldata) Line 381 C++ vtkCommon-9.1.dll!vtkObjectCommandInternal::Execute(vtkObject * caller, unsigned long eventId, void * callData) Line 954 C++ vtkCommon-9.1.dll!vtkSubjectHelper::InvokeEvent(unsigned long event, void * callData, vtkObject * self) Line 651 C++ [Inline Frame] igtlioLogic.dll!vtkObject::InvokeEvent(unsigned long) Line 230 C++ igtlioLogic.dll!igtlioConnector::ImportEventsFromEventBuffer() Line 789 C++ igtlioLogic.dll!igtlioConnector::PeriodicProcess() Line 826 C++ vtkSlicerOpenIGTLinkIFModuleMRML.dll!vtkMRMLIGTLConnectorNode::PeriodicProcess() Line 2506 C++ vtkSlicerOpenIGTLinkIFModuleLogic.dll!vtkSlicerOpenIGTLinkIFLogic::CallConnectorTimerHander() Line 329 C++ Qt5Core.dll!doActivate<0>(QObject * sender, int signal_index, void * * argv) Line 3903 C++ [Inline Frame] Qt5Core.dll!QTimer::timeout(QTimer::QPrivateSignal _t1) Line 205 C++ Qt5Core.dll!QTimer::timerEvent(QTimerEvent * e) Line 259 C++ Qt5Core.dll!QObject::event(QEvent * e) Line 1341 C++ Qt5Widgets.dll!QApplicationPrivate::notify_helper(QObject * receiver, QEvent * e) Line 3634 C++ Qt5Widgets.dll!QApplication::notify(QObject * receiver, QEvent * e) Line 3586 C++ qSlicerBaseQTGUI.dll!qSlicerApplication::notify(QObject * receiver, QEvent * event) Line 452 C++ Qt5Core.dll!QCoreApplication::notifyInternal2(QObject * receiver, QEvent * event) Line 1063 C++ Qt5Core.dll!QEventDispatcherWin32Private::sendTimerEvent(int timerId) Line 442 C++ Qt5Core.dll!QEventDispatcherWin32::event(QEvent * e) Line 1067 C++ Qt5Widgets.dll!QApplicationPrivate::notify_helper(QObject * receiver, QEvent * e) Line 3634 C++ Qt5Widgets.dll!QApplication::notify(QObject * receiver, QEvent * e) Line 3586 C++ qSlicerBaseQTGUI.dll!qSlicerApplication::notify(QObject * receiver, QEvent * event) Line 452 C++ Qt5Core.dll!QCoreApplication::notifyInternal2(QObject * receiver, QEvent * event) Line 1063 C++ [Inline Frame] Qt5Core.dll!QCoreApplication::sendEvent(QObject *) Line 1458 C++ Qt5Core.dll!QCoreApplicationPrivate::sendPostedEvents(QObject * receiver, int event_type, QThreadData * data) Line 1817 C++ qwindows.dll!QWindowsGuiEventDispatcher::sendPostedEvents() Line 81 C++ Qt5Core.dll!QEventDispatcherWin32::processEvents(QFlags flags) Line 530 C++ qwindows.dll!QWindowsGuiEventDispatcher::processEvents(QFlags flags) Line 74 C++ [Inline Frame] Qt5Core.dll!QEventLoop::processEvents(QFlags) Line 139 C++ Qt5Core.dll!QEventLoop::exec(QFlags flags) Line 232 C++ Qt5Core.dll!QCoreApplication::exec() Line 1371 C++ qSlicerBaseQTCore.dll!qSlicerCoreApplication::exec() Line 969 C++ SlicerApp-real.exe!`anonymous namespace'::SlicerAppMain(int argc, char * * argv) Line 62 C++ [External Code] ```
lassoan commented 2 years ago

Changing EventQueueMutex from std::mutex to std::recursive_mutex resolves the issue

We should use recursive mutex everywhere, as we want to allow a thread (once it acquired that mutex) to freely call any number of mutex-protected methods in any order at any level. Non-recursive std::mutex should practically never be used (only very carefully in very special cases where the locking scope is very simple and limited and performance is critical).

lassoan commented 2 years ago

@Sunderlandkyl @jamesobutler this was a very nice collaborative effort. Multithreading issues are notoriously hard to reproduce and fix.

Sunderlandkyl commented 2 years ago

We should use recursive mutex everywhere, as we want to allow a thread (once it acquired that mutex) to freely call any number of mutex-protected methods in any order at any level. Non-recursive std::mutex should practically never be used (only very carefully in very special cases where the locking scope is very simple and limited and performance is critical).

Makes sense, change made in https://github.com/IGSIO/OpenIGTLinkIO/commit/46975d197796063b956573f1b1022ac2e3643fe4.

jamesobutler commented 2 years ago

Thanks for the pushed updates @Sunderlandkyl! Unfortunately our office building and surrounding area lost power, so I'm going to test this tomorrow morning.

jamesobutler commented 2 years ago

I am no longer facing this issue now 👍🏻