micro-manager / pymmcore

Python bindings for MMCore, Micro-Manager's device control layer
https://pypi.org/project/pymmcore/
GNU Lesser General Public License v2.1
33 stars 8 forks source link

callback system crashing kernel with subclassed `MMEventCallback` #49

Closed ianhi closed 3 years ago

ianhi commented 3 years ago

Hi,

I'm using an Olympus IX83 microscope with pymmcore-plus. In https://github.com/tlambert03/pymmcore-plus/issues/21 I found what I thought were issues with pymmcore-plus but on further inspection seem to be issues with pymmcore itself (or perhaps the secret olympus driver).

On this microscope if I register a pure pymmcore.MMEventCallback and then do setConfig then there is no issue and it prints out all the callbacks to the terminal (Though there is some multiple calling of callbacks perhaps) (These prints come from here: https://github.com/micro-manager/mmCoreAndDevices/blob/97f82b9ef02788d8b6e9aa70283eb10c7c645a6e/MMCore/MMEventCallback.h)

onPropertyChanged() Core Shutter Shutter-1
onPropertyChanged() DiaShutter State 1
onConfigGroupChanged() Channel BF
onPropertyChanged() TransmittedIllumination 1 Brightness 100 
onConfigGroupChanged() Channel BF
onPropertyChanged() EpiShutter 1 State 0
onConfigGroupChanged() Channel BF 

However if I switch to using a subclass of that object that doesn't introduce any changes then my python immediately crashes with no error messages printed anywhere I can see:

class MyCallbacks(pymmcore.MMEventCallback):
    def __init__(self):
        super().__init__()

# cb = pymmcore.MMEventCallback()
cb = MyCallbacks()
mmcore.registerCallback(cb)

I only see this crash when changing the config, if I do something like set the X position then the callback works the same as the pymmcore version.

This is deeply confusing because I thought that subclassing an object like that should be exactly the same as having the same object. However in https://github.com/tlambert03/pymmcore-plus/issues/21#issuecomment-881701240 it was suggested that this may be an incompatibiliity due to nuances with swig:

it's possible that the callback system is subtly incompatible with the SwigPyObject wrapping pymmcore.MMEventCallback (see here for a bit more, and note that pymmcore currently uses the -builtin flag) but that's just a guess.

Any help getting to the bottom of this issue would be greatly appreciated.

ianhi commented 3 years ago

It is possible that this could be related to https://forum.image.sc/t/micro-manager-hangs-on-loading-olympus-ix83/48689/4 but I would still find the difference between the raw object and an identical subclass worrisome.

marktsuchida commented 3 years ago

It does sound like this is an issue in the SWIG wrapper generated by pymmcore.

The OlympusIX83 is likely (perhaps certain) to invoke these callbacks on a thread internal to the device adapter, which might be a difference from the other devices you tried.

It might be possible to simulate this by setting a DemoCamera property from a Python Thread, although it's hard to predict if that will reproduce the situation needed to trigger the crash. MMCore is supposed to be thread-safe (internally, there is a per-device-adapter lock).

Does the crash occur when you set the config directly, or when you set a property that indirectly causes the config to change, or both? Also, does the crash occur when you set the config or properties from code, or when you manually change it on the scope stand, or both?

ianhi commented 3 years ago

The OlympusIX83 is likely (perhaps certain) to invoke these callbacks on a thread internal to the device adapter, which might be a difference from the other devices you tried.

On this point it may be relevant that the crash doesn't always (or perhaps ever?) happen instantly, it often seems fine for 2-5 seconds and then crashes. i.e. just long enough for me to get my hopes up in order to crush them :(

Does the crash occur when you set the config directly, or when you set a property that indirectly causes the config to change, or both?

At the end I was getting a crash with setConfig('Channel', 'BF') when the relevant config group was only setting the dichroic position:

# Group: Channel
# Preset: YFP
ConfigGroup,Channel,YFP,Dichroic-1,Label,YFP

# Preset: GFP
ConfigGroup,Channel,GFP,Dichroic-1,Label,GFP

# Preset: BF
ConfigGroup,Channel,BF,Dichroic-1,Label,1-NONE

Also, does the crash occur when you set the config or properties from code, or when you manually change it on the scope stand, or both?

I have a hazy memory of setting a shutter on the device hub and then seeing a crash but I can try to confirm this if it'd be useful for debugging.


I saw that the device driver in question is closed source which is a bummer for debugging this. Are you at least able to look at this for hints, or if not would it be worth emailing the author/someone at olympus?

ianhi commented 3 years ago

invoke these callbacks on a thread internal to the device adapter, which might be a difference from the other devices you tried.

Indeed, the other scope i've tried was the nikon TE2000 which did not have this issue and looking at https://github.com/micro-manager/mmCoreAndDevices/tree/main/DeviceAdapters/NikonTE2000 it does not seem to use threads.

ianhi commented 3 years ago

Setting demo camera properties from a python thread seems to be fine. Though looking at https://github.com/micro-manager/mmCoreAndDevices/blob/main/DeviceAdapters/DemoCamera/DemoCamera.cpp there are also no threads and I think that not everything that should send a callback does. For example setting the channel doesn't emit any propertyChanged signals so I had to use exposure to check this.

import logging
import threading
import time
import pymmcore

mmcore = pymmcore.CMMCore()

mm_dir = "/usr/local/lib/micro-manager"
mmcore.setDeviceAdapterSearchPaths([mm_dir])
mmcore.loadSystemConfiguration("MMConfig_demo.cfg")

class MyCallbacks(pymmcore.MMEventCallback):
    def __init__(self):
        super().__init__()

cb = MyCallbacks()
# cb = pymmcore.MMEventCallback()
mmcore.registerCallback(cb)

time.sleep(0.5)
# not callbacks from these
mmcore.setConfig("Channel", "FITC")
mmcore.setConfig("Channel", "DAPI")

# gives a callback - no crash in main thread
mmcore.setExposure(29)

def thread_function(name):
    logging.info("Thread %s: starting", name)
    # change twice to ensure we trigger a changed callback
    time.sleep(0.5)
    mmcore.setConfig("Channel", "FITC")
    time.sleep(0.5)
    mmcore.setExposure(31)
    mmcore.setConfig("Channel", "DAPI")

format = "%(asctime)s: %(message)s"
logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")
x = threading.Thread(target=thread_function, args=(1,))
x.start()

gives as output:

onExposureChanged()Camera 29
10:30:40: Thread 1: starting
onExposureChanged()Camera 31
ianhi commented 3 years ago

pymmcore is not built with the -threads option right? Reading http://www.swig.org/Doc4.0/Python.html#Python_multithreaded

More specifically, the Python wrappers generated by SWIG will not release the Python's interpreter's Global Interpreter Lock (GIL) when wrapped C/C++ code is entered.

it seems that there is potential for issue?

Also from https://github.com/swig/swig/issues/927

I found that this "-threads" option was essential for expected multithreaded behavior when using SWIG-wrapped C++ code in a multithreaded Python application.

I also found https://github.com/FLAMEGPU/FLAMEGPU2/issues/432 which is first relevant issue that talks about callbacks

From that issue:

When executing models as part of an ensemble using Python, it is necessary for call back functions to lock the python GIL before executing call backs in anything other than the default thread. Without this, there is a high chance of receiving an fatal access violation.

The easy fix for this is to pass -threads to SWIG, which will automatically lock/release the GIL throughout swig generated code. However, it's suggested that this is very aggressive, hence it may impact performance.

And finally:

https://stackoverflow.com/a/11276444/835607

However, if your C++ code defers the callback to another thread, then you very likely have violated the GIL.

marktsuchida commented 3 years ago

Good catch! I'm surprised this never came up before -- I had assumed that SWIG always releases the GIL, like ctypes. We need to enable SWIG thread support whether or not it is sufficient to fix this issue.