micro-manager / pymmcore

Python bindings for MMCore, Micro-Manager's device control layer
https://pypi.org/project/pymmcore/
GNU Lesser General Public License v2.1
32 stars 8 forks source link

`threads="1"` breaks `mmc.setSLMImage` #86

Closed hinderling closed 9 months ago

hinderling commented 9 months ago

Hi! I'm currently switching our setup from pycromanager to pymmcore(-plus).

As other users already noted, the update from 10.1.1.70.3 to 10.1.1.70.4 breaks mmc.setSLMImage('Mosaic3',mask). I tried with two different microscope setups, SLM devices are Andor Mosaic3 and Mightex Polygon1000, both crash with versions > 10.1.1.70.3. The offending change is the threads argument in %module (package="pymmcore", directors="1", threads="1") pymmcore_swig that was added to pymmcore/pymmcore_swig.i. I got mmc.setSLMImage working in a fork of the newest version (pymmcore 10.4.0.71.1, Micro-Manager_2.0.3_20230925), by changing the line and building from source.

But this re-introduces the kernel crashes as soon as mmc is idle for some time (~500ms), that probably would be fixed with 10.1.1.70.4: When I start taking pictures directly after loading the hardware config, I can change OCs and SLMImages for hundreds of pictures in a loop without issues. But as soon as the loop is done the kernel crashes. In the logs, the last line is always [IFO,Core] Did update system state cache.

Any ideas how to best continue from here?

tlambert03 commented 9 months ago

Ah yes, I remember seeing that issue but haven't looked closely at it yet. But now's a good time to dig in to it again!

I'll ping @ianhi because it looks like he's probably done the most digging into the issue so far. Here are some general thoughts

  1. threads="1" is definitely a pretty critical change to pymmcore. @ianhi's PR https://github.com/micro-manager/pymmcore/pull/50 made huge improvements to the performance and allowed us to ditch multi-processing as a workaround. So we definitely need to figure out how to keep that.
  2. linking related issue: https://github.com/micro-manager/pymmcore/issues/76
  3. similar issue in MMCoreJ: https://github.com/micro-manager/mmCoreAndDevices/issues/309

Did you happen to try @ianhi's mask2bytes conversion mentioned here?

pixeloff = '\x00'
pixelon = '\x01'
def mask2bytes( mask):
    flatarray = mask.ravel().tolist()  # flatten image array
    tostring = [pixelon if pixel > 0 else pixeloff for pixel in flatarray]
    mask = ''.join(tostring)
    return mask

# this line replicate your  error
core.setSLMImage('some-slm', mask)

# this line crashes my python because I haven't loaded any devices
# but critically seems to get past the first error
core.setSLMImage('some-slm', mask2bytes(mask))

I'm going to need a bit of time to try to reproduce this locally to see if I can help push things along in pymmcore itself. (I have one SLM but haven't tried to control it with MM yet) @marktsuchida points out that one issue is that hand-written SWIG wrapper for that method is not handling exceptions. Not sure if fixing that will solve your issue here, but it's something to look at.

thanks for opening here, we'll try to get this figured out.

hinderling commented 9 months ago

Yes! I used the mask2bytes conversion to get it running with 10.1.1.70.3 and the modified version with threads=1 removed. pixelon='\x01 worked for Mosaic3 and pixelon='\x7f' for Polygon1000.

I understand that keeping threads=1 is important - just wanted to make sure that this change is what breaks current versions for me.

Fixing the error handling #76 would definitely also help with trouble shooting, as loading the hardware configuration always takes ~30s.

Thank you so much for looking into this! Happy to help in any way!

tlambert03 commented 9 months ago

ok, so you need to use both mask2bytes and remove threads=1 to get it to work? got it. thanks for the info.

will take some digging through

  1. how SWIG enables multithreaded support: https://www.swig.org/Doc4.0/Python.html#Python_multithreaded
  2. exactly what CMMCore is doing with the SLM device adapter when setSLMImage is called. Starting with MMCore::setSLMImage -> SLMInstance::SetImage -> setImage in the device adapters themselves.

Unfortunately, both the Mosaic3 and Polygon1000 device adapters are closed source, so i can't immediately check to see whether any issues with threads might be caused inside of the device adapters themselves or in something higher up in the MMCore (maybe @marktsuchida would have a gut feeling about why threads=1 is specifically problematic here?)

tlambert03 commented 9 months ago

just to make sure I'm clear, you have two different SLM devices (Mosaic and mightex) and they both work as long as you use mask2bytes and remove threads=1? and both fail simply by setting threads=1?

ianhi commented 9 months ago

This is unfortunately at the far limit of my knowledge for this. (i actually didn't even write mask2bytes, that was in the OP of the image.sc thread).

I will say it's very odd that building with the threads option would cause a crash, to my understanding it basically adds safety by releasing the GIL when going into the c++ code.

One debugging step if you're really invested would be to rebuild mmCoreAndDevices and add some stdout prints to this funciton: https://github.com/micro-manager/mmCoreAndDevices/blob/6e8b23e95bf37ced2f52aee568ff91994d7bb43d/MMCore/MMCore.cpp#L5949 to see if you get all the way through there. Although it may not help much as you can't add to the closed source drivers.

Small note @tlambert03 for looking at swig docs. This is built using swig3.0 not 4.0 so those docs you linked may be slightly different than the behavior here. Although unfortunately it seems the equivalent threading docs don't exist for 3.0.

ianhi commented 9 months ago

But as soon as the loop is done the kernel crashes. In the logs, the last line is always [IFO,Core] Did update system state cache.

This sounds as though a low priority is spawned and taking images delays it from taking action, but as soon as you go idle eventually it runs and causes some fatal error.

The log message you see comes from the updateSystemStateCache() completing successfullly, so I htink we need to look for things that would happen right after that as a potential suspects. For example poking around this looks like a suspicious place (callbacks, threads etc):

https://github.com/micro-manager/mmCoreAndDevices/blob/6e8b23e95bf37ced2f52aee568ff91994d7bb43d/MMCore/MMCore.cpp#L7058-L7063

ianhi commented 9 months ago

@hinderling just to clarify: when does the error actually occur. From my reading it sounds as though it occurs soon after you load the devices, whether or not you call setSLMImage? And actually calling setSLMImage delays but neither prevents nor causes the error?

marktsuchida commented 9 months ago

I'm going to work on fixing #76 for a start. (Let me know if any of you have already begun work on that.)

As for why threads="1" causes a crash in setSLMImage() -- I don't have any immediate ideas, but I might be able to investigate if it reproduces with GenericSLM (which I can test with a regular display).

tlambert03 commented 9 months ago

I'm going to work on fixing https://github.com/micro-manager/pymmcore/issues/76 for a start.

thanks a bunch @marktsuchida !

GenericSLM (which I can test with a regular display).

oh, very nice tip

tlambert03 commented 9 months ago

@hinderling, as a side note, if it's definitely the case that one must always convert a numpy array before passing to setSLMImage, i would also be happy to accept a PR that checks if the argument to setSLMImage is an ArrayLike object and perform the conversion before calling super()

marktsuchida commented 9 months ago

76 has been fixed with #87. It might be good if you can reproduce the error with the latest main, in case anything changes.

Unfortunately GenericSLM does not seem to trigger any crash in my hands (before or after #87, except by using the wrong mask size before #87) so I am unable to reproduce this issue.

hinderling commented 9 months ago

Sorry, I try to clarify!

@hinderling just to clarify: when does the error actually occur. From my reading it sounds as though it occurs soon after you load the devices, whether or not you call setSLMImage? And actually calling setSLMImage delays but neither prevents nor causes the error?

Correct (for versions without threads=1)

if it's definitely the case that one must always convert a numpy array before passing to setSLMImage, i would also be happy to accept a PR that checks if the argument to setSLMImage is an ArrayLike object and perform the conversion before calling super()

I think this is the case, at least from the forum posts/repositories online. With pycromanager, np.arrays work on our system and others. I think it would be nice to automatically do the conversion, meaning have setSLMImage accept np.arrays. One issue is that the bytes need to be converted differently:

I think it would still improve ease of use if users can pass np.arrays, and look up online which which value represents an on-pixel for their hardware, what do you think? Example below.

76 has been fixed with #87. It might be good if you can reproduce the error with the latest main, in case anything changes.

It does change!!! Seems to run perfect now on Polygon1000!! Can run setSLMimage even with threads=1, no crashes when idle.

Was this all fixed by #87 , or am I missing something?

Also works with using chr(i) and supplying the correct numbers in an int array. Type of int does not matter:

def mask2bytes(mask):
    flatarray = mask.ravel().tolist()  # flatten image array
    mask = ''.join([chr(i) for i in flatarray])
    return mask

mask_on  = np.full((h, w),fill_value=127,dtype = int)
mask_off = np.full((h, w),fill_value=0,dtype = int)

for mask in [mask_on,mask_off]:
    mask_bytes = mask2bytes(mask)
    mmc.setSLMImage('MightexPolygon1000',mask_bytes)
    mmc.displaySLMImage('MightexPolygon1000')
    mmc.snapImage()

Will also try on the other scope with the Mosaic3. Let me know if there is anything specific you would like me to test.

tlambert03 commented 9 months ago

I think it would still improve ease of use if users can pass np.arrays, and look up online which which value represents an on-pixel for their hardware, what do you think? Example below.

definitely! could I entice you to open a PR for that and get you on the contributors here? :). We can chat about the exact API there, but I think just overloading setSLMImage to also accept a np.ndarray would be good, with an optional argument to provide an explicit pixelon value, where we do our best to look it up from the current device if not provided.

Seems to run perfect now on Polygon1000!! Can run setSLMimage even with threads=1, no crashes when idle.

that's super 🎉

hinderling commented 9 months ago

could I entice you to open a PR for that and get you on the contributors here? :)

Would love to try

just overloading setSLMImage to also accept a np.ndarray would be good, with an optional argument to provide an explicit pixelon value, where we do our best to look it up from the current device if not provided.

Sounds good!!

Just tested the current main #87 on our other setup and it also runs perfectly! Thank you so much everyone for the incredibly quick help!!

marktsuchida commented 9 months ago

Great. The change in #87 not only propagates exceptions correctly, but also removes some Python C API calls that were incorrect and totally out of place, so that might have been the cause of the crash.

hinderling commented 9 months ago

Got a prototype working with np.arrays! My idea was:

Code allows you to run:

#full ON (127 is max value for this DMD)
mmc.setSLMImage('DMD', np.full((1140,912), 127).astype(np.uint8))
mmc.setSLMImage('DMD',np.ones((1140,912)).astype(bool),127)

#full OFF
mmc.setSLMImage('DMD', np.full((1140,912), 0).astype(np.uint8))
mmc.setSLMImage('DMD',np.zeros((1140,912)).astype(bool),127)

Passing np.arrays also allows for nice error messages regarding dimensions, e.g.:

oss << "Image dimensions are wrong for this SLM. Expected (" << expectedHeight << ", " << expectedWidth << "), but received (" << dims[0] << ", " << dims[1] << ")";
throw CMMError(oss.str());

Questions:

Next things I will be working on:

Also happy to take the discussion to another space!

tlambert03 commented 9 months ago

Got a prototype working with np.arrays!

oh that's awesome! I actually had originally meant to modify the python code here, but this is way better 😍. (assuming @marktsuchida would be happy with including this on the SWIG side?)

Should I keep option to pass char * directly from python, or break it and allow np.arrays only?

i would say it's important not keep char * as well and not break any existing applications

Not sure if there are other bit-depths that would need to be accounted for?

I also don't know, but i think it's safe enough to assume that people can do the appropriate casting themselves, for now at least)

Also happy to take the discussion to another space!

yep, inasmuch as this is editing pymmcore directly, i think we should move the discussion there.

marktsuchida commented 9 months ago

Yeah, that's nice. Would be great to include it here. Would be nice to keep the char * for compat, but SWIG can wrap overloaded C++ functions, so hopefully it is just a matter of duplicating the function?

It will be good to make sure that the RGB samples end up in the expected order (and with the added 4th channel not erroneously being rendered as one of the samples). The only device in our codebase that actually supports RGB images is GenericSLM (I checked the code for all 5 uses of CSLMDevice, including the 2 closed-source ones), so this should be easy to test.

MightexPolygon seems to convert any u32 image passed to it to a grayscale image, so it might be good to make sure nothing goes wrong there (though presumably users would want to be using a grayscale mask with these devices, to have full control). The other 3 devices do not accept u32 images.

hinderling commented 9 months ago

Quick update and some questions! Got the following working with GenericSLM:

#init
pixel_on_value = 255
dmd_name = 'GenericSLM'

h = mmc.getSLMHeight(dmd_name)
w = mmc.getSLMWidth(dmd_name)

#set RGB with np.arrays
rgb = np.zeros((h,w,3)).astype(np.uint8)
rgb[:,:,0] = 255 #Red image
mmc.setSLMImage(dmd_name,rgb)
mmc.displaySLMImage(dmd_name)

rgb = np.zeros((h,w,3)).astype(np.uint8)
rgb[:,:,2] = 255 #Blue image
mmc.setSLMImage(dmd_name,rgb)
mmc.displaySLMImage(dmd_name)

#set with BOOL  np.arrays 
mmc.setSLMImage(dmd_name,np.zeros((h,w)).astype(bool)) #black image
mmc.displaySLMImage(dmd_name)
mmc.setSLMImage(dmd_name,np.ones((h,w)).astype(bool)) #white image
mmc.displaySLMImage(dmd_name)

#set with np.arrays uint8 
mmc.setSLMImage(dmd_name,np.full((h,w),0).astype(np.uint8)) #black image
mmc.displaySLMImage(dmd_name)
mmc.setSLMImage(dmd_name,np.full((h,w),pixel_on_value).astype(np.uint8)) #white image
mmc.displaySLMImage(dmd_name)

# With char*
pixeloff = '\x00'
pixelon = '\x7f'

def mask2bytes( mask):
    flatarray = mask.ravel().tolist()  # flatten image array 
    tostring = [pixelon if pixel > 0 else pixeloff for pixel in flatarray]
    mask = ''.join(tostring)
    return mask
mask =  mask2bytes(np.full((h,w),0))
mmc.setSLMImage(dmd_name,mask)
mmc.displaySLMImage(dmd_name)

# set with np.arrays uint32 
# (TODO: Check if it works on Polygon. Leads to unexpected behaviour with GenericSLM)
mmc.setSLMImage(dmd_name,np.full((h,w),pixel_on_value).astype(np.uint32))
mmc.displaySLMImage(dmd_name)

Code can be found here.

Questions:

  1. How to handle the u32 grayscale images that work for MightexPolygon for other devices? Throw an error?
  2. Is it possible to build GenericSLM for MacOS?
  3. Is there a way to automatically figure out the pixel_on_value? Store it for all known devices? Would be nice if the same code can be run on Mosaic and Polygon without having to manually change pixel_on_value.

Other TODOs:

tlambert03 commented 9 months ago

Thanks as always for the helpful/organized post :) I'm away at a conference this week, @marktsuchida might have answers in the meantime.

marktsuchida commented 9 months ago
  1. How to handle the u32 grayscale images that work for MightexPolygon for other devices? Throw an error?

Does the Polygon actually produce more than 256 gray levels? (I couldn't find the bit depth in their spec sheet.)

As I wrote above, from the code it appears that u32 is treated as RGB and converted to 8-bit(?) gray, so I'm wondering if the fact that it even accepts u32 images at all is the bug.

All other SLM adapters (except GenericSLM) return "not implemented", resulting in an exception, which makes sense (if you don't get an exception, we'll need to fix that). What we could/should add is a method to query the supported pixel formats.

  1. Is it possible to build GenericSLM for MacOS?

The code uses Win32 API functions directly, so unfortunately no. We'd have to figure out how to draw to a monitor on macOS.

  1. Is there a way to automatically figure out the pixel_on_value? Store it for all known devices? Would be nice if the same code can be run on Mosaic and Polygon without having to manually change pixel_on_value.

I suppose each device should be made capable of reporting its actual bit depth (as opposed to the "8 bits" which is how the samples are encoded). In theory, there could be SLMs that only support 6 or 7 bits, or those that support 12 bits stored as 16-bit elements.

In your above example, you use 255 for the ndarray case and 127 for the char * case. My guess is that the only reason values beyond 127 don't work in the latter case is that the bytes have to go through a Python (Unicode) string (due to how SWIG wraps it), meaning that there is no way to specify arbitrary bytes (values in 0-127 work because they all map to valid ASCII characters). I didn't realize until now that this is what was going on; the fix you are working on is critical for SLMs to be fully usable through pymmcore.

Python bytes objects could represent values 0-255, and can be obtained easily via ndarray.asbytes(), but unfortunately the SWIG wrapper as it stands does not allow passing bytes for the char *. This can be changed (SWIG_PYTHON_STRICT_BYTE_CHAR, assuming it can be applied to a single function; or perhaps we can change the function to take const unsigned char *), but I don't think it is important if we support ndarray directly.

Unfortunately I have neither the hardware nor the documentation for most of these devices, so I'm interested in what you can find out about the meaningful (or valid) value range for the different devices you have access to (and what happens when you exceed the range). But I would not base any of this on the behavior when setting the image via char *. If it works for 0-255 using an ndarray, that is what matters.

hinderling commented 9 months ago