vanvalenlab / deepcell-applications

Run DeepCell Applications
Other
8 stars 6 forks source link

Changes to membrane-image definition #31

Open FloWuenne opened 2 years ago

FloWuenne commented 2 years ago

I just want to make sure this gets seen, so I opened a new issue, since its not completely related to the previous one.

As mentioned in #13 , I would like the way that --membrane-image is defined and used by default to be adjusted.

Thanks for considering this!

Cheers

ngreenwald commented 2 years ago

Hey @FloWuenne, is this a convenience change or a feature change? From my understanding, you can do what you're suggesting already by passing the same path to the nuclear and membrane channel. Is your proposed change so that the use only has to pass a single file path if they have a single multichannel image?

FloWuenne commented 2 years ago

Thanks for the reply @ngreenwald !

Exactly, the proposed change is to be able to only pass a single, multichannel image to mesmer and just specify the nuclear and membrane channels. This has two main reasons: 1) File channeling in MCMICRO / nextflow generally expects 1 multichannel image file for other segmentation methods and thus would be more in line with this. 2) Having only one multi-channel file and using different channels seems (at least to me) more in line with the whole idea of multiplexed images as well. Also, if unknowing users try and pass large, multichannel images to both nuclear and membrane channels, they might easily encounter out-of-memory issues since both images have to be read...

I will tag @ArtemSokolov here, as he is the one mainly implementing Mesmer in MCMICRO and he can add any other reasons I might have missed.

ArtemSokolov commented 2 years ago

I think the main concern is the same image getting loaded twice, thus doubling the memory footprint (which would be problematic for very large whole-slide images).

ngreenwald commented 2 years ago

Got it, that makes sense. We have a pretty big backlog here, but if you guys want to fork the repo and open a PR, I'd be happy to review it.

The loading logic is here: https://github.com/vanvalenlab/deepcell-applications/blob/master/deepcell_applications/prepare.py#L41

marcovarrone commented 1 year ago

From the way it's currently implemented, since it's already creating an array of zeros with the same dtype as the nuclear image, the memory footprint should not change. Or am I missing something?

FloWuenne commented 1 year ago

I think as @ArtemSokolov pointed out, the main concern is that if the nuclear image and the membrane image are both part of multi-channel, large image, the current implementation would read both stacks in memory completely via get_image() before extracting the channel for segmentation. This is pretty memory inefficient.

One solution could be to only read the required channel into memory from the start (tricky part might be to know beforehand which axis is the channel). I am sorry I didn't have time to play around with this yet. We have put a workaround into MCMICRO that extract the required channel before applying Mesmer.

ngreenwald commented 1 year ago

I suspect that @FloWuenne's workaround is actually better.

Currently, if the image data is stored as a single 40-channel image, the current pipeline will read the entire thing into memory twice, once for each compartment.

The proposed fix in this issue would reduce this to a single read of the entire image, which is half the footprint. However, the ideal scenario would be to only read the relevant channels from the multitiff, potentially reducing memory by 90% for large images.

Only reading the relevant tiffs from the stack would require a bit more work, but subsetting in advance and just passing that to Mesmer will likely result in the largest savings.

Is that an accurate summary of where things stand?

FloWuenne commented 7 months ago

Sorry for only responding now @ngreenwald . You summarized it well. If you are planning to implement this at some point, keep this issue open, otherwise please feel free to close it. I unfortunately don't have bandwith to make a PR to add this improvement!