labsyspharm / mcmicro

Multiple-choice microscopy pipeline
https://mcmicro.org/
MIT License

specs for running on a workstation #558

Open bmyury opened 3 months ago

bmyury commented 3 months ago

Could you please formulate the specs for running MCMICRO on a standalone workstation? We expect about a PB of data to be processed, which is not feasible to transfer to the cloud.

ArtemSokolov commented 3 months ago

@jmuhlich I think you have a script that calculates the required RAM for each module based on image size. Can you share the formula that you used?

jmuhlich commented 3 months ago

Code is here: https://github.com/labsyspharm/mcmicro-lsp/blob/main/o2/config_pre_reg.sh

The key input is channel_gpx, which is the number of gigapixels in one channel of one cycle. The script has logic to roughly estimate that from rcpnl files, but for other formats you can calculate the image dimensions yourself and plug that in: Width x Height / 1,000,000,000 (do not round).
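
As a minimal sketch of that calculation (not taken from the linked script), the value can be computed in Python as below. The function name and the example dimensions are hypothetical, chosen only for illustration.

```python
def channel_gigapixels(width_px: int, height_px: int) -> float:
    """Gigapixels in one channel of one cycle: width x height / 1e9, unrounded."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image dimensions must be positive")
    return width_px * height_px / 1_000_000_000


# Hypothetical full-resolution dimensions, for illustration only.
channel_gpx = channel_gigapixels(60_000, 40_000)
print(f"channel_gpx = {channel_gpx}")
```

The dimensions should be those of the full-resolution image plane, not of a pyramid level or thumbnail.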

Note there are different constants for the linear equations for the segmentation step if unmicst is configured to downsample higher-res raw data or the -large version of S3 segmenter is used.
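
The shape of such an estimate can be sketched as follows, assuming each module's RAM requirement is a linear function of channel_gpx. The function name, the GB units, and the constants in the usage line are placeholders, not values from config_pre_reg.sh; the real per-module constants have to be read from that script.

```python
def estimate_ram_gb(channel_gpx: float, slope_gb_per_gpx: float, intercept_gb: float) -> float:
    """Linear RAM estimate: slope * channel_gpx + intercept.

    The slope and intercept are deliberately required arguments: the actual
    constants differ per module (and per segmentation configuration) and
    must be looked up in the linked script.
    """
    if channel_gpx < 0:
        raise ValueError("channel_gpx must be non-negative")
    return slope_gb_per_gpx * channel_gpx + intercept_gb


# Placeholder constants, for illustration only (NOT from the MCMICRO script).
print(estimate_ram_gb(2.4, slope_gb_per_gpx=10.0, intercept_gb=4.0))
```

Taking the maximum of these estimates across the modules you plan to run gives a rough lower bound on the RAM a single workstation would need.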

bmyury commented 3 months ago

The size of a single image in our data is 20 GB (.qptif), but we have tons of experiments to process. I suppose the script you point to does not directly explain how to run MCMICRO on a single Windows workstation, which is what we are looking for.

ArtemSokolov commented 3 months ago

Hi @bmyury,

To run MCMICRO, all you need is Docker and Nextflow. On a Windows machine, the easiest way to go is probably WSL (Windows Subsystem for Linux).

The limiting factor for running the pipeline is usually RAM. Jeremy's script will estimate the amount of RAM needed by each process from the image dimensions (not file size).
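
A quick way to confirm that both prerequisites are visible from a shell (for example, inside WSL) is sketched below. It assumes the executables are named `docker` and `nextflow` and are expected on the PATH; the helper name is hypothetical.

```python
import shutil


def missing_tools(tools=("docker", "nextflow")):
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("Docker and Nextflow are both available.")
```

This only checks that the commands can be found; it does not verify that the Docker daemon is running or that enough RAM is available.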

bmyury commented 2 months ago

Thank you so much Artem.