Closed andrea-de-micheli closed 1 month ago
In your methods.py file, line 43: model = models.CellposeModel(model_type=model_type, pretrained_model=pretrained_model)
Would it be possible to add the gpu option, as defined here: https://cellpose.readthedocs.io/en/latest/api.html#cellposemodel
and expose it as an additional kwarg in the CLI?
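A minimal sketch of what this request could look like. Only models.CellposeModel and its gpu, model_type, and pretrained_model arguments come from the Cellpose API linked above; the helper names model_kwargs and build_model, and the cellpose_model_kwargs parameter, are hypothetical and are not taken from the Sopa code base.

```python
from typing import Any


def model_kwargs(model_type: str, pretrained_model: Any = False, **cellpose_model_kwargs: Any) -> dict:
    """Merge the two existing arguments with any extra CellposeModel kwargs (e.g. gpu=True)."""
    return {"model_type": model_type, "pretrained_model": pretrained_model, **cellpose_model_kwargs}


def build_model(model_type: str, pretrained_model: Any = False, **cellpose_model_kwargs: Any):
    """Hypothetical wrapper: create a CellposeModel, forwarding extra kwargs such as gpu."""
    # Imported lazily so that model_kwargs can be used without cellpose installed.
    from cellpose import models

    return models.CellposeModel(**model_kwargs(model_type, pretrained_model, **cellpose_model_kwargs))
```

With such a wrapper, build_model("cyto3", gpu=True) would request GPU execution, while the default call stays on CPU.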
Hello @andrea-de-micheli,
TL;DR: I suggest using the Snakemake pipeline, which will do all the parallelization and the boring stuff for you. Using the CLI is recommended if you want to use a specific parallelization that is different from the default one, or if you have an existing workflow and you want to incorporate some functionalities of Sopa into it.
This sequential run is not shown in the CLI tutorial (even though there is a note here) because, as you said, it is highly inefficient. This is why I don't recommend it, and it shouldn't be used in real-world scenarios. Still, if you want to run segmentation sequentially, you can directly call sopa segmentation cellpose, but you need to provide the --patch-width and --patch-overlap arguments, as detailed in the documentation here, under the "Usage" panel. Note that you don't need to run sopa patchify in that case, because it will do it for you.
Is there a reason why you are not using the Snakemake pipeline? It will do all the parallelization for you and deal seamlessly with the temporary boundary files. More specifically, it will run the patches in parallel using different processes. For instance, I'm using Slurm as a cluster, and it takes about one or two hours for this large image size. I would be happy to help you use the pipeline if you have any issues with it 🙂
Concerning GPU usage, this is indeed something I can work on. How many GPUs do you have? I'm currently not using GPUs because we usually have access to many more CPUs, and it is faster to run 50 patches in parallel on CPUs than to run a few patches in parallel on GPUs. Still, thanks for the feedback, I'll add the option in a future release!
Hello @quentinblampey,
Thank you for your detailed answer. I ran the snakemake pipeline, but it failed after 18+ hours. The job ran on 16 cores, and the bottleneck is the cellpose segmentation. I have an NVIDIA Tesla T4 GPU. Enabling GPU would really speed things up.
Not sure why it crashed. The log doesn't say much more. Thank you :)
Hello,
I'm surprised it didn't finish after 18+ hours. Have you tried using a Snakemake profile, or did you instead use --cores 16 in the snakemake command?
For example, there is a Slurm profile in the doc, but I'm not sure if you have a Slurm cluster (if not, what do you use?).
Concerning the error, it should probably be somewhere in your logs, but since you use 16 cores it may be much higher in the logs. Can you check the full log file?
I have added GPU support, but it is not released yet, and I still need to perform some tests. If you want to try it before the release, you need to switch to the dev branch (and sopa has to be installed in editable mode). The CLI helper is updated, but the corresponding documentation is not online yet (it will go online when the release is published).
To try it yourself, update your config with the method_kwargs as below:

    segmentation:
      cellpose:
        diameter: 60
        # ... all other cellpose arguments
        method_kwargs:
          cellpose_model_kwargs:
            gpu: true

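As an illustration, the method_kwargs entry of the config corresponds to a nested Python dict, and its string form is what appears after --method-kwargs in the command quoted further down in this thread. How Sopa actually parses that string is not shown here; using ast.literal_eval is an assumption made for this sketch.

```python
import ast

# Python equivalent of the `method_kwargs` entry in the config.
method_kwargs = {"cellpose_model_kwargs": {"gpu": True}}

# String form, as it could be passed on the command line.
cli_value = repr(method_kwargs)

# Parse a Python-literal string back into a dict without executing arbitrary code.
parsed = ast.literal_eval(cli_value)

# Read the nested flag, defaulting to False when it is absent.
gpu_enabled = parsed.get("cellpose_model_kwargs", {}).get("gpu", False)
```

Note the capitalised True in the string form: it is a Python literal, whereas the YAML config uses lowercase true.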
Let me know if it works for you!
Last questions: can you show me the config you use? Most importantly, what is the cellpose diameter? It should be around 60 pixels for MERSCOPE data.
Hi Quentin, answers to your questions:
Thanks for providing GPU support. I will try.
Thanks for the info @andrea-de-micheli!
Concerning your log, it seems that the error is indeed not there. However, I see that the failing rule had jobid: 59. Do you know if you can find the logs of job 59?
I had a quick look at the config, and it looks good. Just one question: according to your config, you want to run both Cellpose and Baysor, and I just want to make sure this is what you want to do. When asking for both Cellpose and Baysor, it will output the Baysor segmentation using the Cellpose segmentation as a prior. Since you have MERSCOPE data, you can run Baysor on the prior Vizgen segmentation instead of using Cellpose as a prior (this can save some time, with only minor differences in the final Baysor output). In brief, for MERSCOPE data, I recommend either (i) running Cellpose only, or (ii) running Baysor with the Vizgen prior, but of course you can run Baysor with a Cellpose prior if you prefer.
The segmentation provided by MERSCOPE is somewhat disappointing, in part because the cell boundary markers didn't work well on my sample. This is why I would like to redo the segmentation, and, while looking for alternative segmentation methods, I found Baysor and then your Sopa pipeline, which nicely includes both and outputs exactly what I need for my downstream analyses 🙃 I would like to use your pipeline to redo the segmentation and use Baysor with Cellpose priors. I hope this is what cell_key: cell_id does automatically when running Snakemake.
I also work with IMC and other types of multiplex IF datasets, which is why I find Sopa attractive as a more universal tool.
Actually, cell_key: cell_id is for when you want to use the default Vizgen segmentation as a prior. Instead, to use cellpose as a prior, simply delete cell_key and it will use cellpose automatically. For instance, this Xenium config runs cellpose and then baysor with the cellpose prior.
I will update the documentation because it was probably not clear how to run cellpose + baysor!
Good to hear that you also have IF datasets 🙂 Since IMC is small data, it's probably a good start to work with the pipeline, and I think the default config should work nicely.
Bonsoir Quentin,
I ran the snakemake pipeline again and it crashed after 18+ hours. I think this is the relevant part of the log:
[Tue Mar 26 03:19:02 2024]
Error in rule patch_segmentation_baysor:
    jobid: 523
    input: /mnt/XXX/andrea/Projects/XXX/MERFISH/merfish_output/XXX/region_0.zarr/.sopa_cache/patches_file_baysor, /mnt/XXX/andrea/Projects/XXX/MERFISH/merfish_output/XXX/region_0.zarr/.sopa_cache/baysor_boundaries/149
    output: /mnt/XXX/andrea/Projects/XXX/MERFISH/merfish_output/XXX/region_0.zarr/.sopa_cache/baysor_boundaries/149/segmentation_polygons.json, /mnt/XXX/andrea/Projects/XXX/MERFISH/merfish_output/XXX/region_0.zarr/.sopa_cache/baysor_boundaries/149/segmentation_counts.loom
    shell:
        if command -v module &> /dev/null; then
            module purge
        fi
        cd /mnt/XXX/andrea/Projects/XXX/MERFISH/merfish_output/XXX/region_0.zarr/.sopa_cache/baysor_boundaries/149
        ~/.julia/bin/baysor run --save-polygons GeoJSON -c config.toml transcripts.csv :cell_id
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
404 of 532 steps (76%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Other patch_segmentation_baysor tasks have succeeded. Thank you for your help, much appreciated!
I have not tried it yet, but can Snakemake re-execute only the tasks that failed?
Bonjour @andrea-de-micheli,
Yes, you can re-execute the same snakemake command, and it will start again where it failed! Snakemake re-runs the other rules only if it detects changes (such as a code change or an input change); if nothing has changed, it only runs the missing steps.
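For a quick manual check of which patches are still missing, something like the sketch below could be used. It assumes the directory layout visible in the log above (one sub-directory per patch under baysor_boundaries, each expected to contain segmentation_polygons.json); the function name missing_patches is hypothetical, and Snakemake itself relies on its own bookkeeping rather than on this logic.

```python
from __future__ import annotations

from pathlib import Path


def missing_patches(boundaries_dir: str, output_name: str = "segmentation_polygons.json") -> list[str]:
    """Return the names of patch sub-directories that do not yet contain `output_name`."""
    root = Path(boundaries_dir)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and not (p / output_name).exists()
    )
```

Calling it on the baysor_boundaries directory inside .sopa_cache would list the patch indices whose output file has not been written yet.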
The log you provided indicates which step failed, but it's actually not the error log of that step. I don't know where the log file is, because it may be specific to your HPC. For instance, when running locally, I get the logs directly in the console, while with the Slurm config the logs are in the workflow/logs directory.
If you don't know how to get the log, one simple option is to re-run the command that failed outside of Snakemake and see what error you get, i.e.:
cd /mnt/XXX/andrea/Projects/XXX/MERFISH/merfish_output/XXX/region_0.zarr/.sopa_cache/baysor_boundaries/149
~/.julia/bin/baysor run --save-polygons GeoJSON -c config.toml transcripts.csv :cell_id
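If typing the command by hand is inconvenient, the same re-run can be scripted so that the error message is kept. This is a generic sketch using only the Python standard library; the helper name run_and_capture is hypothetical, and the command and working directory would be the ones shown above.

```python
from __future__ import annotations

import subprocess


def run_and_capture(command: list[str], cwd: str | None = None) -> tuple[int, str]:
    """Run `command` in `cwd` and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so the error is not lost
        text=True,
    )
    return result.returncode, result.stdout


# Example usage (paths are placeholders, to be replaced by the ones from the log):
# code, log = run_and_capture(
#     ["/path/to/baysor", "run", "--save-polygons", "GeoJSON",
#      "-c", "config.toml", "transcripts.csv", ":cell_id"],
#     cwd="/path/to/.sopa_cache/baysor_boundaries/149",
# )
```

A non-zero exit code together with the captured text should show the actual error that the Snakemake summary hides.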
One important thing: I see that there is :cell_id in the baysor command, which means it tries to use the Vizgen segmentation as a prior instead of the cellpose segmentation. As explained in my comment above, you need to delete cell_key in your config, because the purpose of this variable is to indicate the key of the Vizgen prior segmentation 😉
Hello @quentinblampey,
Has GPU support been added to sopa 1.0.8?
I'm getting a No such option: --method-kwargs error from sopa segmentation cellpose when running snakemake.
Thanks!
Hello @andrea-de-micheli, yes it has been added. Can you check that you have the right version in your sopa conda environment? (I'm specifying the sopa environment because it is the one that Snakemake is using.)
In this env, you can run sopa segmentation cellpose --help and see if the --method-kwargs argument is listed.
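To confirm which installation is actually picked up inside a given environment, a check along these lines can help. It uses only the standard library (importlib.metadata and shutil.which); the function names are hypothetical, and the package and executable are both assumed to be named sopa.

```python
from __future__ import annotations

import shutil
from importlib import metadata


def installed_version(package: str) -> str | None:
    """Return the installed version of `package` in the current environment, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def executable_path(name: str) -> str | None:
    """Return the path of the executable that `name` resolves to on PATH, or None."""
    return shutil.which(name)


# Example usage, run with the Python interpreter of the environment to check:
# print(installed_version("sopa"), executable_path("sopa"))
```

If the printed path points outside the expected conda environment, a second installation is probably shadowing the one that was updated.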
I do see --method-kwargs listed and still get the 'No such option: --method-kwargs' error preceding the error below. I tried with both sopa 1.0.8 and 1.0.9.
This is the error from cellpose:
sopa segmentation cellpose /mnt/XXX/andrea/Projects/XXX/MERFISH/merfish_output/XXX/region_0.zarr --patch-dir /mnt/XXX/andrea/Projects/XXX/MERFISH/merfish_output/XXX/region_0.zarr/.sopa_cache/cellpose_boundaries \
--patch-index 37 --diameter 60 --channels 'DAPI' --flow-threshold 2 --cellprob-threshold -6 --model-type 'cyto3' --min-area 1000 --clip-limit 0.2 --gaussian-sigma 1 --method-kwargs "{'cellpose_model_kwargs': {'gpu': True}}"
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Hello @andrea-de-micheli, I tried executing your command and it reads the argument as expected. Can you try running the CLI command that you pasted without using snakemake? Are you sure you have only one sopa environment?
Hi @andrea-de-micheli, have you tried my comment above? Let me know if I can close the issue
Hi @quentinblampey, I did manage to get it to work via the snakemake command. We can close the issue for now. Thank you for your support!
Hi,
Just a conceptual suggestion about the CLI tutorial. It would be more intuitive if sopa segmentation cellpose could automatically run on all the patches generated by sopa patchify image, without having to specify either a --patch-index or, again, --patch-width and --patch-overlap. For example, simply running:
sopa segmentation cellpose tuto.zarr --channels DAPI --diameter 35
will return an error.
Also, any suggestions on how to parallelize the segmentation process and leverage CUDA? Thank you very much!
EDIT: asking because I'm working with a very large image. I have 560 patches that are 5000x5000 pixels, and segmentation is projected to take ~30h.
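As a rough back-of-the-envelope estimate of what parallelization could bring, the sketch below divides the projected sequential time across workers. It assumes every patch takes the same time and that workers scale perfectly, which real runs will not fully achieve; the function name estimated_hours is hypothetical.

```python
import math


def estimated_hours(n_patches: int, hours_sequential: float, n_workers: int) -> float:
    """Estimate wall-clock hours when patches are processed in rounds of `n_workers`."""
    hours_per_patch = hours_sequential / n_patches
    rounds = math.ceil(n_patches / n_workers)  # the last round may be only partly full
    return rounds * hours_per_patch


# Figures from this thread: 560 patches, ~30h when run sequentially.
for workers in (1, 16, 50):
    print(workers, round(estimated_hours(560, 30.0, workers), 2))
```

Under these idealised assumptions, 16 workers would bring the 560 patches down to roughly two hours, and 50 workers to well under one hour.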