gustaveroussy / sopa

Technology-invariant pipeline for spatial omics analysis (Xenium / MERSCOPE / CosMx / PhenoCycler / MACSima / Hyperion) that scales to millions of cells
https://gustaveroussy.github.io/sopa/
BSD 3-Clause "New" or "Revised" License
109 stars 11 forks

[Feature] Quantile projection for cyto3 segmentation #98

Closed: josenimo closed this issue 1 month ago

josenimo commented 1 month ago

Is your feature request related to a problem? Please describe.
Not really a problem, and I would like your opinion; maybe I am overthinking this. I would like to compute a projection of membrane markers (e.g. CD8, CD20, PanCK, NaKATPase) and create a two-channel image before segmenting with Cellpose models.

Describe the solution you'd like
A Snakemake rule, with its corresponding Python script, that takes an image as input and outputs a (c: 2, y, x) image (saved in sdata.zarr).

Describe alternatives you've considered

  1. Performing this outside of Sopa. This is very RAM-intensive for whole-slide images, and I thought that patching makes a lot of sense: projections (max, median, etc.) are heavily impacted by small artefacts somewhere in the image, so performing them on 3000-5000 px patches would be best.

Additional context
I don't want to take more of your time, but some guidance on how to code this would be super helpful. I will gladly share it and perhaps offer it as a feature for the next Sopa iteration. I am just struggling a little bit with xarray.

quentinblampey commented 1 month ago

I try to keep Sopa as general as possible, so I think this is something that you can add yourself to have your own "customized" Sopa. In particular, the CLI and API options should be easier to customize, but you can also add a new snakemake rule!

To do that, I advise using map_blocks from Dask to create a lazy image projection. Then, you can add this new channel to the original image and give it a channel name, e.g. "projection". The rest of the pipeline will not change; for instance, you'll be able to run Cellpose with the channels ["DAPI", "projection"].
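A minimal sketch of that suggestion, not Sopa's own code. It assumes "quantile projection" means a per-pixel quantile across the selected membrane channels (q=0.5 is the median, q=1.0 the max). The helper quantile_projection, the toy random image, and the chunk size are all hypothetical. Reading the image from a SpatialData object and writing it back are not shown.

```python
import dask.array as da
import numpy as np
import xarray as xr


def quantile_projection(block: np.ndarray, q: float = 0.5) -> np.ndarray:
    """Collapse a (c, y, x) block into one (y, x) plane (hypothetical helper)."""
    # Per-pixel quantile across the channel axis, cast back to the input dtype.
    return np.quantile(block, q, axis=0).astype(block.dtype)


# Toy (c, y, x) image standing in for the real slide (assumption).
channel_names = ["DAPI", "CD8", "CD20", "PanCK", "NaKATPase"]
membrane_markers = ["CD8", "CD20", "PanCK", "NaKATPase"]
rng = np.random.default_rng(0)
data = rng.integers(0, 1000, size=(5, 64, 64), dtype=np.uint16)

image = xr.DataArray(
    da.from_array(data, chunks=(-1, 32, 32)),
    dims=("c", "y", "x"),
    coords={"c": channel_names},
)

# Select the membrane channels; keep the channel axis in a single chunk so
# that every block passed to the helper contains all selected channels.
membrane = image.sel(c=membrane_markers).data.rechunk({0: -1})

# Lazy projection: nothing is computed until .compute() or a write to disk.
projection = membrane.map_blocks(
    quantile_projection, q=0.5, drop_axis=0, dtype=membrane.dtype
)

# Attach the result as a new channel named "projection".
projection_channel = xr.DataArray(
    projection[None, ...], dims=("c", "y", "x"), coords={"c": ["projection"]}
)
with_projection = xr.concat([image, projection_channel], dim="c")

# The two-channel (c: 2, y, x) view that segmentation would use.
two_channel = with_projection.sel(c=["DAPI", "projection"])
```

Because this projection is purely per-pixel, the spatial chunk size only affects memory use, not the result. A projection that first normalizes each channel by patch-level statistics would instead depend on the chunking.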

For testing/debugging purposes, I recommend using the toy dataset, as in this tutorial, and then applying it to your full data.

josenimo commented 1 month ago

Thank you! I managed to get it working with NumPy on an HPC node with lots of RAM. I will optimize it with Dask later; for now it is working well :)