josenimo opened 1 month ago
Hello,
Have you updated `utils.py`? I think that adding a wildcard in `self.sdata_path` here should do the trick, as everything else depends on this variable. But I'm not sure whether you'll run into a `ChildIOException`...
Let me know how it goes!
I have not updated utils.py, will do that :)
Do you mean hacking `utils.py`, or just listing all the data paths in the Snakefile and running it through that? (The latter seems the simplest.)
To be explicit:
```python
SAMPLES = ["/data/input/1.ome.tif", "/data/input/2.ome.tif", "/data/input/3.ome.tif", "/data/input/4.ome.tif"]

rule prepare_image:
    input:
        raw_image=expand("{sample}", sample=SAMPLES)
```
I meant something like `self.sdata_path = Path("data/zarrs/{sample}.zarr")` at `utils.py:L43`, but I'm not sure it would work.
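To make the suggestion concrete, here is a minimal sketch of what that change could look like. The `WorkflowPaths` class below is a simplified stand-in for the one in `utils.py` (an assumption: the real constructor takes the Snakemake config and derives more paths from `sdata_path`):

```python
from pathlib import Path


class WorkflowPaths:
    """Simplified stand-in for the class in utils.py (assumption)."""

    def __init__(self):
        # Keep the {sample} placeholder in the path template so that
        # Snakemake can fill it in per sample via wildcards, instead of
        # baking a single concrete path into the object.
        self.sdata_path = Path("data/zarrs/{sample}.zarr")


paths = WorkflowPaths()
print(paths.sdata_path)  # → data/zarrs/{sample}.zarr
```

The idea is that every rule input/output derived from `self.sdata_path` would then carry the `{sample}` wildcard automatically.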
Hey @quentinblampey,
So I have tried a couple of things. First I tried your suggestion of passing a `{sample}` wildcard to the paths creator, but Snakemake just creates one set of paths and then fails; I am not sure Snakemake is compatible with this kind of object-creation setup. I also tried replacing the paths object with a `.yaml` file that the functions could read. I managed to make this work for my homebrewed functions, but did not dare to modify the sopa functions to accept a yaml file instead of a path directly.

Not sure where to go from here. At the moment I am running the analysis one image at a time. The only idea I have left is to run the sopa Python functions directly in the Snakefile, though I haven't given it much thought yet.
Happy to hear your comments.
Hey, sorry to hear it's still not working. I have to say I still find Snakemake confusing sometimes, so I don't really know how to make this work without many trials and errors.
One question: why do you want to run everything at once? I personally prefer to run Sopa on one slide at a time; this way, if one slide fails, it will not make the other images fail.
Also, I personally use InquirerPy to prompt and execute the pipeline, which makes it very easy to run multiple slides. See this example, which asks you to choose a config and an existing data directory. With a similar idea, you could also use a multi-select prompt to run multiple snakemake commands at once. I also made this prompt available to other users at my institute, so it's pretty easy for them to run Sopa.
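As a sketch of that one-slide-at-a-time approach, one could build one `snakemake` invocation per sample and launch them in a loop. The flag values below (`--config data_path=...`) are assumptions about how the pipeline is parameterized, not sopa's actual CLI:

```python
import shlex
import subprocess

SAMPLES = ["1", "2", "3", "4"]


def build_command(sample: str) -> list[str]:
    """Build one snakemake invocation per slide (flag names are assumptions)."""
    return [
        "snakemake",
        "--cores", "4",
        "--config", f"data_path=data/input/{sample}.ome.tif",
    ]


if __name__ == "__main__":
    for sample in SAMPLES:
        cmd = build_command(sample)
        print(shlex.join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually launch
```

Running slides as separate invocations keeps one failing slide from blocking the others, which is the point made above.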
Dear @quentinblampey
I am in the process of changing the provided Snakefile to run many samples (images) at once, replacing the input and output paths to include the `{sample}` wildcard. However, I run into an issue with `get_input_resolve`, since it handles all the patches itself. Do you have any suggestions here? Should I create my own `get_input_resolve` that includes the `{sample}` wildcard? I tried replacing the paths with relative paths containing wildcards, but then the `ChildIOException` starts appearing... Any thoughts are welcome; my current Snakefile is below:
Snakefile
```python
configfile: "ome_tif.yaml"

from utils import WorkflowPaths, Args

paths = WorkflowPaths(config)
args = Args(paths, config)

SAMPLES = ["1", "2", "3", "4"]

localrules: all

rule all:
    input:
        expand("data/quantification/{sample}_quant.csv", sample=SAMPLES)
    shell:
        """
        echo 🎉 Successfully run sopa
        echo → SpatialData output directory: {paths.sdata_path}
        echo → Explorer output directory: {paths.explorer_directory}
        echo → Open the result in the explorer: 'open {paths.explorer_experiment}'
        """

rule prepare_image:
    input:
        raw_image=expand("data/input/{sample}.ome.tif", sample=SAMPLES)
        # raw_image=paths.data_path if config["read"]["technology"] != "uniform" else []
    output:
        projection="data/projections/{sample}_projection.ome.tif"
        # projection="data/projections/small_image_projection.ome.tif"
    conda:
        "sopa"
    params:
        nuclei_channels=config["projection"]["nuclei_channels"],
        nuclear_scaling_min_max_quantiles=config["projection"]["nuclear_scaling_min_max_quantiles"],
        nuclear_projection_quantile=config["projection"]["nuclear_projection_quantile"],
        membrane_channels=config["projection"]["membrane_channels"],
        membrane_scaling_min_max_quantiles=config["projection"]["membrane_scaling_min_max_quantiles"],
        membrane_projection_quantile=config["projection"]["membrane_projection_quantile"]
    shell:
        """
        python scripts/project.py \
            --input {input.raw_image} \
            --output {output.projection} \
            --channels_nucleus {params.nuclei_channels} \
            --nucleus_min_max_scaling_quantiles {params.nuclear_scaling_min_max_quantiles} \
            --nuclear_quantile {params.nuclear_projection_quantile} \
            --channels_membrane {params.membrane_channels} \
            --membrane_min_max_scaling_quantiles {params.membrane_scaling_min_max_quantiles} \
            --membrane_quantile {params.membrane_projection_quantile}
        """

rule to_spatialdata:
    input:
        projected_image="data/projections/{sample}_projection.ome.tif"
    output:
        zarr_object="data/zarrs/{sample}.zarr"
    conda:
        "sopa"
    resources:
        mem_mb=128_000,
    params:
        args_reader=str(args["read"])
    shell:
        """
        sopa read {input.projected_image} --sdata-path {output.zarr_object} {params.args_reader}
        """

checkpoint patchify_cellpose:
    input:
        zarr_object="data/zarrs/{sample}.zarr"
    output:
        patches_file="data/zarrs/{sample}.zarr/.sopa_cache/patches_file_image",
        patches=touch("data/zarrs/{sample}.zarr/.sopa_cache/patches"),
    params:
        args_patchify=str(args["patchify"].where(contains="pixel")),
    conda:
        "sopa"
    shell:
        """
        sopa patchify image {input.zarr_object} {params.args_patchify}
        """

rule patch_segmentation_cellpose:
    input:
        zarr_object="data/zarrs/{sample}.zarr",
        patches_file="data/zarrs/{sample}.zarr/.sopa_cache/patches_file_image",
        patches="data/zarrs/{sample}.zarr/.sopa_cache/patches"
    output:
        # not sure if this is correct, but to signal the order of events
        cellpose_boundaries="data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries/",
        cellpose_tmp_dir="data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries/{wildcards.index}.parquet"
    conda:
        "sopa"
    params:
        args_cellpose=str(args["segmentation"]["cellpose"]),
    shell:
        """
        sopa segmentation cellpose {input.zarr_object} --patch-dir {output.cellpose_tmp_dir} --patch-index {wildcards.index} {params.args_cellpose}
        """

def get_input_resolve(name, dirs=False):
    def _(wildcards):
        with getattr(checkpoints, f"patchify_{name}").get(**wildcards).output.patches_file.open() as f:
            return paths.cells_paths(f.read(), name, dirs=dirs)
    return _

rule resolve_cellpose:
    input:
        zarr_object="data/zarrs/{sample}.zarr",
        cellpose_tmp_dir="data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries/"
    output:
        touch("data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries_done"),
    conda:
        "sopa"
    shell:
        """
        sopa resolve cellpose {input.zarr_object} --patch-dir {input.cellpose_tmp_dir}
        """

rule rasterize:
    input:
        zarr_object="data/zarrs/{sample}.zarr",
        # to ensure this runs after segmentation is done
        cellpose_boundaries="data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries_done"
    output:
        mask_tif="data/masks/{sample}_mask.tif"
    conda:
        "sopa"
    shell:
        """
        mkdir -p ./data/masks
        python scripts/rasterize.py \
            --input {input.zarr_object} \
            --output {output.mask_tif}
        """

rule quantify:
    input:
        raw_image=expand("data/input/{sample}.ome.tif", sample=SAMPLES),
        mask_tif="data/masks/{sample}_mask.tif",
        markers="data/input/markers.csv"
    output:
        quantification="data/quantification/{sample}_quant.csv"
    conda:
        "sopa"
    params:
        math=config["quantify"]["math"],
        quantile=config["quantify"]["quantile"],
    shell:
        """
        mkdir -p ./data/quantification
        python scripts/quant.py \
            --image {input.raw_image} \
            --label {input.mask_tif} \
            --markers {input.markers} \
            --output {output.quantification} \
            --math {params.math} \
            --quantile {params.quantile}
        """
```
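For context, `get_input_resolve` in the Snakefile is an instance of Snakemake's checkpoint input-function pattern: an outer function captures the checkpoint name and returns an inner function that Snakemake calls with the rule's wildcards. A minimal, sample-aware sketch of that closure pattern (plain Python, no Snakemake objects; the path template mirrors the `.sopa_cache` layout above, and `make_input_resolver` is a hypothetical name, not sopa's helper):

```python
def make_input_resolver(name):
    """Return an input function that Snakemake would call with wildcards.

    In the real Snakefile the inner function also queries the checkpoint;
    this simplified sketch only formats the per-sample cache path.
    """
    template = "data/zarrs/{sample}.zarr/.sopa_cache/patches_file_" + name

    def resolve(wildcards):
        # wildcards maps wildcard names to their values for this job,
        # so the {sample} placeholder is filled in per sample
        return template.format(sample=wildcards["sample"])

    return resolve


resolver = make_input_resolver("image")
print(resolver({"sample": "1"}))
# → data/zarrs/1.zarr/.sopa_cache/patches_file_image
```

The key point is that the inner function, not the outer one, sees the `{sample}` value, so each sample resolves to its own patches file.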