gustaveroussy / sopa

Technology-invariant pipeline for spatial omics analysis (Xenium / Visium HD / MERSCOPE / CosMx / PhenoCycler / MACSima / ...) that scales to millions of cells
https://gustaveroussy.github.io/sopa/
BSD 3-Clause "New" or "Revised" License
129 stars 15 forks source link

[Bug] baysor jobs fail with DomainError #17

Closed pakiessling closed 8 months ago

pakiessling commented 9 months ago

Hi,

I am trying to run the Snakemake pipeline with Baysor. Unfortunately all my Baysor jobs fail like the following:


rule patch_segmentation_baysor:
    input: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/patches_file_baysor, /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77
    output: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77/segmentation_polygons.json, /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77/segmentation_counts.loom
    jobid: 0
    reason: Missing output files: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77/segmentation_counts.loom, /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77/segmentation_polygons.json
    wildcards: index=77
    resources: mem_mb=64000, mem_mib=61036, disk_mb=1000, disk_mib=954, tmpdir=/w0/tmp/slurm_mz637064.42865265, partition=c18m, gpu=0, time=03:00:00, runtime=180

        if command -v module &> /dev/null; then
            module purge
        fi

        cd /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77
        /rwthfs/rz/cluster/work/rwth1209/software/baysor/baysor-x86_x64-linux-v0.6.2_build/bin/baysor/bin/baysor run --save-polygons GeoJSON -c config.toml transcripts.csv :cell_id

[10:25:26] Info: Run R998e84721
[10:25:26] Info: (2024-01-26) Run Baysor v0.6.2
[10:25:26] Info: Loading data...
[10:25:28] Info: Excluding genes: Blank-1, Blank-10, Blank-11, Blank-12, Blank-13, Blank-14, Blank-15, Blank-16, Blank-17, Blank-18, Blank-19, Blank-2, Blank-20, Blank-21, Blank-22, Blank-23, Blank-24, Blank-25, Blank-26, Blank-27, Blank-28, Blank-29, Blank-3, Blank-30, Blank-31, Blank-32, Blank-33, Blank-34, Blank-35, Blank-36, Blank-37, Blank-38, Blank-39, Blank-4, Blank-40, Blank-41, Blank-42, Blank-43, Blank-44, Blank-45, Blank-46, Blank-47, Blank-48, Blank-49, Blank-5, Blank-50, Blank-6, Blank-7, Blank-8, Blank-9
[10:25:28] Info: Loaded 438614 transcripts
[10:25:33] Info: Estimating noise level
[10:25:46] Info: Done
[10:25:58] Info: Clustering molecules...
[10:26:34] Info: Algorithm stopped after 287 iterations. Error: 0.0057. Converged: true.
[10:26:34] Info: Done
[10:26:34] Info: Initializing algorithm. Scale: -1.0, scale std: -0.25, initial #components: 87722, #molecules: 438614.
[10:26:38] Info: Using the following additional information about molecules: [:confidence, :cluster, :prior_segmentation]
[10:26:38] Info: Using 2D coordinates
ERROR: DomainError with -4.597020976692567:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).

Have you ever seen this before? Is there a specific Baysor version I should use? I am on 0.6.2

I also didnt change these settings:

        n_clusters: 4
        iters: 500
        n_cells_init: 0
        new_component_weight: 0.2
        new_component_fraction: 0.3

Do these need to be optimized per tissue or do they depend on the patchify settings?

I am also not entirely clear on

  patch_width_pixel: 6000
  patch_overlap_pixel: 150
  patch_width_microns: 1000
  patch_overlap_microns: 20

My pixelsize is 0.108 do I need to adjust this?

quentinblampey commented 9 months ago

Hello @pakiessling,

I never saw this error, it is very surprising. I think you can open an issue on the Baysor repository, because the command line executed by snakemake looks very normal. Or maybe just one thing: can you check that the directory /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77 looks normal? It should contain a transcripts.csv file, and a config.toml file (can you have a quick look to see if something feels wrong?).

I'm also using the version 0.6.2 of Baysor, and everything works well when running sopa on MERSCOPE data with the same config as you.

Concerning the patch_width_microns and patch_overlap_microns parameters: they are used to create the patches on which baysor will be run. That is, we create patches of 1000 microns (width and height), and the patches will have an overlap of 20 microns; then, we will run Baysor once per patch. You can keep these parameters like this! If you increase patch_width_microns, you'll have less patches, and therefore have less Baysor runs, but each run will require more RAM (so you could optionnally change patch_width_microns to find a better balance depending on your cluster capacity).

I'm sorry to see that you experience all these issues, I hope it will be fixed quickly :)

pakiessling commented 9 months ago

I think I found the reason. Scale and scale std somehow get set to negative values. I think this might be caused by me removing these parameters from the Baysor section of the config. It is a bit confusing as these parameters should not be necessary when suppliyng a prior segmentation as in the case of the Merfish on board segmentation.

My config file looks like this:

# For parameters details, see this commented example: https://github.com/gustaveroussy/sopa/blob/master/workflow/config/example_commented.yaml
read:
  technology: merscope

patchify:
  patch_width_pixel: 6000
  patch_overlap_pixel: 150
  patch_width_microns: 1000
  patch_overlap_microns: 20

segmentation:
  baysor:
    cell_key: cell_id
    unassigned_value: -1

    config:
      data:
        exclude_genes: "Blank*" # genes excluded from the Baysor segmentation
        force_2d: true # if false, uses 3D mode
        min_molecules_per_cell: 10 # min number of transcripts per cell
        x: "x"
        y: "y"
        z: "z"
        gene: "gene"
        min_molecules_per_gene: 0
        min_molecules_per_segment: 3
        confidence_nn_id: 6

      segmentation:
        prior_segmentation_confidence: 0.8 # confidence of the cellpose confidence (float in [0, 1])
        estimate_scale_from_centers: false
        n_clusters: 4
        iters: 500
        n_cells_init: 0
        nuclei_genes: ""
        cyto_genes: ""
        new_component_weight: 0.2
        new_component_fraction: 0.3

aggregate:

annotation:
  method: tangram
  args:
    sc_reference_path: /hpcwork/rwth1209/data/scRNA/reference_datasets/celltypist_heart_no_atrial.h5ad
    cell_type_key: cell_type
    reference_preprocessing: log1p

explorer:
  gene_column: "gene"
  ram_threshold_gb: 16
  pixelsize: 0.108

executables:
  baysor: /rwthfs/rz/cluster/work/rwth1209/software/baysor/baysor-x86_x64-linux-v0.6.2_build/bin/baysor/bin/baysor

I want to segment with Baysor and use the Merscope segmentation as prior. Does anything jump out to you?

Also no reason to apologize, I hope I am not being annoying with the Issues.

quentinblampey commented 9 months ago

Indeed it may be because of removing scale and scale_std. I always keep these parameters, even when I'm using a prior segmentation (actually, I didn't know these parameters were optional when supplying a prior segmentation). Maybe you can try to set back scale: 6.25 and scale_std: "25%" to see if it works?

For the rest of the config, everything looks good to me!

No don't worry, I'm really happy to help, and also very happy to see that some people are already using Sopa 👍