kharchenkolab / Baysor

Bayesian Segmentation of Spatial Transcriptomics Data
https://kharchenkolab.github.io/Baysor/
MIT License

Error: Not enough prior cells pass the min_mols_per_cell=10 threshold #116

Closed: quentinblampey closed this issue 3 weeks ago

quentinblampey commented 6 months ago

Hello @VPetukhov,

Thanks again for Baysor; it is a really efficient tool. I run into an issue in the rare case where there are not enough transcripts per prior cell (as mentioned in this issue). The error is the following:

```sh
baysor run --save-polygons GeoJSON -c config.toml --scale 6 transcripts.csv :cell
```

```
[14:57:28] Info: Run Rb9342f718
[14:57:28] Info: (2024-03-29) Run Baysor v0.6.2
[14:57:28] Info: Loading data...
[14:57:31] Info: Excluding genes: Blank-1, Blank-10, Blank-11, Blank-14, Blank-15, Blank-16, Blank-19, Blank-2, Blank-20, Blank-21, Blank-22, Blank-23, Blank-26, Blank-27, Blank-28, Blank-29, Blank-3, Blank-32, Blank-33, Blank-34, Blank-37, Blank-39, Blank-41, Blank-42, Blank-44, Blank-45, Blank-48, Blank-49, Blank-5, Blank-6, Blank-7, Blank-8
[14:57:32] Info: Loaded 1214 transcripts
ERROR: Not enough prior cells pass the min_mols_per_cell=10 threshold. Please, specify scale manually.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] estimate_scale_from_assignment(pos_data::Matrix{Float64}, assignment::Vector{Int64}; min_mols_per_cell::Int64)
    @ Baysor.DataLoading /home/viktor_petukhov/.julia/dev/Baysor/src/data_loading/prior_segmentation.jl:92
  [3] estimate_scale_from_assignment
    @ /home/viktor_petukhov/.julia/dev/Baysor/src/data_loading/prior_segmentation.jl:89 [inlined]
  [4] parse_prior_assignment(pos_data::Matrix{Float64}, prior_segmentation::Vector{Int64}; col_name::Symbol, min_molecules_per_segment::Int64, min_mols_per_cell::Int64)
    @ Baysor.DataLoading /home/viktor_petukhov/.julia/dev/Baysor/src/data_loading/cli_wrappers.jl:21
  [5] load_prior_segmentation!(path::String, df_spatial::DataFrames.DataFrame, pos_data::Matrix{Float64}; min_molecules_per_segment::Int64, min_mols_per_cell::Int64)
    @ Baysor.DataLoading /home/viktor_petukhov/.julia/dev/Baysor/src/data_loading/cli_wrappers.jl:212
  [6] load_prior_segmentation!(df_spatial::DataFrames.DataFrame, prior_segmentation::String, opts::Baysor.Utils.SegmentationOptions; min_molecules_per_cell::Int64, min_molecules_per_segment::Int64, plot::Bool)
    @ Baysor.CommandLine /home/viktor_petukhov/.julia/dev/Baysor/src/cli/main.jl:197
  [7] run(coordinates::String, prior_segmentation::String; config::Baysor.Utils.RunOptions, x_column::String, y_column::String, z_column::String, gene_column::String, min_molecules_per_cell::Int64, scale::Float64, scale_std::String, n_clusters::Int64, prior_segmentation_confidence::Float64, output::String, plot::Bool, save_polygons::String, no_ncv_estimation::Bool, count_matrix_format::String)
    @ Baysor.CommandLine /home/viktor_petukhov/.julia/dev/Baysor/src/cli/main.jl:108
  [8] command_main(ARGS::Vector{String})
    @ Baysor.CommandLine /home/viktor_petukhov/.julia/packages/Comonicon/HDhA6/src/codegen/julia.jl:343
  [9] command_main
    @ /home/viktor_petukhov/.julia/packages/Comonicon/HDhA6/src/codegen/julia.jl:90 [inlined]
 [10] julia_main()
    @ Baysor.CommandLine /home/viktor_petukhov/.julia/packages/Comonicon/HDhA6/src/frontend/cast.jl:481
 [11] julia_main(; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Baysor /home/viktor_petukhov/.julia/dev/Baysor/src/Baysor.jl:42
 [12] julia_main()
    @ Baysor /home/viktor_petukhov/.julia/dev/Baysor/src/Baysor.jl:42
 [13] top-level scope
    @ none:1
```

The error says to "specify scale manually", which I already do (`--scale 6` on the command line and `scale = 6` in the config), but it still fails. I know we shouldn't have so few transcripts, but this can happen in Sopa when a patch lies on the edge of the slide. How can I prevent this error? Or, at least, when there are not enough transcripts, could Baysor return an output with 0 segmented cells instead of failing?

Full config details
```toml
[data]
exclude_genes = "Blank*"
force_2d = true
min_molecules_per_cell = 10
x = "x"
y = "y"
z = "z"
gene = "gene"
min_molecules_per_gene = 0
min_molecules_per_segment = 3
confidence_nn_id = 6

[segmentation]
scale = 6
scale_std = "35%"
prior_segmentation_confidence = 0.75
estimate_scale_from_centers = false
n_clusters = 4
iters = 500
n_cells_init = 0
nuclei_genes = ""
cyto_genes = ""
new_component_weight = 0.2
new_component_fraction = 0.3
```

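Until Baysor handles this case itself, a pipeline that runs Baysor patch by patch (as Sopa does) could check each patch beforehand and skip those that cannot pass the threshold. The Python sketch below is hypothetical and not part of Baysor or Sopa. It assumes the prior segmentation is stored in a CSV column named `cell` (matching `:cell` in the command above), that `0` marks transcripts outside any prior cell, and that `MIN_PRIOR_CELLS` is an illustrative guess at how many qualifying cells are required. Baysor's `exclude_genes` filtering is only approximated here with glob matching.

```python
import csv
from collections import Counter
from fnmatch import fnmatchcase

# Assumption for illustration: how many prior cells must pass the threshold.
# This value is NOT taken from Baysor's source; adjust it to your setup.
MIN_PRIOR_CELLS = 3


def count_molecules_per_cell(rows, cell_column="cell", gene_column="gene",
                             background="0", exclude_genes=None):
    """Count transcripts per prior cell id.

    Rows whose cell id equals `background` are skipped, as are rows whose gene
    matches the glob pattern `exclude_genes` (e.g. "Blank*").
    """
    counts = Counter()
    for row in rows:
        if exclude_genes and fnmatchcase(row[gene_column], exclude_genes):
            continue
        cell_id = row[cell_column]
        if cell_id != background:
            counts[cell_id] += 1
    return counts


def enough_prior_cells(rows, min_mols_per_cell=10,
                       min_prior_cells=MIN_PRIOR_CELLS, **kwargs):
    """True if at least `min_prior_cells` cells have >= `min_mols_per_cell` transcripts."""
    counts = count_molecules_per_cell(rows, **kwargs)
    n_passing = sum(1 for n in counts.values() if n >= min_mols_per_cell)
    return n_passing >= min_prior_cells


def load_rows(path):
    """Read a transcripts CSV (with a header row) into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For a patch like the one in this report, `enough_prior_cells(load_rows("transcripts.csv"), exclude_genes="Blank*")` returning `False` would be the signal to skip Baysor and record the patch as containing 0 cells, which is roughly the behaviour requested above.
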
VPetukhov commented 2 months ago

That's an interesting corner case! Thank you for the detailed report. Indeed, I now see that `estimate_scale_from_centers = false` is not handled as expected. I will fix it in the next release.
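In other words, with an explicit scale and `estimate_scale_from_centers = false`, the scale should not be estimated from the prior segmentation, so the `min_mols_per_cell` check behind the error should never run. A minimal Python sketch of that guard, written for illustration only: it is not Baysor's Julia code, and `resolve_scale`, `estimate_from_prior`, and the `min_prior_cells` value are hypothetical. What happens in the other branches (for example, an explicit scale with estimation enabled) is also an assumption.

```python
from typing import Callable, Optional, Sequence


class NotEnoughPriorCellsError(RuntimeError):
    """Raised when too few prior cells are available to estimate the scale."""


def resolve_scale(
    scale: Optional[float],
    estimate_scale_from_centers: bool,
    mols_per_prior_cell: Sequence[int],
    estimate_from_prior: Callable[[], float],
    min_mols_per_cell: int = 10,
    min_prior_cells: int = 3,  # hypothetical threshold, not from Baysor
) -> float:
    # An explicit scale with estimation disabled short-circuits: the prior
    # cells are never inspected, so sparse edge patches cannot fail here.
    if scale is not None and not estimate_scale_from_centers:
        return scale
    # Assumption: in every other case the scale is estimated from prior cells,
    # which requires enough of them to pass the molecule threshold.
    n_passing = sum(1 for n in mols_per_prior_cell if n >= min_mols_per_cell)
    if n_passing < min_prior_cells:
        raise NotEnoughPriorCellsError(
            f"Not enough prior cells pass the min_mols_per_cell={min_mols_per_cell} "
            "threshold. Please, specify scale manually."
        )
    return estimate_from_prior()
```

Passing the estimator as a callable keeps the sketch independent of how the scale is actually computed from cell positions, which is not described in this thread.
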

quentinblampey commented 2 months ago

Thanks @VPetukhov, let me know!

VPetukhov commented 3 weeks ago

This is fixed in v0.7.0. Please see the changelog for more details.

quentinblampey commented 3 weeks ago

Thank you @VPetukhov, I'll give it a try!