kharchenkolab / Baysor

Bayesian Segmentation of Spatial Transcriptomics Data
MIT License

Error with docker version v0.6.2 #90

Open andynkili opened 11 months ago

andynkili commented 11 months ago

Dear developers,

Thank you for Baysor! I am trying to use it on our MERFISH data with your merfish.toml config file (from GitHub). I am using the latest Docker version (the one pushed 2 days ago); however, it fails when estimating the NCV:

    baysor run detected_transcripts.csv -c output/default_merfish_config_rerun/merfish.toml -p -o output/default_merfish_config_rerun

    [20:20:32] Info: Run R8d050d17b
    [20:20:32] Info: (2023-08-16) Run Baysor v0.6.2
    [20:20:32] Info: Loading data...
    [20:22:18] Info: Loaded 11448082 transcripts
    [20:22:24] Info: Estimating noise level
    [20:25:46] Info: Done
    [20:27:45] Info: Clustering molecules...
    [20:29:27] Warning: ICA did not converge, fall back to random initialization
    └ Baysor.Processing /root/.julia/packages/Baysor/XOSVt/src/processing/bmm_algorithm/molecule_clustering.jl:147
    Progress: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 1:24:47
      Iteration: 1637
      Max. difference: 0.00395
      Fraction of probs changed: 0.0181
    [21:54:32] Info: Algorithm stopped after 1637 iterations. Error: 0.00395. Converged: true.
    [21:54:33] Info: Done
    [21:54:35] Info: Initializing algorithm. Scale: 6.15, scale std: 1.5375, initial #components: 763204, #molecules: 11448082.
    [21:55:30] Info: Using the following additional information about molecules: [:confidence, :cluster]
    [21:55:30] Info: Using 2D coordinates
    Progress: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:41:59
      Iteration: 500
      Noise level, %: 90.43
      Num. components: 3373
    [22:39:46] Info: Processing complete.
    [22:40:00] Info: Estimating local colors
    [22:51:18] Warning: n=10000, which is > length(high_conf_ids) (0)
    └ Baysor.Processing /root/.julia/packages/Baysor/XOSVt/src/processing/data_processing/initialization.jl:180
    ERROR: LoadError: ArgumentError: size(X, 2) must be greater than n_neighbors and nneighbors must be greater than 0
    Stacktrace:
     [1] UMAP.UMAP(X::Matrix{Float32}, n_components::Int64; n_neighbors::Int64, metric::Distances.Euclidean, n_epochs::Int64, learning_rate::Int64, init::Symbol, min_dist::Float64, spread::Float64, set_operation_ratio::Int64, local_connectivity::Int64, repulsion_strength::Int64, neg_samplerate::Int64, a::Nothing, b::Nothing)
       @ UMAP ~/.julia/packages/UMAP/oqkOM/src/umap.jl:94
     [2] UMAP
       @ ~/.julia/packages/UMAP/oqkOM/src/umap.jl:77 [inlined]
     [3] #umap#7
       @ ~/.julia/packages/UMAP/oqkOM/src/umap.jl:45 [inlined]
     [4] umap
       @ ~/.julia/packages/UMAP/oqkOM/src/umap.jl:43 [inlined]
     [5] fit(::Type{Baysor.Processing.UmapFit}, x::Matrix{Float32}; n_components::Int64, nn_interpolate::Int64, spread::Float64, min_dist::Float64, metric::Distances.Euclidean, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ Baysor.Processing ~/.julia/packages/Baysor/XOSVt/src/processing/data_processing/umap_wrappers.jl:22
     [6] fit
       @ ~/.julia/packages/Baysor/XOSVt/src/processing/data_processing/umap_wrappers.jl:17 [inlined]
     [7] gene_composition_color_embedding(pca::Matrix{Float32}, confidence::Vector{Float64}; normalize::Bool, sample_size::Int64, seed::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ Baysor.Processing ~/.julia/packages/Baysor/XOSVt/src/processing/data_processing/neighborhood_composition.jl:128
     [8] gene_composition_color_embedding(pca::Matrix{Float32}, confidence::Vector{Float64})
       @ Baysor.Processing ~/.julia/packages/Baysor/XOSVt/src/processing/data_processing/neighborhood_composition.jl:116
     [9] gene_composition_colors(df_spatial::DataFrames.DataFrame, k::Int64; method::Symbol, n_pcs::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ Baysor.Processing ~/.julia/packages/Baysor/XOSVt/src/processing/data_processing/neighborhood_composition.jl:165
     [10] gene_composition_colors
       @ ~/.julia/packages/Baysor/XOSVt/src/processing/data_processing/neighborhood_composition.jl:160 [inlined]
     [11] run_segmentation(df_spatial::DataFrames.DataFrame, gene_names::Vector{String}, opts::Baysor.Utils.SegmentationOptions; plot_opts::Baysor.Utils.PlottingOptions, min_molecules_per_cell::Int64, estimate_ncvs::Bool, plot::Bool, save_polygons::Bool, run_id::String)
       @ Baysor.Processing ~/.julia/packages/Baysor/XOSVt/src/processing/utils/cli_wrappers.jl:72
     [12] run(coordinates::String, prior_segmentation::String; config::Baysor.Utils.RunOptions, x_column::String, y_column::String, z_column::String, gene_column::String, min_molecules_per_cell::Int64, scale::Float64, scale_std::String, n_clusters::Int64, prior_segmentation_confidence::Float64, output::String, plot::Bool, save_polygons::String, no_ncv_estimation::Bool, count_matrix_format::String)
       @ Baysor.CommandLine ~/.julia/packages/Baysor/XOSVt/src/cli/main.jl:123
     [13] run
       @ ~/.julia/packages/Baysor/XOSVt/src/cli/main.jl:51 [inlined]
     [14] command_main(ARGS::Vector{String})
       @ Baysor.CommandLine ~/.julia/packages/Comonicon/HDhA6/src/codegen/julia.jl:343
     [15] command_main()
       @ Baysor.CommandLine ~/.julia/packages/Comonicon/HDhA6/src/codegen/julia.jl:90
     [16] command_main(; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ Baysor ~/.julia/packages/Baysor/XOSVt/src/Baysor.jl:41
     [17] command_main()
       @ Baysor ~/.julia/packages/Baysor/XOSVt/src/Baysor.jl:41
     [18] top-level scope
       @ /usr/local/bin/baysor:15
    in expression starting at /usr/local/bin/baysor:15

The error is 'ArgumentError: size(X, 2) must be greater than n_neighbors and n_neighbors must be greater than 0', but in your merfish.toml config file the gene_composition_neigborhood argument is set to 70. Could you share your insight into what is causing this error?
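To make the question concrete, here is a minimal Python paraphrase of the condition stated in the error message. It is not the UMAP.jl code itself, and the function name is made up for illustration. It assumes that the matrix passed to UMAP is built from the high-confidence molecules, which the warning above reports as `length(high_conf_ids)` being 0, and it uses 70 only as an illustrative neighbor count. If that assumption holds, the check fails whatever the neighborhood setting is:

```python
def umap_input_is_valid(n_samples: int, n_neighbors: int) -> bool:
    """Paraphrase of the condition in the error message:
    size(X, 2) > n_neighbors and n_neighbors > 0.

    n_samples stands in for size(X, 2); this helper is hypothetical
    and only mirrors the wording of the message.
    """
    return n_samples > n_neighbors > 0


# Assumption: the embedding input is sampled from high-confidence molecules,
# and the warning in the log reports zero of them.
for n_samples in (0, 70, 10_000):
    print(n_samples, umap_input_is_valid(n_samples, n_neighbors=70))
```

With zero samples the condition cannot be satisfied for any positive `n_neighbors`, so the configured value of 70 may not be the relevant factor.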

I also had the same error with Baysor v0.6.0 (using the same command with the same config file on the same data), so I used the --no-ncv-estimation option:

    # baysor run detected_transcripts.csv -c output/default_merfish_config/merfish.toml -p --no-ncv-estimation -o output/default_merfish_config

    [12:44:49] Info: Run R35e71b6bf
    [12:44:49] Info: (2023-08-16) Run Baysor v0.6.0
    [12:44:49] Info: Loading data...
    [12:46:30] Info: Loaded 11448082 transcripts
    [12:46:34] Info: Estimating noise level
    [12:48:48] Info: Done
    [12:51:01] Info: Clustering molecules...
    [12:52:42] Warning: ICA did not converge, fall back to random initialization
    └ Baysor.Processing /root/.julia/packages/Baysor/DGy47/src/processing/bmm_algorithm/molecule_clustering.jl:147
    Progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 1:03:59
      Iteration: 1274
      Max. difference: 0.00835
      Fraction of probs changed: 0.046
    [13:56:57] Info: Algorithm stopped after 1274 iterations. Error: 0.00835. Converged: true.
    [13:57:04] Info: Done
    [13:57:04] Info: Initializing algorithm. Scale: 6.15, scale std: 1.5375, initial #components: 763204, #molecules: 11448082.
    [13:57:59] Info: Using the following additional information about molecules: [:confidence, :cluster]
    [13:57:59] Info: Using 2D coordinates
    Progress: 7%|█████████████▏ | ETA: 0:53:39
      Iteration: 33
      Noise level, %: 90.74
    Progress: 12%|███████████████████████▍
    Progress: 12%|███████████████████████▊
    Progress: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:49:08
      Iteration: 500
      Noise level, %: 90.41
      Num. components: 3417
    [14:47:47] Info: Processing complete.
    [14:47:57] Info: Saving results to output/default_merfish_config/segmentation.csv
    Progress: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:01
    [14:49:39] Info: Plotting results
    ERROR: LoadError: MethodError: no method matching length(::Nothing)
    Closest candidates are:
      length(::Union{Base.KeySet, Base.ValueIterator}) at abstractdict.jl:58
      length(::Union{Tables.AbstractColumns, Tables.AbstractRow}) at ~/.julia/packages/Tables/AcRIE/src/Tables.jl:180
      length(::Union{LinearAlgebra.Adjoint{T, <:Union{StaticArraysCore.StaticArray{Tuple{var"#s2"}, T, 1} where var"#s2", StaticArraysCore.StaticArray{Tuple{var"#s3", var"#s4"}, T, 2} where {var"#s3", var"#s4"}}}, LinearAlgebra.Diagonal{T, <:StaticArraysCore.StaticArray{Tuple{var"#s13"}, T, 1} where var"#s13"}, LinearAlgebra.Hermitian{T, <:StaticArraysCore.StaticArray{Tuple{var"#s10", var"#s11"}, T, 2} where {var"#s10", var"#s11"}}, LinearAlgebra.LowerTriangular{T, <:StaticArraysCore.StaticArray{Tuple{var"#s18", var"#s19"}, T, 2} where {var"#s18", var"#s19"}}, LinearAlgebra.Symmetric{T, <:StaticArraysCore.StaticArray{Tuple{var"#s7", var"#s8"}, T, 2} where {var"#s7", var"#s8"}}, LinearAlgebra.Transpose{T, <:Union{StaticArraysCore.StaticArray{Tuple{var"#s2"}, T, 1} where var"#s2", StaticArraysCore.StaticArray{Tuple{var"#s3", var"#s4"}, T, 2} where {var"#s3", var"#s4"}}}, LinearAlgebra.UnitLowerTriangular{T, <:StaticArraysCore.StaticArray{Tuple{var"#s24", var"#s25"}, T, 2} where {var"#s24", var"#s25"}}, LinearAlgebra.UnitUpperTriangular{T, <:StaticArraysCore.StaticArray{Tuple{var"#s21", var"#s22"}, T, 2} where {var"#s21", var"#s22"}}, LinearAlgebra.UpperTriangular{T, <:StaticArraysCore.StaticArray{Tuple{var"#s15", var"#s16"}, T, 2} where {var"#s15", var"#s16"}}, StaticArraysCore.StaticArray{Tuple{var"#s25"}, T, 1} where var"#s25", StaticArraysCore.StaticArray{Tuple{var"#s1", var"#s3"}, T, 2} where {var"#s1", var"#s3"}, StaticArraysCore.StaticArray{<:Tuple, T}} where T) at ~/.julia/packages/StaticArrays/4uslg/src/abstractarray.jl:1
      ...
    Stacktrace:
     [1] _similar_shape(itr::Nothing, #unused#::Base.HasLength)
       @ Base ./array.jl:663
     [2] _collect(cont::UnitRange{Int64}, itr::Nothing, #unused#::Base.HasEltype, isz::Base.HasLength)
       @ Base ./array.jl:718
     [3] collect(itr::Nothing)
       @ Base ./array.jl:712
     [4] run(coordinates::String, prior_segmentation::String; config::Baysor.Utils.RunOptions, x_column::String, y_column::String, z_column::String, gene_column::String, min_molecules_per_cell::Int64, scale::Float64, scale_std::String, n_clusters::Int64, prior_segmentation_confidence::Float64, output::String, plot::Bool, save_polygons::String, no_ncv_estimation::Bool, count_matrix_format::String)
       @ Baysor.CommandLine ~/.julia/packages/Baysor/DGy47/src/cli/main.jl:141
     [5] command_main(ARGS::Vector{String})
       @ Baysor.CommandLine ~/.julia/packages/Comonicon/rMXvw/src/codegen/julia.jl:343
     [6] command_main()
       @ Baysor.CommandLine ~/.julia/packages/Comonicon/rMXvw/src/codegen/julia.jl:90
     [7] command_main(; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ Baysor ~/.julia/packages/Baysor/DGy47/src/Baysor.jl:41
     [8] command_main()
       @ Baysor ~/.julia/packages/Baysor/DGy47/src/Baysor.jl:41
     [9] top-level scope
       @ ~/.julia/bin/baysor:15

In the latter run, the segmentation completed but no plot was produced. Is there a way to still plot the cell segmentation using only segmentation.csv?
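As a rough idea of what plotting from the CSV alone could look like, here is a minimal Python sketch. It is not Baysor's own plotting code. The column names `x`, `y` and `cell`, and the convention that unassigned (noise) molecules carry an empty or `0` cell label, are assumptions about the segmentation file and may need adjusting. The function names and the output file name are hypothetical, and matplotlib is assumed to be installed for the plotting step.

```python
import csv
from collections import defaultdict


def load_cells(path, x_col="x", y_col="y", cell_col="cell", noise_labels=("", "0")):
    """Group molecule coordinates by cell label.

    Column names and noise labels are assumptions about the CSV layout.
    """
    cells = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            label = row[cell_col]
            if label in noise_labels:
                continue  # skip molecules not assigned to any cell
            cells[label].append((float(row[x_col]), float(row[y_col])))
    return dict(cells)


def plot_cells(cells, out_path="segmentation.png"):
    """Scatter all assigned molecules, cycling through 20 colors per cell."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    xs, ys, colors = [], [], []
    for index, points in enumerate(cells.values()):
        for x, y in points:
            xs.append(x)
            ys.append(y)
            colors.append(index % 20)

    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(xs, ys, c=colors, cmap="tab20", s=0.5, linewidths=0)
    ax.set_aspect("equal")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(out_path, dpi=300)
    plt.close(fig)


# Example usage (path taken from the log above):
# plot_cells(load_cells("output/default_merfish_config/segmentation.csv"))
```

This colors molecules by cell rather than drawing cell boundaries, so it would only be a stopgap compared with the plots Baysor normally produces.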

How can I get a complete run of the latest Baysor version with all outputs (including plots)?

Best, Andy