kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

Error with shortest_paths weight vector containing NaN values #186

Closed No2Ross closed 2 years ago

No2Ross commented 2 years ago

I'm applying slingshot to a reduced dimension plot created by PHATE and run into the following error.

The problem persists after remaking the seurat object.

Error in shortest_paths(tree, from = cur.root, to = setdiff(deg1, cur.root)) : At core/paths/dijkstra.c:360 : Weight vector must not contain NaN values, Invalid value

Never encountered this error before when applying slingshot to PHATE embeddings and it only occurs for this specific dataset. Can supply a data frame of the embeddings and the cluster IDs if needed.

Running R version 4.1.2, igraph version 1.3.1, slingshot version 2.2.1.

Also, here are the full versions of my attached packages, just in case:

[1] igraph_1.3.1 gplots_3.1.3 scales_1.2.0 mclust_5.4.9 stringr_1.4.0
[6] reshape2_1.4.4 stringi_1.7.6 slingshot_2.2.1 TrajectoryUtils_1.2.0 princurve_2.1.6
[11] RColorBrewer_1.1-3 clustree_0.4.4 ggraph_2.0.5 scran_1.22.1 scuttle_1.4.0
[16] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[21] IRanges_2.28.0 S4Vectors_0.32.4 BiocGenerics_0.40.0 MatrixGenerics_1.6.0 matrixStats_0.62.0
[26] dtw_1.22-3 proxy_0.4-26 cowplot_1.1.1 ggpubr_0.4.0 rgl_0.108.10
[31] sp_1.4-7 SeuratObject_4.1.0 Seurat_4.1.1 dplyr_1.0.9 phateR_1.0.7
[36] Matrix_1.4-1 lattice_0.20-45 ggplot2_3.3.6 gridExtra_2.3 reticulate_1.25

kstreet13 commented 2 years ago

Hi @No2Ross,

Thanks for raising this issue! I'm guessing you saw the earlier issue where remaking the Seurat object was sufficient to make it go away, but I'm very curious about what's actually causing this.

And yes, if you don't mind sharing (either here or via email), I would greatly appreciate the data frame you mentioned so that I can try to reproduce the error message.

Thanks! Kelly

No2Ross commented 2 years ago

slingshot_error.csv

Hi Kelly. Attached is the data frame with the 8-dimensional PHATE embeddings as well as the cluster IDs. I used cluster 5 as the starting point. Thanks for getting back to me! Ross

kstreet13 commented 2 years ago

Hi Ross,

Thanks very much! I didn't trace it all the way through to the C code, but I think I was able to determine the cause of the error and it looks like a small cluster issue.

There are two very small clusters (1 and 2 cells, respectively):

[Screenshot: cluster sizes]

And that is likely messing up the default distance metric (which relies on being able to compute a covariance matrix for each cluster).
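A quick way to check for this situation is sketched below in base R on made-up data. The names `emb`, `cl`, and `min_cells` are illustrative assumptions, not slingshot objects, and the snippet only shows how a one-cell cluster can produce NA values in a covariance estimate; it is not slingshot's internal code.

```r
# Made-up embedding: 20 cells in 2 dimensions (hypothetical data).
set.seed(1)
emb <- matrix(rnorm(20 * 2), ncol = 2)

# Hypothetical cluster labels; cluster "C" contains a single cell.
cl <- c(rep("A", 12), rep("B", 7), "C")

# Tabulate cluster sizes and flag clusters below an assumed minimum size.
min_cells <- 3
sizes <- table(cl)
small <- names(sizes)[sizes < min_cells]
print(sizes)
print(small)

# The sample covariance of a single observation is undefined,
# so every entry of this matrix is NA.
cov_C <- cov(emb[cl == "C", , drop = FALSE])
print(cov_C)
```

If `small` is non-empty, those clusters are the first candidates to inspect before running the trajectory inference.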

If you don't want to remove these cells, you can try a different distance metric. I generally recommend "mnn" as a more robust choice (although it still gives a few warnings). Unfortunately, in this case it also creates some spurious lineages, as both cluster 0 and cluster 1 are picked up as endpoints, and I'm guessing that's not what you wanted. Setting the method to "simple" (i.e., Euclidean distance between cluster centers) results in two lineages and looks pretty reasonable.
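To illustrate why the "simple" option tolerates tiny clusters, here is a minimal base R sketch of Euclidean distances between cluster centers. The data and the names `emb`, `cl`, `centers`, and `center_dist` are made up for the example; this is not slingshot's own implementation.

```r
# Made-up embedding: 30 cells in 3 dimensions (hypothetical data).
set.seed(42)
emb <- matrix(rnorm(30 * 3), ncol = 3)

# Hypothetical labels; cluster "1" has only one cell.
cl <- rep(c("0", "1", "5"), times = c(14, 1, 15))

# Cluster centers: per-cluster column sums divided by cluster sizes.
# rowsum() and table() both order the groups by sorted label.
centers <- rowsum(emb, cl) / as.vector(table(cl))

# Pairwise Euclidean distances between the centers.
center_dist <- as.matrix(dist(centers))
print(center_dist)
```

Because only means are needed, no covariance matrix is estimated and a one-cell cluster still yields finite distances. In slingshot itself the metric is presumably chosen with an argument along the lines of `dist.method = "simple"`; check `?getLineages` for the exact name in your installed version.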

Thanks again for sharing the data and let me know if you have any other questions! Kelly

No2Ross commented 2 years ago

Hi Kelly,

Feel dumb now that it was the only problem with the code. Got rid of those cells from the object and it worked. Thanks for your help!

Ross

kstreet13 commented 2 years ago

No problem! Thanks again for helping me diagnose that error message.

pagarwal14 commented 11 months ago

Hi Kelly, I am getting an error similar to the one in this post running the code in the vignette. I was hoping the vignette code would run without any issues since it has probably been tested. So I am wondering what could be the cause. The error is:

sce <- slingshot(sce, clusterLabels = 'GMM', reducedDim = 'PCA')
Error in igraph::shortest_paths(tree, from = l, to = ends, mode = "out", : At core/paths/dijkstra.c:364 : Weight vector must not contain NaN values, Invalid value
In addition: Warning messages:
1: useNames = NA is deprecated. Instead, specify either useNames = TRUE or useNames = TRUE.
2: useNames = NA is deprecated. Instead, specify either useNames = TRUE or useNames = TRUE.
3: useNames = NA is deprecated. Instead, specify either useNames = TRUE or useNames = TRUE.
4: useNames = NA is deprecated. Instead, specify either useNames = TRUE or useNames = TRUE.
5: useNames = NA is deprecated. Instead, specify either useNames = TRUE or useNames = TRUE.
6: useNames = NA is deprecated. Instead, specify either useNames = TRUE or useNames = TRUE.
7: useNames = NA is deprecated. Instead, specify either useNames = TRUE or useNames = TRUE.

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] RColorBrewer_1.1-3 uwot_0.1.14 Matrix_1.5-4.1 mclust_6.0.0 slingshot_2.6.0 TrajectoryUtils_1.6.0
[7] SingleCellExperiment_1.20.1 SummarizedExperiment_1.28.0 Biobase_2.58.0 GenomicRanges_1.50.2 GenomeInfoDb_1.34.9 IRanges_2.32.0
[13] S4Vectors_0.36.2 BiocGenerics_0.44.0 MatrixGenerics_1.10.0 matrixStats_1.0.0 princurve_2.1.6

loaded via a namespace (and not attached):
[1] igraph_1.5.0 Rcpp_1.0.10 rstudioapi_0.14 XVector_0.38.0 magrittr_2.0.3 zlibbioc_1.44.0
[7] lattice_0.21-8 FNN_1.1.3.2 rlang_1.1.1 sparseMatrixStats_1.10.0 DelayedMatrixStats_1.20.0 tools_4.2.2
[13] grid_4.2.2 irlba_2.3.5.1 cli_3.6.1 GenomeInfoDbData_1.2.9 bitops_1.0-7 RCurl_1.98-1.12
[19] DelayedArray_0.24.0 compiler_4.2.2 pkgconfig_2.0.3

Thanks, Pankaj

kstreet13 commented 11 months ago

@pagarwal14, have you checked to see if your error is also being caused by a small cluster?

pagarwal14 commented 11 months ago

There is a cluster with one cell

table(cl1)
cl1
 1  2  3  4  5  6  7
78  1 38 58 24 34 67

So I removed cluster 2 as follows:

clusters1 = cl1[cl1 %in% c(1,3,4,5,6,7)]

str(clusters1)
 Named num [1:299] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "names")= chr [1:299] "c1" "c2" "c3" "c4" ...

unique(cl1)
[1] 1 3 2 4 5 6 7

unique(clusters1)
[1] 1 3 4 5 6 7

colData(sce)$GMM <- clusters1
Error in [[<-(*tmp*, name, value = c(c1 = 1, c2 = 1, c3 = 1, c4 = 1, : 299 elements in value to replace 300 elements

Could you please suggest how to remove the cluster from the sce object?

Thanks Pankaj


kstreet13 commented 11 months ago

You can subset an SCE object the same way you would a matrix and it will keep all of the associated metadata. If you want to learn more, I would recommend the OSCA book: https://bioconductor.org/books/3.13/OSCA.intro/the-singlecellexperiment-class.html
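As a rough illustration of the matrix analogy, the base R sketch below drops the cells of an undersized cluster from a toy matrix and from the label vector with the same logical index, so the two stay the same length. The objects `counts`, `cl1`, `min_cells`, and `keep` are made up for the example. With a SingleCellExperiment, the analogous column subset would be `sce <- sce[, keep]`, which also subsets the per-cell metadata, so no separate assignment of a shortened label vector is needed.

```r
# Toy genes-by-cells matrix with 10 cells (hypothetical data).
set.seed(7)
counts <- matrix(rpois(5 * 10, lambda = 3), nrow = 5,
                 dimnames = list(paste0("g", 1:5), paste0("c", 1:10)))

# Hypothetical cluster labels, one per cell; cluster 2 has a single cell.
cl1 <- setNames(c(1, 1, 1, 3, 3, 2, 3, 4, 4, 4), colnames(counts))

# Keep only cells whose cluster has at least `min_cells` members.
min_cells <- 2
big <- names(which(table(cl1) >= min_cells))
keep <- as.character(cl1) %in% big

# Apply the same index to the columns (cells) and to the labels.
counts_sub <- counts[, keep]
cl_sub <- cl1[keep]

print(dim(counts_sub))
print(table(cl_sub))
```

Assigning only the shortened label vector back to the full object is what triggers the "299 elements in value to replace 300 elements" error; subsetting the object first avoids the length mismatch.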