Open denvercal1234GitHub opened 1 year ago
Could you provide a small reproducible example? It is possible that the graph provided to the leiden algorithm unexpectedly contains some non-integer values.
Do you see a similar warning if you use another clustering algorithm, such as algorithm = 3?
Thanks, @AustinHartman, for your response. The code is below. Do you see anything wrong in the steps before FindClusters()?
You can download the data (12.5MB) at https://drive.google.com/file/d/1zBO1XAPnlt-SCUwWcgFWnOX4fcEA-Ve1/view?usp=share_link. It is a subset of cells from the 20220215_tonsil_atlas_cite_seurat_obj (from https://zenodo.org/record/6340174#.ZCFjkBXMI0R).
load(".../Massoni_20220215_tonsil_atlas_cite_seurat_obj_CD8Tcells_BCLL15_8_9.RData")
data_ID.list <- SplitObject(Massoni_20220215_tonsil_atlas_cite_seurat_obj_CD8Tcells_BCLL15_8_9, split.by = "data_ID")
### Process RNA data with SCTransform for RNA-based clustering
# s.genes and g2m.genes must already be defined; the original post does not show where they come from
for (i in seq_along(data_ID.list)) {
  DefaultAssay(data_ID.list[[i]]) <- 'RNA'
  data_ID.list[[i]] <- NormalizeData(data_ID.list[[i]], assay = 'RNA')
  data_ID.list[[i]] <- CellCycleScoring(data_ID.list[[i]], s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE)
  data_ID.list[[i]]$CC.Difference <- data_ID.list[[i]]$S.Score - data_ID.list[[i]]$G2M.Score
  data_ID.list[[i]] <- SCTransform(data_ID.list[[i]], verbose = FALSE, method = "glmGamPoi", vst.flavor = "v2", return.only.var.genes = FALSE, assay = "RNA", vars.to.regress = c("pct_mt", "CC.Difference", "pct_ribosomal"), min_cells = 4)
}
data_ID.list_MERGED <- merge(data_ID.list[[1]], y = c(data_ID.list[[2]], data_ID.list[[3]],data_ID.list[[4]],data_ID.list[[5]],data_ID.list[[6]],data_ID.list[[7]],data_ID.list[[8]],data_ID.list[[9]],data_ID.list[[10]],data_ID.list[[11]],data_ID.list[[12]],data_ID.list[[13]],data_ID.list[[14]],data_ID.list[[15]],data_ID.list[[16]], data_ID.list[[17]],data_ID.list[[18]],data_ID.list[[19]],data_ID.list[[20]],data_ID.list[[21]],data_ID.list[[22]],data_ID.list[[23]],data_ID.list[[24]]), merge.data = T)
data_ID.list_MERGED_var_features <- SelectIntegrationFeatures(data_ID.list, assay = c("SCT", "SCT", "SCT", "SCT","SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT", "SCT"), nfeatures = 3000, fvf.nfeatures=3000)
DefaultAssay(data_ID.list_MERGED) <- 'SCT'
VariableFeatures(data_ID.list_MERGED) <- data_ID.list_MERGED_var_features
data_ID.list_MERGED <- RunPCA(data_ID.list_MERGED, verbose = FALSE, npcs=50, assay = 'SCT', features = data_ID.list_MERGED_var_features)
data_ID.list_MERGED <- RunHarmony(data_ID.list_MERGED, reduction = "pca", dims = 1:31, group.by.vars = "data_ID", assay.use = "SCT", reduction.save = "harmony_SCT_QNN")
### Process and integrate ADT data for visualization of protein expression
DefaultAssay(data_ID.list_MERGED) <- 'ADT'
# we will use all ADT features for dimensional reduction
# we set a dimensional reduction name to avoid overwriting the
VariableFeatures(data_ID.list_MERGED) <- rownames(data_ID.list_MERGED[["ADT"]])
data_ID.list_MERGED <- NormalizeData(data_ID.list_MERGED, normalization.method = 'CLR', margin = 2) %>%
  ScaleData() %>%
  RunPCA(reduction.name = 'apca') %>%
  RunHarmony(reduction = "apca", dims = 1:20, group.by.vars = "data_ID", assay.use = "ADT", reduction.save = "harmony_ADT_QNN")
### UMAP and clustering based on transcriptome
DefaultAssay(data_ID.list_MERGED) <- 'SCT'
data_ID.list_MERGED <- RunUMAP(data_ID.list_MERGED, dims = 1:31, reduction = "harmony_SCT_QNN", return.model=T) %>% FindNeighbors(reduction = "harmony_SCT_QNN", dims = 1:31)
DefaultAssay(data_ID.list_MERGED) <- 'SCT'
for (i in seq(0, 2, 0.5)) {
  data_ID.list_MERGED <- Seurat::FindClusters(data_ID.list_MERGED, algorithm = 4, resolution = i, verbose = TRUE)
}
# Not run in this example: data_ID.list_MERGED <- PrepSCTFindMarkers(data_ID.list_MERGED, assay = "SCT")
This is where it threw a warning:
Warning: NAs introduced by coercion to integer range
Warning in paste(condition$message, collapse = "\n")
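For context on the warning text itself: in base R, this kind of warning is raised by as.integer() when a numeric value is too large to fit in R's 32-bit integer type. The sketch below reproduces that with a made-up value (3e9). It only illustrates the general mechanism; it does not show which value inside the Leiden code path triggers the warning in this thread.

```r
# Largest value an R integer can hold (2^31 - 1).
print(.Machine$integer.max)

# Collect any warnings raised while coercing an out-of-range double.
caught <- character(0)
result <- withCallingHandlers(
  as.integer(3e9),  # 3e9 is an arbitrary value above .Machine$integer.max
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(result)  # the coerced value
print(caught)  # the warning message(s) that were raised
```

The coercion does not stop with an error; it returns NA and continues, which is consistent with FindClusters() still finishing after the warning.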
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggVennDiagram_1.2.2 harmony_0.1.1 Rcpp_1.0.10 scales_1.2.1 flexclust_1.4-1
[6] modeltools_0.2-23 lattice_0.20-45 patchwork_1.1.2.9000 pbmc3k.SeuratData_3.1.4 bmcite.SeuratData_0.3.0
[11] SeuratData_0.2.2 SeuratDisk_0.0.0.9020 scater_1.24.0 scuttle_1.6.3 SingleCellExperiment_1.20.0
[16] SummarizedExperiment_1.28.0 Biobase_2.58.0 GenomicRanges_1.50.2 GenomeInfoDb_1.34.9 IRanges_2.32.0
[21] S4Vectors_0.36.1 BiocGenerics_0.44.0 MatrixGenerics_1.10.0 matrixStats_0.63.0 dplyr_1.1.1
[26] ggplot2_3.4.1 Signac_1.9.0 HCATonsilData_0.0.0.9000 SeuratObject_4.1.3 Seurat_4.3.0
loaded via a namespace (and not attached):
[1] utf8_1.2.3 spatstat.explore_3.1-0 reticulate_1.28 tidyselect_1.2.0
[5] RSQLite_2.3.0 AnnotationDbi_1.60.0 htmlwidgets_1.6.2 BiocParallel_1.30.4
[9] Rtsne_0.16 munsell_0.5.0 ScaledMatrix_1.4.1 codetools_0.2-19
[13] ica_1.0-3 DT_0.27 future_1.32.0 miniUI_0.1.1.1
[17] withr_2.5.0 spatstat.random_3.1-4 colorspace_2.1-0 progressr_0.13.0
[21] filelock_1.0.2 knitr_1.42 rstudioapi_0.14 ROCR_1.0-11
[25] tensor_1.5 listenv_0.9.0 labeling_0.4.2 GenomeInfoDbData_1.2.9
[29] polyclip_1.10-4 farver_2.1.1 bit64_4.0.5 rhdf5_2.42.0
[33] rprojroot_2.0.3 parallelly_1.35.0 vctrs_0.6.1 generics_0.1.3
[37] xfun_0.38 BiocFileCache_2.6.1 R6_2.5.1 ggbeeswarm_0.7.1
[41] rsvd_1.0.5 RVenn_1.1.0 hdf5r_1.3.8 bitops_1.0-7
[45] rhdf5filters_1.10.0 spatstat.utils_3.0-2 cachem_1.0.7 DelayedArray_0.24.0
[49] promises_1.2.0.1 beeswarm_0.4.0 gtable_0.3.3 beachmat_2.12.0
[53] globals_0.16.2 goftest_1.2-3 rlang_1.1.0 RcppRoll_0.3.0
[57] splines_4.2.3 lazyeval_0.2.2 spatstat.geom_3.1-0 BiocManager_1.30.20
[61] yaml_2.3.7 reshape2_1.4.4 abind_1.4-5 httpuv_1.6.9
[65] tools_4.2.3 ellipsis_0.3.2 jquerylib_0.1.4 RColorBrewer_1.1-3
[69] ggridges_0.5.4 plyr_1.8.8 sparseMatrixStats_1.8.0 zlibbioc_1.44.0
[73] purrr_1.0.1 RCurl_1.98-1.10 deldir_1.0-6 viridis_0.6.2
[77] pbapply_1.7-0 cowplot_1.1.1 zoo_1.8-11 ggrepel_0.9.3
[81] cluster_2.1.4 here_1.0.1 magrittr_2.0.3 glmGamPoi_1.8.0
[85] data.table_1.14.8 scattermore_0.8 openxlsx_4.2.5.2 lmtest_0.9-40
[89] RANN_2.6.1 fitdistrplus_1.1-8 evaluate_0.20 mime_0.12
[93] xtable_1.8-4 gridExtra_2.3 compiler_4.2.3 tibble_3.2.1
[97] KernSmooth_2.23-20 crayon_1.5.2 htmltools_0.5.5 later_1.3.0
[101] tidyr_1.3.0 DBI_1.1.3 ExperimentHub_2.6.0 dbplyr_2.3.2
[105] MASS_7.3-58.3 rappdirs_0.3.3 Matrix_1.5-3 cli_3.6.1
[109] parallel_4.2.3 igraph_1.4.1 pkgconfig_2.0.3 sp_1.6-0
[113] plotly_4.10.1.9000 spatstat.sparse_3.0-1 bslib_0.4.2 vipor_0.4.5
[117] XVector_0.38.0 stringr_1.5.0 digest_0.6.31 sctransform_0.3.5.9002
[121] RcppAnnoy_0.0.20 spatstat.data_3.0-1 Biostrings_2.66.0 rmarkdown_2.21
[125] leiden_0.4.3 fastmatch_1.1-3 uwot_0.1.14 DelayedMatrixStats_1.18.2
[129] curl_5.0.0 shiny_1.7.4 Rsamtools_2.12.0 lifecycle_1.0.3.9000
[133] nlme_3.1-162 jsonlite_1.8.4 Rhdf5lib_1.20.0 BiocNeighbors_1.14.0
[137] viridisLite_0.4.1 fansi_1.0.4 pillar_1.9.0 KEGGREST_1.38.0
[141] fastmap_1.1.1 httr_1.4.5 survival_3.5-5 interactiveDisplayBase_1.36.0
[145] glue_1.6.2 zip_2.2.2 png_0.1-8 BiocVersion_3.16.0
[149] bit_4.0.5 sass_0.4.5 class_7.3-21 stringi_1.7.12
[153] HDF5Array_1.26.0 blob_1.2.4 BiocSingular_1.12.0 AnnotationHub_3.6.0
[157] memoise_2.0.1 irlba_2.3.5.1 future.apply_1.10.0
And @yuhanH, if I instead use algorithm = 3, no warning or error occurs. In which cases do you usually prefer the SLM algorithm (3) over Leiden for transcriptome data? Thank you for your help.
for (i in seq(0, 2, 0.5)) {
  data_ID.list_MERGED <- Seurat::FindClusters(data_ID.list_MERGED, algorithm = 3, resolution = i, verbose = TRUE)
}
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1286
Number of edges: 118834
Running smart local moving algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 1.0000
Number of communities: 1
Elapsed time: 1 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1286
Number of edges: 118834
Running smart local moving algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.6947
Number of communities: 4
Elapsed time: 1 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1286
Number of edges: 118834
Running smart local moving algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.5699
Number of communities: 6
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1286
Number of edges: 118834
Running smart local moving algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.4674
Number of communities: 9
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1286
Number of edges: 118834
Running smart local moving algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.4027
Number of communities: 10
Elapsed time: 0 seconds
I also have this problem. When I run seuobj <- FindClusters(object = seuobj, algorithm = 1, resolution = 0.4), it's OK.
When I run seuobj <- FindClusters(object = seuobj, algorithm = 4, resolution = 0.4), I get the same warning.
Seurat v4 documentation says:
method Method for running leiden (defaults to matrix which is fast for small datasets). Enable method = "igraph" to avoid casting large data to a dense matrix
algorithm Algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg python.
So try method = "igraph".
I think the documentation should be clearer about this, though: method should be listed after algorithm.
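A minimal sketch of that suggestion, assuming Seurat v4 and the object from the reproducible example above. run_leiden_igraph is a hypothetical helper name, not a Seurat function; it only wraps FindClusters() with algorithm = 4 and method = "igraph". Whether this silences the warning is not established here.

```r
# Hypothetical helper (not part of Seurat): run Leiden clustering at several
# resolutions using method = "igraph" instead of the default method = "matrix".
run_leiden_igraph <- function(obj, resolutions = seq(0, 2, 0.5)) {
  for (res in resolutions) {
    obj <- Seurat::FindClusters(
      obj,
      algorithm  = 4,         # 4 = Leiden algorithm
      method     = "igraph",  # avoids casting the graph to a dense matrix
      resolution = res,
      verbose    = FALSE
    )
  }
  obj
}

# Usage with the object from the thread (requires Seurat and leidenalg):
# data_ID.list_MERGED <- run_leiden_igraph(data_ID.list_MERGED)
```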
Hi, I have the same problem. I get a lot of warnings when I use the Leiden algorithm (also with method = "igraph"). Did someone find a fix for this? In the end I do get clusters that look well defined at different resolutions, so I'm also wondering how influential these warnings are.
Same for me as RijndertAriese
Same problem here as well, only when using leidenalg. The problem occurs whether I use igraph or not.
Same issue for me (and my colleagues) as well. I don't think that it affects the output, but it would be good if the developers could confirm that.
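Until that is confirmed, one rough way to sanity-check the output is to look for missing cluster labels and to compare label vectors between runs. The sketch below uses base R and toy label vectors; summarise_labels and same_partition are hypothetical helper names, and passing these checks does not prove the warning is harmless.

```r
# Hypothetical helpers, not part of Seurat or leiden.

# Summarise a vector of cluster labels: missing labels and cluster sizes.
summarise_labels <- function(labels) {
  list(
    n_missing = sum(is.na(labels)),
    sizes     = table(labels, useNA = "ifany")
  )
}

# TRUE when two label vectors describe the same grouping of cells,
# even if the cluster names differ between runs.
same_partition <- function(a, b) {
  a <- as.character(a)  # drop unused factor levels
  b <- as.character(b)
  stopifnot(length(a) == length(b), !anyNA(a), !anyNA(b))
  tab <- table(a, b)
  # Identical partitions leave exactly one non-zero cell per row and column.
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Toy labels standing in for the results of clustering runs.
run_a <- c("0", "0", "1", "1", "2")
run_b <- c("B", "B", "A", "A", "C")  # same grouping, different names
run_c <- c("0", "1", "1", "1", "2")  # one cell assigned differently

print(summarise_labels(run_a))
print(same_partition(run_a, run_b))
print(same_partition(run_a, run_c))
```

With a Seurat object, the label vectors would come from the metadata, for example the seurat_clusters column that FindClusters() writes.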
Hi there,
Thanks for the package.
When I was running FindClusters(algorithm=4), I encountered this warning. Should I be worried about it? If so, would you mind helping me diagnose it? Thank you again for your help.