satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other

Warning in FindClusters: sparse->dense coercion: allocating vector of size 2.9 GiB; Warning in paste(new, collapse = "\n"): NAs introduced by coercion to integer range #6958

Open denvercal1234GitHub opened 1 year ago

denvercal1234GitHub commented 1 year ago

Hi there,

Thanks for the package.

When I ran FindClusters(algorithm = 4), I encountered the warning below. Should I be worried about it? If so, would you mind helping me diagnose it?

Thank you again for your help.

Warning: sparse->dense coercion: allocating vector of size 2.9 GiB
Warning in paste(new, collapse = "\n") :
  NAs introduced by coercion to integer range
2 singletons identified. 16 final clusters.
Warning: sparse->dense coercion: allocating vector of size 2.9 GiB
3 singletons identified. 18 final clusters.
Warning: sparse->dense coercion: allocating vector of size 2.9 GiB
11 singletons identified. 19 final clusters.
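If you want to see exactly which warnings a call raises, and how many, without them scrolling past in the console, one option is to collect them. This is a minimal, generic base R sketch, not a Seurat feature; `collect_warnings` is a hypothetical helper name. The demo reproduces the second message from the log: R emits "NAs introduced by coercion to integer range" whenever a double larger than `.Machine$integer.max` (2,147,483,647) is coerced to integer.

```r
# Evaluate `expr`, record every warning it raises, and keep evaluating.
# Returns the value of `expr` together with the collected warning messages.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,  # lazily evaluated here, so its warnings reach the handler below
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # do not print; continue evaluation
    }
  )
  list(value = value, warnings = msgs)
}

# Demo: 3e9 exceeds the 32-bit integer range, so as.integer() returns NA
# and raises the same message that appears in the log above.
res <- collect_warnings({
  x <- as.integer(3e9)
  x
})
res$warnings
```

In this thread the call would be wrapped as `collect_warnings(FindClusters(data_ID.list_MERGED, algorithm = 4, resolution = 0.5))`. Note that 2.9 GiB is about 3.1e9 bytes, which is also above the integer limit; that may be why the two warnings appear together, but nobody in the thread confirms it.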
AustinHartman commented 1 year ago

Could you provide a small reproducible example? It is possible that the graph provided to the leiden algorithm unexpectedly contains some non-integer values.
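To check whether the graph handed to Leiden contains non-integer values, you can inspect its stored weights directly. This is a minimal sketch using the Matrix package; `graph_weight_summary` is a hypothetical helper, and the toy 3-node graph stands in for a real Seurat graph. Seurat `Graph` objects extend `dgCMatrix`, so the same function should also accept something like `data_ID.list_MERGED[["SCT_snn"]]` (the default SNN graph name for the SCT assay).

```r
library(Matrix)

# Summarise the explicitly stored (non-zero) weights of a sparse graph.
# Only the @x slot of the compressed sparse column matrix is inspected.
graph_weight_summary <- function(g) {
  g <- as(g, "CsparseMatrix")
  w <- g@x
  list(
    n_nodes     = nrow(g),
    n_stored    = length(w),
    all_integer = all(w == round(w)),  # TRUE only if every weight is whole
    range       = range(w)
  )
}

# Toy 3-node graph with one fractional edge weight
toy <- sparseMatrix(i = c(1, 2, 3), j = c(2, 3, 1), x = c(1, 0.25, 1), dims = c(3, 3))
s <- graph_weight_summary(toy)
s$all_integer  # FALSE, because of the 0.25 edge
```

Keep in mind that SNN edge weights are overlap fractions between 0 and 1, so non-integer values are expected in that graph and are not by themselves a sign of a bug.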

yuhanH commented 1 year ago

Do you get a similar warning if you use another clustering algorithm, such as algorithm = 3?

denvercal1234GitHub commented 1 year ago

Thanks, @AustinHartman, for your response. The code is below. Do you see anything wrong in the steps before FindClusters()?

You can download the data (12.5 MB) at https://drive.google.com/file/d/1zBO1XAPnlt-SCUwWcgFWnOX4fcEA-Ve1/view?usp=share_link. It is a subset of cells from 20220215_tonsil_atlas_cite_seurat_obj (from https://zenodo.org/record/6340174#.ZCFjkBXMI0R).

library(Seurat)
library(harmony)
library(dplyr)  # provides the %>% pipe used below

load(".../Massoni_20220215_tonsil_atlas_cite_seurat_obj_CD8Tcells_BCLL15_8_9.RData")

# Assumption (not shown in the original post): cell-cycle genes from Seurat's built-in list
s.genes <- cc.genes.updated.2019$s.genes
g2m.genes <- cc.genes.updated.2019$g2m.genes

data_ID.list <- SplitObject(Massoni_20220215_tonsil_atlas_cite_seurat_obj_CD8Tcells_BCLL15_8_9, split.by = "data_ID")

### Process RNA data with SCTransform for RNA-based clustering
for (i in seq_along(data_ID.list)) {
  # RNA stays the default assay until SCTransform() switches it to SCT
  DefaultAssay(data_ID.list[[i]]) <- 'RNA'
  data_ID.list[[i]] <- NormalizeData(data_ID.list[[i]], assay = 'RNA')
  data_ID.list[[i]] <- CellCycleScoring(data_ID.list[[i]], s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE)
  data_ID.list[[i]]$CC.Difference <- data_ID.list[[i]]$S.Score - data_ID.list[[i]]$G2M.Score
  data_ID.list[[i]] <- SCTransform(data_ID.list[[i]], verbose = FALSE, method = "glmGamPoi", vst.flavor = "v2", return.only.var.genes = FALSE, assay = "RNA", vars.to.regress = c("pct_mt", "CC.Difference", "pct_ribosomal"), min_cells = 4)
}

# Merge all per-sample objects; `y` accepts a list of Seurat objects
data_ID.list_MERGED <- merge(data_ID.list[[1]], y = data_ID.list[-1], merge.data = TRUE)

data_ID.list_MERGED_var_features <- SelectIntegrationFeatures(data_ID.list, assay = rep("SCT", length(data_ID.list)), nfeatures = 3000, fvf.nfeatures = 3000)

DefaultAssay(data_ID.list_MERGED) <- 'SCT'
VariableFeatures(data_ID.list_MERGED) <- data_ID.list_MERGED_var_features

data_ID.list_MERGED <- RunPCA(data_ID.list_MERGED, verbose = FALSE, npcs=50, assay = 'SCT', features = data_ID.list_MERGED_var_features)

data_ID.list_MERGED <- RunHarmony(data_ID.list_MERGED, reduction = "pca", dims = 1:31, group.by.vars = "data_ID", assay.use = "SCT", reduction.save = "harmony_SCT_QNN")

### Process and integrate ADT data for visualization of protein expression
DefaultAssay(data_ID.list_MERGED) <- 'ADT'

# we will use all ADT features for dimensional reduction
# we set a dimensional reduction name ('apca') to avoid overwriting the RNA-based 'pca' reduction
VariableFeatures(data_ID.list_MERGED) <- rownames(data_ID.list_MERGED[["ADT"]])

data_ID.list_MERGED <- NormalizeData(data_ID.list_MERGED, normalization.method = 'CLR', margin = 2) %>% 
  ScaleData() %>% RunPCA(reduction.name = 'apca') %>%
    RunHarmony(reduction = "apca",dims = 1:20, group.by.vars = "data_ID", assay.use = "ADT", reduction.save = "harmony_ADT_QNN")

### UMAP and clustering based on transcriptome 
DefaultAssay(data_ID.list_MERGED) <- 'SCT'

data_ID.list_MERGED <- RunUMAP(data_ID.list_MERGED, dims = 1:31, reduction = "harmony_SCT_QNN", return.model = TRUE) %>%
  FindNeighbors(reduction = "harmony_SCT_QNN", dims = 1:31)

# Leiden clustering (algorithm = 4) at resolutions 0, 0.5, 1, 1.5 and 2
for (i in seq(0, 2, 0.5)) {
  data_ID.list_MERGED <- Seurat::FindClusters(data_ID.list_MERGED, algorithm = 4, resolution = i, verbose = TRUE)
}

# Not run in this example: data_ID.list_MERGED <- PrepSCTFindMarkers(data_ID.list_MERGED, assay = "SCT")

This is where it threw the following warnings:

Warning: NAs introduced by coercion to integer range
Warning in paste(condition$message, collapse = "\n")

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggVennDiagram_1.2.2         harmony_0.1.1               Rcpp_1.0.10                 scales_1.2.1                flexclust_1.4-1            
 [6] modeltools_0.2-23           lattice_0.20-45             patchwork_1.1.2.9000        pbmc3k.SeuratData_3.1.4     bmcite.SeuratData_0.3.0    
[11] SeuratData_0.2.2            SeuratDisk_0.0.0.9020       scater_1.24.0               scuttle_1.6.3               SingleCellExperiment_1.20.0
[16] SummarizedExperiment_1.28.0 Biobase_2.58.0              GenomicRanges_1.50.2        GenomeInfoDb_1.34.9         IRanges_2.32.0             
[21] S4Vectors_0.36.1            BiocGenerics_0.44.0         MatrixGenerics_1.10.0       matrixStats_0.63.0          dplyr_1.1.1                
[26] ggplot2_3.4.1               Signac_1.9.0                HCATonsilData_0.0.0.9000    SeuratObject_4.1.3          Seurat_4.3.0               

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                    spatstat.explore_3.1-0        reticulate_1.28               tidyselect_1.2.0             
  [5] RSQLite_2.3.0                 AnnotationDbi_1.60.0          htmlwidgets_1.6.2             BiocParallel_1.30.4          
  [9] Rtsne_0.16                    munsell_0.5.0                 ScaledMatrix_1.4.1            codetools_0.2-19             
 [13] ica_1.0-3                     DT_0.27                       future_1.32.0                 miniUI_0.1.1.1               
 [17] withr_2.5.0                   spatstat.random_3.1-4         colorspace_2.1-0              progressr_0.13.0             
 [21] filelock_1.0.2                knitr_1.42                    rstudioapi_0.14               ROCR_1.0-11                  
 [25] tensor_1.5                    listenv_0.9.0                 labeling_0.4.2                GenomeInfoDbData_1.2.9       
 [29] polyclip_1.10-4               farver_2.1.1                  bit64_4.0.5                   rhdf5_2.42.0                 
 [33] rprojroot_2.0.3               parallelly_1.35.0             vctrs_0.6.1                   generics_0.1.3               
 [37] xfun_0.38                     BiocFileCache_2.6.1           R6_2.5.1                      ggbeeswarm_0.7.1             
 [41] rsvd_1.0.5                    RVenn_1.1.0                   hdf5r_1.3.8                   bitops_1.0-7                 
 [45] rhdf5filters_1.10.0           spatstat.utils_3.0-2          cachem_1.0.7                  DelayedArray_0.24.0          
 [49] promises_1.2.0.1              beeswarm_0.4.0                gtable_0.3.3                  beachmat_2.12.0              
 [53] globals_0.16.2                goftest_1.2-3                 rlang_1.1.0                   RcppRoll_0.3.0               
 [57] splines_4.2.3                 lazyeval_0.2.2                spatstat.geom_3.1-0           BiocManager_1.30.20          
 [61] yaml_2.3.7                    reshape2_1.4.4                abind_1.4-5                   httpuv_1.6.9                 
 [65] tools_4.2.3                   ellipsis_0.3.2                jquerylib_0.1.4               RColorBrewer_1.1-3           
 [69] ggridges_0.5.4                plyr_1.8.8                    sparseMatrixStats_1.8.0       zlibbioc_1.44.0              
 [73] purrr_1.0.1                   RCurl_1.98-1.10               deldir_1.0-6                  viridis_0.6.2                
 [77] pbapply_1.7-0                 cowplot_1.1.1                 zoo_1.8-11                    ggrepel_0.9.3                
 [81] cluster_2.1.4                 here_1.0.1                    magrittr_2.0.3                glmGamPoi_1.8.0              
 [85] data.table_1.14.8             scattermore_0.8               openxlsx_4.2.5.2              lmtest_0.9-40                
 [89] RANN_2.6.1                    fitdistrplus_1.1-8            evaluate_0.20                 mime_0.12                    
 [93] xtable_1.8-4                  gridExtra_2.3                 compiler_4.2.3                tibble_3.2.1                 
 [97] KernSmooth_2.23-20            crayon_1.5.2                  htmltools_0.5.5               later_1.3.0                  
[101] tidyr_1.3.0                   DBI_1.1.3                     ExperimentHub_2.6.0           dbplyr_2.3.2                 
[105] MASS_7.3-58.3                 rappdirs_0.3.3                Matrix_1.5-3                  cli_3.6.1                    
[109] parallel_4.2.3                igraph_1.4.1                  pkgconfig_2.0.3               sp_1.6-0                     
[113] plotly_4.10.1.9000            spatstat.sparse_3.0-1         bslib_0.4.2                   vipor_0.4.5                  
[117] XVector_0.38.0                stringr_1.5.0                 digest_0.6.31                 sctransform_0.3.5.9002       
[121] RcppAnnoy_0.0.20              spatstat.data_3.0-1           Biostrings_2.66.0             rmarkdown_2.21               
[125] leiden_0.4.3                  fastmatch_1.1-3               uwot_0.1.14                   DelayedMatrixStats_1.18.2    
[129] curl_5.0.0                    shiny_1.7.4                   Rsamtools_2.12.0              lifecycle_1.0.3.9000         
[133] nlme_3.1-162                  jsonlite_1.8.4                Rhdf5lib_1.20.0               BiocNeighbors_1.14.0         
[137] viridisLite_0.4.1             fansi_1.0.4                   pillar_1.9.0                  KEGGREST_1.38.0              
[141] fastmap_1.1.1                 httr_1.4.5                    survival_3.5-5                interactiveDisplayBase_1.36.0
[145] glue_1.6.2                    zip_2.2.2                     png_0.1-8                     BiocVersion_3.16.0           
[149] bit_4.0.5                     sass_0.4.5                    class_7.3-21                  stringi_1.7.12               
[153] HDF5Array_1.26.0              blob_1.2.4                    BiocSingular_1.12.0           AnnotationHub_3.6.0          
[157] memoise_2.0.1                 irlba_2.3.5.1                 future.apply_1.10.0          
denvercal1234GitHub commented 1 year ago

And @yuhanH, if I use algorithm = 3 instead, no warning or error occurs. When would you usually prefer SLM (algorithm = 3) over Leiden for transcriptome data? Thank you for your help.

for (i in seq(0, 2, 0.5)) {
  data_ID.list_MERGED <- Seurat::FindClusters(data_ID.list_MERGED, algorithm = 3, resolution = i, verbose = TRUE)
}
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 1.0000
Number of communities: 1
Elapsed time: 1 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.6947
Number of communities: 4
Elapsed time: 1 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.5699
Number of communities: 6
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.4674
Number of communities: 9
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.4027
Number of communities: 10
Elapsed time: 0 seconds
Harmony714 commented 1 year ago

I also have this problem. When I run seuobj <- FindClusters(object = seuobj, algorithm = 1, resolution = 0.4), it's OK.

When I run seuobj <- FindClusters(object = seuobj, algorithm = 4, resolution = 0.4), I have the same problem.

eonurk commented 12 months ago

Seurat v4 documentation says:

method: Method for running leiden (defaults to matrix which is fast for small datasets). Enable method = "igraph" to avoid casting large data to a dense matrix

algorithm: Algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg python.

So try method = "igraph".

I think the documentation should be clearer about this, though; method should be listed right after algorithm.
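The `method` argument matters because a dense copy of an n × n graph stored as doubles needs n² × 8 bytes, so memory grows quadratically with the number of cells. The arithmetic below is a rough sketch that assumes the coerced object is a square double matrix with one row and one column per cell; `dense_graph_bytes` and `cells_for_bytes` are hypothetical helper names, not Seurat functions.

```r
# Bytes needed to hold an n x n graph as a dense matrix of doubles (8 bytes per entry).
# as.numeric() avoids integer overflow when n_cells is passed as an integer.
dense_graph_bytes <- function(n_cells) {
  as.numeric(n_cells)^2 * 8
}

# Inverse: number of cells whose dense n x n graph would take `bytes` bytes.
cells_for_bytes <- function(bytes) {
  sqrt(bytes / 8)
}

# 1,286 nodes, as in the algorithm = 3 log above: about 12.6 MiB
dense_graph_bytes(1286) / 2^20

# Under the same assumption, a 2.9 GiB allocation corresponds to roughly 19,700 cells
round(cells_for_bytes(2.9 * 2^30))
```

In the loop from the original post, the suggested change would read `Seurat::FindClusters(data_ID.list_MERGED, algorithm = 4, method = "igraph", resolution = i, verbose = TRUE)`. As later comments in this thread report, it may not silence every warning.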

RijndertAriese commented 11 months ago

Hi, I have the same problem: a lot of warnings when I use the Leiden algorithm (also with method = "igraph"). Has anyone found a fix for this? In the end I do get clusters that look well defined at different resolutions, so I'm also wondering how much these warnings matter.
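One way to judge how much the warnings matter is to compare the cluster labels from two runs, for example Leiden with the default `method = "matrix"` against `method = "igraph"`, or Leiden against SLM (`algorithm = 3`). The sketch below computes the adjusted Rand index (Hubert and Arabie's formulation) in base R; `adjusted_rand_index` is a hypothetical helper, not part of Seurat. A value of 1 means the two partitions are identical up to relabelling; values near 0 mean chance-level agreement.

```r
# Adjusted Rand index between two label vectors of equal length.
adjusted_rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)
  # number of unordered pairs; as.numeric() avoids integer overflow for large clusters
  choose2 <- function(x) { x <- as.numeric(x); x * (x - 1) / 2 }
  sum_cells <- sum(choose2(tab))           # pairs grouped together in both runs
  sum_rows  <- sum(choose2(rowSums(tab)))  # pairs grouped together in run a
  sum_cols  <- sum(choose2(colSums(tab)))  # pairs grouped together in run b
  total     <- choose2(length(a))
  expected  <- sum_rows * sum_cols / total
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

adjusted_rand_index(c(1, 1, 2, 2, 3, 3), c("x", "x", "y", "y", "z", "z"))  # 1
adjusted_rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))                          # -0.5
```

With a Seurat object you would save one run's labels before rerunning, e.g. `labs_a <- data_ID.list_MERGED$SCT_snn_res.0.5`, because repeated FindClusters calls at the same resolution overwrite that metadata column. Leiden is stochastic, so keep `random.seed` fixed across runs. If both runs return a single cluster (as at resolution 0), the index is undefined and the function returns NaN.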

carlacohen commented 2 weeks ago

Same for me as RijndertAriese

JordanWean commented 1 week ago

Same problem here as well. I'm only using leidenalg, and the problem occurs whether or not I use igraph.

SteGruener commented 1 week ago

Same issue for me (and my colleagues) as well. I don't think that it affects the output, but would be good if the developers could confirm that.