Warning in FindClusters: sparse->dense coercion: allocating vector of size 2.9 GiB; Warning in paste(new, collapse = "\n") : NAs introduced by coercion to integer range #6958

Open denvercal1234GitHub opened 1 year ago

denvercal1234GitHub commented 1 year ago

Hi there,

Thanks for the package.

When I was running FindClusters(algorithm=4), I encountered this Warning. Should I be worried about it? If so, would you mind helping me diagnose this warning?

Thank you again for your help.

Warning: sparse->dense coercion: allocating vector of size 2.9 GiB
Warning in paste(new, collapse = "\n") :
  NAs introduced by coercion to integer range
2 singletons identified. 16 final clusters.
Warning: sparse->dense coercion: allocating vector of size 2.9 GiB
3 singletons identified. 18 final clusters.
Warning: sparse->dense coercion: allocating vector of size 2.9 GiB
11 singletons identified. 19 final clusters.

AustinHartman commented 1 year ago

Could you provide a small reproducible example? It is possible that the graph provided to the leiden algorithm unexpectedly contains some non-integer values.
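
One way to start narrowing this down before a full reproducible example is to look at the values stored in the graph and at how large it would be as a dense matrix. The sketch below uses a small stand-in sparse matrix built with the Matrix package so that it runs on its own. In a real session the graph would presumably be pulled from the Seurat object (for example `obj[["SCT_snn"]]`, where the graph name is an assumption that depends on the assay used in FindNeighbors). `summarize_graph` is a hypothetical helper, not a Seurat function.

```r
library(Matrix)

# Hypothetical helper: summarize a sparse graph's stored values and the
# memory a dense double-precision copy of it would need.
summarize_graph <- function(g) {
  g <- as(g, "CsparseMatrix")
  vals <- g@x
  list(
    n_nodes    = nrow(g),
    n_nonzero  = length(vals),
    any_na     = anyNA(vals),
    any_nonint = any(vals != round(vals), na.rm = TRUE),
    dense_gib  = prod(as.numeric(dim(g))) * 8 / 1024^3
  )
}

# Stand-in graph: 5 nodes with fractional (Jaccard-like) edge weights.
g <- sparseMatrix(
  i = c(1, 2, 3, 4, 5, 1),
  j = c(2, 3, 4, 5, 1, 3),
  x = c(0.5, 0.25, 1, 1, 0.75, 0.1),
  dims = c(5, 5)
)
info <- summarize_graph(g)
str(info)
```

An SNN graph normally holds fractional weights, so `any_nonint` being TRUE is not by itself a sign of a problem; `any_na` being TRUE or an unexpectedly large `dense_gib` would be more informative.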

yuhanH commented 1 year ago

Do you get a similar warning if you use another clustering algorithm, such as algorithm = 3?

denvercal1234GitHub commented 1 year ago

Thanks, @AustinHartman, for your response. The code is below. Do you see anything wrong in the steps before FindClusters()?

You can download the data (12.5MB) at It was a subset of cells from the 20220215_tonsil_atlas_cite_seurat_obj(from


data_ID.list <- SplitObject(Massoni_20220215_tonsil_atlas_cite_seurat_obj_CD8Tcells_BCLL15_8_9, split.by = "data_ID")

### Process RNA data with SCTransform for RNA-based clustering
# s.genes and g2m.genes must already be defined before this loop runs
for (i in seq_along(data_ID.list)) {
  DefaultAssay(data_ID.list[[i]]) <- 'RNA'
  data_ID.list[[i]] <- NormalizeData(data_ID.list[[i]], assay = 'RNA')
  data_ID.list[[i]] <- CellCycleScoring(data_ID.list[[i]], s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE)
  data_ID.list[[i]]$CC.Difference <- data_ID.list[[i]]$S.Score - data_ID.list[[i]]$G2M.Score
  data_ID.list[[i]] <- SCTransform(data_ID.list[[i]], verbose = FALSE, method = "glmGamPoi", vst.flavor = "v2", return.only.var.genes = FALSE, assay = "RNA", vars.to.regress = c("pct_mt", "CC.Difference", "pct_ribosomal"), min_cells = 4)
}

# Merge all 24 per-sample objects into one
data_ID.list_MERGED <- merge(data_ID.list[[1]], y = data_ID.list[2:24], merge.data = TRUE)

data_ID.list_MERGED_var_features <- SelectIntegrationFeatures(data_ID.list, assay = rep("SCT", 24), nfeatures = 3000, fvf.nfeatures = 3000)

DefaultAssay(data_ID.list_MERGED) <- 'SCT'
VariableFeatures(data_ID.list_MERGED) <- data_ID.list_MERGED_var_features

data_ID.list_MERGED <- RunPCA(data_ID.list_MERGED, verbose = FALSE, npcs=50, assay = 'SCT', features = data_ID.list_MERGED_var_features)

data_ID.list_MERGED <- RunHarmony(data_ID.list_MERGED, reduction = "pca", dims = 1:31, group.by.vars = "data_ID", assay.use = "SCT", reduction.save = "harmony_SCT_QNN")

### Process and integrate ADT data for visualization of protein expression
DefaultAssay(data_ID.list_MERGED) <- 'ADT'

# we will use all ADT features for dimensional reduction
# we set a dimensional reduction name to avoid overwriting the 
VariableFeatures(data_ID.list_MERGED) <- rownames(data_ID.list_MERGED[["ADT"]])

data_ID.list_MERGED <- NormalizeData(data_ID.list_MERGED, normalization.method = 'CLR', margin = 2) %>%
  ScaleData() %>%
  RunPCA(reduction.name = 'apca') %>%
  RunHarmony(reduction = "apca", dims = 1:20, group.by.vars = "data_ID", assay.use = "ADT", reduction.save = "harmony_ADT_QNN")

### UMAP and clustering based on transcriptome 
DefaultAssay(data_ID.list_MERGED) <- 'SCT'

data_ID.list_MERGED <- RunUMAP(data_ID.list_MERGED, dims = 1:31, reduction = "harmony_SCT_QNN", return.model=T)  %>% FindNeighbors(reduction = "harmony_SCT_QNN", dims = 1:31)

DefaultAssay(data_ID.list_MERGED) <- 'SCT'

for (i in seq(0, 2, 0.5)) {
  data_ID.list_MERGED <- Seurat::FindClusters(data_ID.list_MERGED, algorithm = 4, resolution = i, verbose = TRUE)
}

# Not run in this example: data_ID.list_MERGED <- PrepSCTFindMarkers(data_ID.list_MERGED, assay = "SCT")

This is where it threw the warnings:

Warning: NAs introduced by coercion to integer range
Warning in paste(condition$message, collapse = "\n")
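
To see which resolution triggers which warning, the messages can be collected per call instead of being printed between the clustering output. This is a minimal base-R sketch: `collect_warnings` and `fake_cluster` are hypothetical names, and `fake_cluster` is only a stand-in that warns at resolution 0 so the example runs without Seurat or the data.

```r
# Hypothetical helper: run `fn` once per resolution and record the warning
# messages each call emits, without stopping the loop.
collect_warnings <- function(resolutions, fn) {
  lapply(stats::setNames(resolutions, resolutions), function(res) {
    msgs <- character(0)
    value <- withCallingHandlers(
      fn(res),
      warning = function(w) {
        msgs <<- c(msgs, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    )
    list(value = value, warnings = msgs)
  })
}

# Stand-in for the clustering call. In a real session `fn` might be
# function(res) Seurat::FindClusters(obj, algorithm = 4, resolution = res).
fake_cluster <- function(res) {
  if (res == 0) warning("NAs introduced by coercion to integer range")
  res * 10
}

out <- collect_warnings(seq(0, 2, 0.5), fake_cluster)
vapply(out, function(x) length(x$warnings), integer(1))
```

The result is a list named by resolution, so `out[["0.5"]]$warnings` holds the messages raised for that run only.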

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[26] ggplot2_3.4.1               Signac_1.9.0                HCATonsilData_0.0.0.9000    SeuratObject_4.1.3          Seurat_4.3.0               

denvercal1234GitHub commented 1 year ago

And @yuhanH, if I use algorithm = 3 instead, no warning or error occurs. In which cases do you usually prefer the SLM algorithm (algorithm = 3) over Leiden for transcriptome data? Thank you for your help.

for (i in seq(0, 2, 0.5)) {
  data_ID.list_MERGED <- Seurat::FindClusters(data_ID.list_MERGED, algorithm = 3, resolution = i, verbose = TRUE)
}
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
Maximum modularity in 10 random starts: 1.0000
Number of communities: 1
Elapsed time: 1 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
Maximum modularity in 10 random starts: 0.6947
Number of communities: 4
Elapsed time: 1 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
Maximum modularity in 10 random starts: 0.5699
Number of communities: 6
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
Maximum modularity in 10 random starts: 0.4674
Number of communities: 9
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1286
Number of edges: 118834

Running smart local moving algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
Maximum modularity in 10 random starts: 0.4027
Number of communities: 10
Elapsed time: 0 seconds
Harmony714 commented 1 year ago

I also have this problem. When I run seuobj <- FindClusters(object = seuobj, algorithm = 1, resolution = 0.4), it's OK.

When I run seuobj <- FindClusters(object = seuobj, algorithm = 4, resolution = 0.4), I get the same problem.

eonurk commented 12 months ago

Seurat v4 documentation says:

method Method for running leiden (defaults to matrix which is fast for small datasets). Enable method = "igraph" to avoid casting large data to a dense matrix

algorithm Algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg python.

So try method = "igraph".
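
Applied to the loop from earlier in this thread, the suggestion might look like the sketch below. `run_leiden_igraph` is a hypothetical wrapper, not part of Seurat; it only passes `method = "igraph"` alongside `algorithm = 4` as described in the quoted documentation. Whether this removes the warnings is not confirmed in this thread.

```r
# Sketch: Leiden clustering over several resolutions with method = "igraph".
# `run_leiden_igraph` is a hypothetical wrapper, not a Seurat function.
run_leiden_igraph <- function(obj, resolutions = seq(0, 2, 0.5)) {
  for (res in resolutions) {
    obj <- Seurat::FindClusters(
      obj,
      algorithm  = 4,         # 4 = Leiden
      method     = "igraph",  # avoid casting the graph to a dense matrix
      resolution = res,
      verbose    = TRUE
    )
  }
  obj
}

# Usage, assuming FindNeighbors() has already been run on the object:
# data_ID.list_MERGED <- run_leiden_igraph(data_ID.list_MERGED)
```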

I think the documentation should be clearer about this, though; method should be listed after algorithm.

RijndertAriese commented 11 months ago

Hi, I have the same problem. I get a lot of warnings when I use the Leiden algorithm (also with method = "igraph"). Did someone find a fix for this? In the end I do get clusters that look well defined at different resolutions, so I'm also wondering how much these warnings matter.

carlacohen commented 2 weeks ago

Same for me as RijndertAriese

JordanWean commented 1 week ago

Same problem as well. Only using leidenalg. Problem occurs whether I use igraph or not.

SteGruener commented 1 week ago

Same issue for me (and my colleagues) as well. I don't think that it affects the output, but it would be good if the developers could confirm that.