satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.26k stars 904 forks source link

RunPCA failure using standard integration approach #4426

Closed errcricket closed 3 years ago

errcricket commented 3 years ago

Issue

I have several samples (10X output) that I have merged into one Seurat object. I had been using similar code for months with no problems, but recently, I have not been able to get past the RunPCA step and am getting a very consistent error.

Error

Warning in irlba(A = t(x = object), nv = npcs, ...) :
  You're computing too large a percentage of total singular values, use a standard svd instead.
Error in nv + 7 : non-numeric argument to binary operator
Calls: transform_object ... RunPCA -> RunPCA.Assay -> RunPCA -> RunPCA.default -> irlba
Execution halted

I have tried re-running it, re-installing all used packages, and based off of this post: https://github.com/satijalab/seurat/issues/1963, I added approx=FALSE, all to no avail.

Session Info:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS:   bee/installed_packages/bin/bin/R/lib64/R/lib/libRblas.so
LAPACK: bee/installed_packages/bin/bin/R/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base    

other attached packages:
 [1] multtest_2.46.0             metap_1.4    
 [3] ggrepel_0.9.1               ggplot2_3.3.3    
 [5] data.table_1.14.0           Matrix_1.3-2    
 [7] dplyr_1.0.5                 magrittr_2.0.1    
 [9] patchwork_1.1.1             cowplot_1.1.1    
[11] DropletUtils_1.10.3         SingleCellExperiment_1.12.0
[13] SummarizedExperiment_1.20.0 Biobase_2.50.0    
[15] GenomicRanges_1.42.0        GenomeInfoDb_1.26.2    
[17] IRanges_2.24.1              S4Vectors_0.28.1    
[19] BiocGenerics_0.36.0         MatrixGenerics_1.2.0    
[21] matrixStats_0.58.0          SeuratObject_4.0.0    
[23] Seurat_4.0.1    

loaded via a namespace (and not attached):
  [1] sn_1.6-2                  plyr_1.8.6
  [3] igraph_1.2.6              lazyeval_0.2.2
  [5] splines_4.0.3             BiocParallel_1.24.1
  [7] listenv_0.8.0             scattermore_0.7
  [9] TH.data_1.0-10            digest_0.6.27
 [11] htmltools_0.5.1.1         fansi_0.4.2
 [13] tensor_1.5                cluster_2.1.0
 [15] ROCR_1.0-11               limma_3.46.0
 [17] globals_0.14.0            R.utils_2.10.1
 [19] sandwich_3.0-0            spatstat.sparse_2.0-0
 [21] colorspace_2.0-0          rbibutils_2.0
 [23] crayon_1.4.1              RCurl_1.98-1.2    
 [25] jsonlite_1.7.2            spatstat.data_2.1-0    
 [27] survival_3.2-7            zoo_1.8-9    
 [29] glue_1.4.2                polyclip_1.10-0    
 [31] gtable_0.3.0              zlibbioc_1.36.0    
 [33] XVector_0.30.0            leiden_0.3.7    
 [35] DelayedArray_0.16.0       Rhdf5lib_1.12.0    
 [37] future.apply_1.7.0        HDF5Array_1.18.0    
 [39] abind_1.4-5               scales_1.1.1
 [41] mvtnorm_1.1-1             edgeR_3.32.1
 [43] miniUI_0.1.1.1            Rcpp_1.0.6
 [45] plotrix_3.8-1             viridisLite_0.4.0
 [47] xtable_1.8-4              tmvnsim_1.0-2
 [49] reticulate_1.18           spatstat.core_2.0-0
 [51] dqrng_0.2.1               htmlwidgets_1.5.3
 [53] httr_1.4.2                RColorBrewer_1.1-2
 [55] TFisher_0.2.0             ellipsis_0.3.1
 [57] ica_1.0-2                 pkgconfig_2.0.3
 [59] R.methodsS3_1.8.1         scuttle_1.0.4
 [61] uwot_0.1.10               deldir_0.2-10
 [63] locfit_1.5-9.4            utf8_1.2.1
 [65] tidyselect_1.1.0          rlang_0.4.10
 [67] reshape2_1.4.4            later_1.1.0.1
 [69] munsell_0.5.0             tools_4.0.3
 [71] generics_0.1.0            mathjaxr_1.0-1
 [73] ggridges_0.5.3            stringr_1.4.0
 [75] fastmap_1.1.0             goftest_1.2-2
 [77] fitdistrplus_1.1-3        purrr_0.3.4
 [79] RANN_2.6.1                pbapply_1.4-3
 [81] future_1.21.0             nlme_3.1-151
 [83] sparseMatrixStats_1.2.0   mime_0.10
 [85] R.oo_1.24.0               compiler_4.0.3
 [87] plotly_4.9.3              png_0.1-7
 [89] spatstat.utils_2.1-0      tibble_3.1.0
 [91] stringi_1.5.3             lattice_0.20-41
 [93] vctrs_0.3.7               mutoss_0.1-12
 [95] pillar_1.6.0              lifecycle_1.0.0
 [97] rhdf5filters_1.2.0        spatstat.geom_2.1-0
 [99] Rdpack_2.1                lmtest_0.9-38
[101] RcppAnnoy_0.0.18          bitops_1.0-6
[103] irlba_2.3.3               gbRd_0.4-11
[105] httpuv_1.5.5              R6_2.5.0
[107] promises_1.2.0.1          KernSmooth_2.23-18
[109] gridExtra_2.3             parallelly_1.24.0
[111] codetools_0.2-18          MASS_7.3-53
[113] rhdf5_2.34.0              withr_2.4.1
[115] mnormt_2.0.2              sctransform_0.3.2
[117] multcomp_1.4-15           GenomeInfoDbData_1.2.4
[119] mgcv_1.8-33               grid_4.0.3
[121] rpart_4.1-15              beachmat_2.6.3
[123] tidyr_1.1.3               DelayedMatrixStats_1.12.1
[125] Rtsne_0.15                numDeriv_2016.8-1.1
[127] shiny_1.6.0

Code:

components <<- 30

create_seurat_object <- function()
{
   S1.data = 'data/sample_data/clean/B0H/soupX/'
   S1 <- Read10X(data.dir=S1.data)
   S1  <- CreateSeuratObject(counts=S1, project='B0H', min.cells=3, min.features=200)
   S1 <- AddMetaData(S1, metadata='B0H', col.name='Experiment')

   S2.data = 'data/sample_data/clean/B4H/soupX/'
   S2 <- Read10X(data.dir=S2.data)
   S2  <- CreateSeuratObject(counts=S2, project='B4H', min.cells=3, min.features=200)
   S2 <- AddMetaData(S2, metadata='B4H', col.name='Experiment')

   S3.data = 'data/sample_data/clean/B8H/soupX/'
   S3 <- Read10X(data.dir=S3.data)
   S3  <- CreateSeuratObject(counts=S3, project='B8H', min.cells=3, min.features=200)
   S3 <- AddMetaData(S3, metadata='B8H', col.name='Experiment')

   S4.data = 'data/sample_data/clean/B24H/soupX/'
   S4 <- Read10X(data.dir=S4.data)
   S4  <- CreateSeuratObject(counts=S4, project='B24H', min.cells=3, min.features=200)
   S4 <- AddMetaData(S4, metadata='B24H', col.name='Experiment')

   S5.data = 'data/sample_data/clean/B48H/soupX/'
   S5 <- Read10X(data.dir=S5.data)
   S5  <- CreateSeuratObject(counts=S5, project='B48H', min.cells=3, min.features=200)
   S5 <- AddMetaData(S5, metadata='B48H', col.name='Experiment')

   S.merged <- merge(S1, y = c(S2,S3,S4,S5), add.cell.ids = c('S1','S2','S3','S4','S5'), project = 'Blood')

   S.merged[['percent.mito']] <- PercentageFeatureSet(S.merged, pattern = '^mt-')
   S <- subset(S.merged, subset = nFeature_RNA > 200 & nFeature_RNA < 3000  & percent.mito < 15) 
}

S <- create_seurat_object()

transform_object <- function(seurat_object)
{
   print('Transforming')

   SO.list <- SplitObject(seurat_object, split.by = 'Experiment')

   SO.list <- lapply(X = SO.list, FUN = function(x) {
         x <- NormalizeData(x)
         x <- FindVariableFeatures(x, selection.method = 'vst', nfeatures = 2000)
   }) 

   features <- SelectIntegrationFeatures(object.list = SO.list)

   SO.anchors <- FindIntegrationAnchors(object.list = SO.list, anchor.features = features)
   seurat_object <- IntegrateData(anchorset = SO.anchors)
   DefaultAssay(seurat_object) <- 'integrated'

   seurat_object <- ScaleData(seurat_object, verbose = FALSE)

   seurat_object <- RunPCA(seurat_object, npcs = components, verbose = FALSE)
   seurat_object <- RunUMAP(seurat_object, reduction = 'pca', dims = 1:components)
   seurat_object <- FindNeighbors(seurat_object, reduction = 'pca', dims = 1:components)
   seurat_object <- FindClusters(seurat_object, resolution=.8)

   return(seurat_object)
}

S <- transform_object(S)

I am pretty stuck, any assistance is Greatly appreciated.

k3yavi commented 3 years ago

Hi @errcricket ,

Thanks for reaching out. Can you check if the number of features is close to the number of PCs you are requesting ? If not, can you also check the same code on am open 10x dataset like pbmc10k, if you can replicate the error ?

errcricket commented 3 years ago

I will try both and get back to you.

no-response[bot] commented 3 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

errcricket commented 3 years ago

I have gone back to this and regardless of the dataset, I am still getting this error:

 Error in nv + 7 : non-numeric argument to binary operator
 Calls: transform_object ... RunPCA -> RunPCA.Assay -> RunPCA -> RunPCA.default -> irlba
 Execution halted

There is generally a minimum of 1000 features for all of the datasets and I cannot get past RunPCA on any of them. This is an issue exclusively when I am integrating data. Please advise.

errcricket commented 3 years ago

I am 99.9% certain I have found the issue.

   seurat_object <- RunPCA(seurat_object, npcs = components, verbose = FALSE)

I think components is a reserved word. As soon as I changed it to compo, all of my errors dissipated.

   seurat_object <- RunPCA(seurat_object, npcs = compo, verbose = FALSE)

Follow-up: Even though compo/components is a number, I used compo <- as.numeric(compo) and that too seems to have fixed the issue.