satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.3k stars 917 forks source link

Problems with RunPCA when npcs > 25 #8908

Closed milani31 closed 2 months ago

milani31 commented 6 months ago

Dear Seurat Team, thank you for developing this amazing tool!

I am completely new to R and RNA-seq analysis and especially scRNA analysis. I am trying to reproduce the steps of the PBMC 3K guided tutorial and encountered some issues while running it. My R version, RStudio, as well as all the packages are freshly installed with the latest versions (Seurat 5.1.0, SeuratObject 5.0.2). The pbmc3k dataset was downloaded from the link in the tutorial. The code is diplayed below.

pbmc.data <- Read10X(datapath)
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)

top10 <- head(VariableFeatures(pbmc), 10)

plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2

all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))

print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)
DimPlot(pbmc, reduction = "pca") + NoLegend()

When examining the PCA results, they are very different from what is expected. What concerns me most is the Dimensional reduction plot showing high exponential values on the x and y axis (see image).

PC 1 Positive: GZMK, NCR3, VPS13C, TMSB4X, MT-CO2 Negative: SELL, IGFBP7, PSMC6, LIG1, ARRDC3 PC 2 Positive: GZMK, NCR3, VPS13C, TMSB4X, MT-CO2 Negative: SELL, IGFBP7, PSMC6, LIG1, ARRDC3 PC 3 Positive: SELL, IGFBP7, PSMC6, LIG1, GPKOW Negative: GZMK, NCR3, VPS13C, TMSB4X, MT-CO2 PC 4 Positive: GZMK, NCR3, VPS13C, TMSB4X, KLRG1 Negative: SELL, IGFBP7, PSMC6, LIG1, GPKOW PC_ 5 Positive: PSMC6, SELL, IGFBP7, ARRDC3, GNB2 Negative: GZMK, NCR3, VPS13C, MT-CO2, ITSN2

pbmc3k_DimPlot_RunPCA_pscs_50

From this point on it is not possible to continue further analysis steps. The error appearing after trying to run UMAP hints that infinite values somehow are inside the input matrix but I honestly have no clue how or when this could have happened.

pbmc <- RunUMAP(pbmc, dims = 1:10)

Error in x2set(Xsub, n_neighbors, metric, nn_method = nn_sub, n_trees, : Non-finite entries in the input matrix

However, when setting npcs = 25 in the RunPCA function the results are comparable the the tutorial output, even the Dimensional reduction plot.

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc), npcs=25)

print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)

PC 1 Positive: CST3, TYROBP, LST1, AIF1, FTL Negative: MALAT1, LTB, IL32, IL7R, CD2 PC 2 Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1 Negative: NKG7, PRF1, CST7, GZMB, GZMA PC 3 Positive: PPBP, PF4, SDPR, SPARC, GNG11 Negative: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1 PC 4 Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1 Negative: VIM, IL7R, S100A6, IL32, S100A8 PC_ 5 Positive: LTB, IL7R, CKB, VIM, MS4A7 Negative: GZMB, NKG7, S100A8, FGFBP2, GNLY

I already downgraded to Seurat version 4.4.0 and SeuratObject version 4.1.4 but I still get the same problem. I also used other datasets and used SCTransform and even reinstalled R. I tried to find a solution online where that issue was mentioned somehow but there was no explanation what could be causing this and how to fix this. I would be very grateful for any suggestions.

sessionInfo() R version 4.4.0 (2024-04-24) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 22.04.4 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Berlin tzcode source: system (glibc)

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] ggplot2_3.5.1 patchwork_1.2.0 Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4

loaded via a namespace (and not attached): [1] deldir_2.0-4 pbapply_1.7-2 gridExtra_2.3 rlang_1.1.3 magrittr_2.0.3 RcppAnnoy_0.0.22 matrixStats_1.3.0
[8] ggridges_0.5.6 compiler_4.4.0 spatstat.geom_3.2-9 png_0.1-8 vctrs_0.6.5 reshape2_1.4.4 stringr_1.5.1
[15] pkgconfig_2.0.3 fastmap_1.1.1 labeling_0.4.3 utf8_1.2.4 promises_1.3.0 purrr_1.0.2 jsonlite_1.8.8
[22] goftest_1.2-3 later_1.3.2 spatstat.utils_3.0-4 irlba_2.3.5.1 parallel_4.4.0 cluster_2.1.6 R6_2.5.1
[29] ica_1.0-3 stringi_1.8.4 RColorBrewer_1.1-3 spatstat.data_3.0-4 reticulate_1.36.1 parallelly_1.37.1 lmtest_0.9-40
[36] scattermore_1.2 Rcpp_1.0.12 tensor_1.5 future.apply_1.11.2 zoo_1.8-12 R.utils_2.12.3 sctransform_0.4.1
[43] httpuv_1.6.15 Matrix_1.6-5 splines_4.4.0 igraph_2.0.3 tidyselect_1.2.1 rstudioapi_0.16.0 abind_1.4-5
[50] spatstat.random_3.2-3 codetools_0.2-19 miniUI_0.1.1.1 spatstat.explore_3.2-7 listenv_0.9.1 lattice_0.22-5 tibble_3.2.1
[57] plyr_1.8.9 withr_3.0.0 shiny_1.8.1.1 ROCR_1.0-11 Rtsne_0.17 future_1.33.2 fastDummies_1.7.3
[64] survival_3.5-8 polyclip_1.10-6 fitdistrplus_1.1-11 pillar_1.9.0 KernSmooth_2.23-22 plotly_4.10.4 generics_0.1.3
[71] RcppHNSW_0.6.0 munsell_0.5.1 scales_1.3.0 globals_0.16.3 xtable_1.8-4 glue_1.7.0 lazyeval_0.2.2
[78] tools_4.4.0 data.table_1.15.4 RSpectra_0.16-1 RANN_2.6.1 leiden_0.4.3.1 dotCall64_1.1-1 cowplot_1.1.3
[85] grid_4.4.0 tidyr_1.3.1 colorspace_2.1-0 nlme_3.1-163 cli_3.6.2 spatstat.sparse_3.0-3 spam_2.10-0
[92] fansi_1.0.6 viridisLite_0.4.2 dplyr_1.1.4 uwot_0.2.2 gtable_0.3.5 R.methodsS3_1.8.2 digest_0.6.35
[99] progressr_0.14.0 ggrepel_0.9.5 farver_2.1.1 htmlwidgets_1.6.4 R.oo_1.26.0 htmltools_0.5.8.1 lifecycle_1.0.4
[106] httr_1.4.7 mime_0.12 MASS_7.3-60.0.1

longmanz commented 6 months ago

Hi, I have rerun your commands but could not replicate the error. The PCA map from my end looks the same as the one from the vignette. Could you confirm that your raw data was downloaded from this link (https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz)?

In addition, if you directly load the pbmc3k from the SeuratData package, do you also see the same error?

milani31 commented 6 months ago

Hi longmanz, thanks for your reply. Yes, I get the same error in both cases. I also tried with 10X data from a publication I would like to reanalyze and the same problem always seems to occur.

PonGanish commented 5 months ago

Hi,

I also face a similar issue when using the RunPCA function. It returns PCs calculated based on the same gene sets as shown below:

pbmc <- RunPCA(pbmc, verbose = TRUE)

PC_ 1
Positive:  CST3, FTL, BRK1, HNRNPA0, RPL23A, PPDPF, PCBP1, PTMA, FTH1, LYZ 
           RPLP1, MIF, LDHA, RPS24, SLC25A5, ANXA6, ABRACL, TMSB4X, S100A9, HCST 
           RNH1, PTPRC, GNG5, ACAP1, ISCU, HCLS1, CYBA, RPL10A, FXYD5, TIMP1 
Negative:  RPSAP58, CCNI, CNBP, GMFG, TSC22D3, LY6E, ANAPC16, YBX1, CD99, EEF1B2 
           ATP5B, DRAP1, SRGN, ALDOA, ISG15, GPSM3, PSME1, TOMM7, BTF3, ISG20 
           BTG1, RPSA, NBEAL1, HSPA8, SRSF5, GIMAP7, UBL5, PARK7, MT2A, SH3BGRL3

PC_ 2
Positive:  FTL, BRK1, CST3, HNRNPA0, PCBP1, RPL23A, PPDPF, PTMA, FTH1, LYZ 
           MIF, LDHA, RPLP1, SLC25A5, RPS24, ANXA6, TMSB4X, HCST, ABRACL, S100A9 
           RNH1, PTPRC, GNG5, CYBA, ACAP1, HCLS1, ISCU, TIMP1, RPL10A, FXYD5 
Negative:  RPSAP58, CCNI, CNBP, GMFG, TSC22D3, LY6E, ANAPC16, YBX1, CD99, ATP5B 
           EEF1B2, DRAP1, ALDOA, SRGN, ISG15, GPSM3, BTF3, TOMM7, PSME1, ISG20 
           RPSA, BTG1, HSPA8, SRSF5, NBEAL1, GIMAP7, PARK7, UBL5, MT2A, SH3BGRL3

PC_ 3
Positive:  CST3, BRK1, FTL, HNRNPA0, PPDPF, RPL23A, PCBP1, PTMA, FTH1, LYZ 
           RPLP1, MIF, LDHA, RPS24, ANXA6, SLC25A5, ABRACL, S100A9, TMSB4X, HCST 
           RNH1, GNG5, PTPRC, ACAP1, ISCU, HCLS1, CYBA, RPL10A, S100A10, FXYD5 
Negative:  RPSAP58, CCNI, CNBP, TSC22D3, GMFG, LY6E, ANAPC16, YBX1, EEF1B2, CD99 
           ATP5B, DRAP1, ISG15, SRGN, ALDOA, GPSM3, PSME1, TOMM7, BTG1, BTF3 
           ISG20, NBEAL1, RPSA, HSPA8, GIMAP7, SRSF5, UBL5, MT2A, PARK7, U2AF1

PC_ 4
Positive:  CST3, FTL, BRK1, HNRNPA0, RPL23A, PPDPF, PCBP1, PTMA, FTH1, LYZ 
           RPLP1, MIF, LDHA, RPS24, SLC25A5, ANXA6, ABRACL, TMSB4X, S100A9, HCST 
           RNH1, GNG5, PTPRC, ACAP1, ISCU, HCLS1, CYBA, RPL10A, FXYD5, TIMP1 
Negative:  RPSAP58, CCNI, CNBP, GMFG, TSC22D3, LY6E, ANAPC16, YBX1, CD99, EEF1B2 
           ATP5B, DRAP1, SRGN, ALDOA, ISG15, GPSM3, PSME1, TOMM7, BTF3, ISG20 
           BTG1, RPSA, NBEAL1, HSPA8, SRSF5, GIMAP7, UBL5, PARK7, MT2A, SH3BGRL3

PC_ 5
Positive:  FTL, BRK1, CST3, HNRNPA0, PCBP1, RPL23A, PPDPF, PTMA, FTH1, LYZ 
           MIF, RPLP1, LDHA, SLC25A5, RPS24, ANXA6, TMSB4X, ABRACL, HCST, S100A9 
           RNH1, PTPRC, GNG5, ACAP1, CYBA, HCLS1, ISCU, RPL10A, TIMP1, FXYD5 
Negative:  RPSAP58, CCNI, CNBP, GMFG, TSC22D3, LY6E, ANAPC16, YBX1, CD99, EEF1B2 
           ATP5B, DRAP1, ALDOA, SRGN, ISG15, GPSM3, TOMM7, PSME1, BTF3, ISG20 
           RPSA, BTG1, HSPA8, SRSF5, NBEAL1, GIMAP7, UBL5, PARK7, MT2A, SH3BGRL3

After this, when I run the UMAP function, I get the error:

Error in x2set(Xsub, n_neighbors, metric, nn_method = nn_sub, n_trees, : Non-finite entries in the input matrix

tried reinstalling R and RStudio, and also tried with different datasets, but still encounter the same issue.

SESSION INFO:
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RColorBrewer_1.1-3 patchwork_1.2.0    dplyr_1.1.4        sctransform_0.4.1  ggplot2_3.5.1      Seurat_5.1.0       SeuratObject_5.0.2 sp_2.1-4          

loaded via a namespace (and not attached):
  [1] jsonlite_1.8.8              magrittr_2.0.3              spatstat.utils_3.0-5        farver_2.1.2                zlibbioc_1.50.0            
  [6] vctrs_0.6.5                 ROCR_1.0-11                 DelayedMatrixStats_1.26.0   memoise_2.0.1               spatstat.explore_3.2-7     
 [11] S4Arrays_1.4.1              htmltools_0.5.8.1           SparseArray_1.4.8           parallelly_1.37.1           KernSmooth_2.23-24         
 [16] htmlwidgets_1.6.4           ica_1.0-3                   plyr_1.8.9                  plotly_4.10.4               zoo_1.8-12                 
 [21] cachem_1.1.0                igraph_2.0.3                mime_0.12                   lifecycle_1.0.4             pkgconfig_2.0.3            
 [26] Matrix_1.6-5                R6_2.5.1                    fastmap_1.2.0               MatrixGenerics_1.16.0       GenomeInfoDbData_1.2.12    
 [31] fitdistrplus_1.1-11         future_1.33.2               shiny_1.8.1.1               digest_0.6.35               colorspace_2.1-0           
 [36] AnnotationDbi_1.66.0        S4Vectors_0.42.0            tensor_1.5                  RSpectra_0.16-1             irlba_2.3.5.1              
 [41] GenomicRanges_1.56.1        RSQLite_2.3.7               labeling_0.4.3              progressr_0.14.0            fansi_1.0.6                
 [46] spatstat.sparse_3.0-3       httr_1.4.7                  polyclip_1.10-6             abind_1.4-5                 compiler_4.4.1             
 [51] bit64_4.0.5                 withr_3.0.0                 DBI_1.2.3                   fastDummies_1.7.3           R.utils_2.12.3             
 [56] MASS_7.3-60                 DelayedArray_0.30.1         tools_4.4.1                 lmtest_0.9-40               httpuv_1.6.15              
 [61] future.apply_1.11.2         goftest_1.2-3               glmGamPoi_1.16.0            R.oo_1.26.0                 glue_1.7.0                 
 [66] nlme_3.1-165                promises_1.3.0              grid_4.4.1                  Rtsne_0.17                  cluster_2.1.6              
 [71] reshape2_1.4.4              generics_0.1.3              gtable_0.3.5                spatstat.data_3.0-4         R.methodsS3_1.8.2          
 [76] tidyr_1.3.1                 data.table_1.15.4           utf8_1.2.4                  XVector_0.44.0              BiocGenerics_0.50.0        
 [81] spatstat.geom_3.2-9         RcppAnnoy_0.0.22            ggrepel_0.9.5               RANN_2.6.1                  pillar_1.9.0               
 [86] stringr_1.5.1               spam_2.10-0                 RcppHNSW_0.6.0              later_1.3.2                 splines_4.4.1              
 [91] lattice_0.22-5              survival_3.7-0              bit_4.0.5                   deldir_2.0-4                tidyselect_1.2.1           
 [96] Biostrings_2.72.1           miniUI_0.1.1.1              pbapply_1.7-2               gridExtra_2.3               IRanges_2.38.0             
[101] SummarizedExperiment_1.34.0 scattermore_1.2             stats4_4.4.1                Biobase_2.64.0              matrixStats_1.3.0          
[106] stringi_1.8.4               UCSC.utils_1.0.0            lazyeval_0.2.2              codetools_0.2-19            tibble_3.2.1               
[111] cli_3.6.2                   uwot_0.2.2                  xtable_1.8-4                reticulate_1.37.0           munsell_0.5.1              
[116] Rcpp_1.0.12                 GenomeInfoDb_1.40.1         globals_0.16.3              spatstat.random_3.2-3       png_0.1-8                  
[121] parallel_4.4.1              blob_1.2.4                  dotCall64_1.1-1             sparseMatrixStats_1.16.0    listenv_0.9.1              
[126] viridisLite_0.4.2           scales_1.3.0                ggridges_0.5.6              leiden_0.4.3.1              purrr_1.0.2                
[131] crayon_1.5.2                rlang_1.1.4                 cowplot_1.1.3               KEGGREST_1.44.0

Could the authors please clarify this as soon as possible?

milani31 commented 2 months ago

Hi, turns out that the system libraries blas und lapack were causing problems for some R packages on my linux computer, including Seurat and DESeq2. In case some of you are facing the same problem as described above, get in touch with the IT to look into it and that will hopefully save you a lot of time and energy.