zhanghao-njmu / SCP

An end-to-end Single-Cell Pipeline designed to facilitate comprehensive analysis and exploration of single-cell data.
https://zhanghao-njmu.github.io/SCP/
GNU General Public License v3.0

BiocParallel errors while running RunDynamicFeatures #119

Closed bio-visualisation closed 1 year ago

bio-visualisation commented 1 year ago

I am getting BiocParallel errors while running RunDynamicFeatures on an Ubuntu HPC server. Do you have any idea why this is happening? Thanks.

cortex <- RunDynamicFeatures(srt = cortex, lineages = c("Lineage3", "Lineage7"), n_candidates = 200)

Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**|
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**|
Number of candidate features(union): 228
Calculate dynamic features for Lineage3...
|====================================================================================================================| 100%
Calculate dynamic features for Lineage7...
|========================================================================================================= | 91%
Stop worker failed with the error: reached CPU time limit

Error: BiocParallel errors
  4 remote errors, element index: 66, 77, 82, 88
  26 unevaluated and other errors
  first remote error: Error in eigen(crossprod(Rm %*% B)/b$sig2, symmetric = TRUE, only.values = TRUE): infinite or missing values in 'x'
Timing stopped at: 455.2 2414 47.66

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS/LAPACK: /home/basu/miniconda3/envs/env_R/lib/libopenblasp-r0.3.21.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] SCP_0.4.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.4
[8] tidyr_1.3.0 tibble_3.2.0 ggplot2_3.4.1 tidyverse_2.0.0 SeuratObject_4.1.3 Seurat_4.3.0

loaded via a namespace (and not attached):
[1] utf8_1.2.3 spatstat.explore_3.1-0 reticulate_1.28 R.utils_2.12.2
[5] tidyselect_1.2.0 RSQLite_2.3.0 AnnotationDbi_1.60.1 htmlwidgets_1.6.1
[9] grid_4.2.0 BiocParallel_1.32.5 Rtsne_0.16 scatterpie_0.1.8
[13] munsell_0.5.0 codetools_0.2-18 ica_1.0-3 future_1.32.0
[17] miniUI_0.1.1.1 withr_2.5.0 spatstat.random_3.1-4 colorspace_2.1-0
[21] GOSemSim_2.24.0 progressr_0.13.0 Biobase_2.58.0 filelock_1.0.2
[25] rstudioapi_0.14 SingleCellExperiment_1.20.0 stats4_4.2.0 ROCR_1.0-11
[29] tensor_1.5 DOSE_3.24.2 listenv_0.9.0 MatrixGenerics_1.10.0
[33] GenomeInfoDbData_1.2.9 polyclip_1.10-4 farver_2.1.1 bit64_4.0.5
[37] rprojroot_2.0.3 downloader_0.4 treeio_1.23.1 parallelly_1.34.0
[41] vctrs_0.5.2 generics_0.1.3 gson_0.1.0 timechange_0.2.0
[45] BiocFileCache_2.6.1 R6_2.5.1 doParallel_1.0.17 GenomeInfoDb_1.34.9
[49] graphlayouts_0.8.4 clue_0.3-64 DelayedArray_0.24.0 gridGraphics_0.5-1
[53] fgsea_1.24.0 bitops_1.0-7 spatstat.utils_3.0-2 cachem_1.0.7
[57] promises_1.2.0.1 scales_1.2.1 ggraph_2.1.0 enrichplot_1.18.3
[61] gtable_0.3.1 globals_0.16.2 goftest_1.2-3 tidygraph_1.2.3
[65] rlang_1.0.6 RcppRoll_0.3.0 GlobalOptions_0.1.2 splines_4.2.0
[69] lazyeval_0.2.2 princurve_2.1.6 spatstat.geom_3.1-0 reshape2_1.4.4
[73] abind_1.4-5 httpuv_1.6.9 qvalue_2.30.0 clusterProfiler_4.6.2
[77] tools_4.2.0 ggplotify_0.1.0 ellipsis_0.3.2 RColorBrewer_1.1-3
[81] BiocGenerics_0.44.0 ggridges_0.5.4 Rcpp_1.0.10 plyr_1.8.8
[85] progress_1.2.2 zlibbioc_1.44.0 RCurl_1.98-1.10 TrajectoryUtils_1.6.0
[89] prettyunits_1.1.1 deldir_1.0-6 viridis_0.6.2 pbapply_1.7-0
[93] GetoptLong_1.0.5 cowplot_1.1.1 S4Vectors_0.36.2 zoo_1.8-11
[97] SummarizedExperiment_1.28.0 ggrepel_0.9.3 cluster_2.1.3 here_1.0.1
[101] magrittr_2.0.3 data.table_1.14.8 scattermore_0.8 circlize_0.4.15
[105] lmtest_0.9-40 RANN_2.6.1 parallelDist_0.2.6 ggnewscale_0.4.8
[109] fitdistrplus_1.1-8 Signac_1.9.0 R.cache_0.16.0 matrixStats_0.63.0
[113] hms_1.1.2 patchwork_1.1.2 mime_0.12 xtable_1.8-4
[117] HDO.db_0.99.1 XML_3.99-0.13 IRanges_2.32.0 gridExtra_2.3
[121] shape_1.4.6 compiler_4.2.0 biomaRt_2.54.0 shadowtext_0.1.2
[125] KernSmooth_2.23-20 crayon_1.5.2 R.oo_1.25.0 htmltools_0.5.4
[129] mgcv_1.8-42 ggfun_0.0.9 later_1.3.0 tzdb_0.3.0
[133] aplot_0.1.10 RcppParallel_5.1.7 DBI_1.1.3 tweenr_2.0.2
[137] slingshot_2.6.0 dbplyr_2.3.1 ComplexHeatmap_2.15.1 MASS_7.3-57
[141] rappdirs_0.3.3 Matrix_1.5-3 cli_3.6.0 R.methodsS3_1.8.2
[145] parallel_4.2.0 igraph_1.4.1 GenomicRanges_1.50.2 pkgconfig_2.0.3
[149] sp_1.6-0 plotly_4.10.1 spatstat.sparse_3.0-1 xml2_1.3.3
[153] foreach_1.5.2 ggtree_3.7.1.003 XVector_0.38.0 yulab.utils_0.0.6
[157] digest_0.6.31 sctransform_0.3.5 RcppAnnoy_0.0.20 spatstat.data_3.0-1
[161] Biostrings_2.66.0 leiden_0.4.3 fastmatch_1.1-3 tidytree_0.4.2
[165] uwot_0.1.14 curl_5.0.0 shiny_1.7.4 Rsamtools_2.14.0
[169] rjson_0.2.21 lifecycle_1.0.3 nlme_3.1-157 jsonlite_1.8.4
[173] viridisLite_0.4.1 fansi_1.0.4 pillar_1.8.1 lattice_0.20-45
[177] KEGGREST_1.38.0 fastmap_1.1.1 httr_1.4.5 survival_3.3-1
[181] GO.db_3.16.0 glue_1.6.2 png_0.1-8 iterators_1.0.14
[185] bit_4.0.5 ggforce_0.4.1 stringi_1.7.12 blob_1.2.3
[189] memoise_2.0.1 ape_5.7-1 irlba_2.3.5.1 future.apply_1.10.0

zhanghao-njmu commented 1 year ago

I'm not sure if the problem is caused by "CPU time limit". If it is, you can try specifying BPPARAM = BiocParallel::MulticoreParam(workers = 10) to adjust the number of workers. Hope this helps!
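As a rough sketch of that suggestion, the call from the original post can be wrapped so the backend is passed explicitly. The helper name `run_dynamic` is hypothetical and not part of SCP; the `SerialParam()` branch is an extra assumption, added only to show how parallelism could be taken out of the picture entirely.

```r
# Hypothetical helper (not part of SCP): rerun the call from the original post
# with an explicit BiocParallel backend.
run_dynamic <- function(srt, workers = 2) {
  # workers = 1 falls back to SerialParam(); this is an assumption about what
  # is worth testing, not something suggested in the thread.
  param <- if (workers > 1) {
    BiocParallel::MulticoreParam(workers = workers)
  } else {
    BiocParallel::SerialParam()
  }
  SCP::RunDynamicFeatures(
    srt = srt,
    lineages = c("Lineage3", "Lineage7"),
    n_candidates = 200,
    BPPARAM = param
  )
}

# Not run here: needs the `cortex` object from the original post.
# cortex <- run_dynamic(cortex, workers = 4)
```

Defining the helper does not load SCP or BiocParallel; both are only needed once it is called.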

bio-visualisation commented 1 year ago

The same error pops up: "failed with the error: reached CPU time limit".

zhanghao-njmu commented 1 year ago

Lineage3 completes without problems; the failure occurs at about 91% of the Lineage7 calculation. If too many threads are not the cause, the original counts may contain Inf or missing values, or there may be some other reason. I suggest calculating the dynamic features of Lineage3 and Lineage7 separately to see whether only Lineage7 is affected. You can also specify n_candidates=100 or pass feature names explicitly. If 100 candidate features or a manually specified feature set works, the problem most likely lies in the counts of certain features.
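One way to run the lineages separately is to catch errors per lineage, so a failure in one does not hide the outcome of the other. The sketch below uses a toy function in place of the real call; `run_each` and `toy_fit` are hypothetical names introduced for illustration only.

```r
# Run `fun` once per element of `items`, capturing errors instead of stopping.
run_each <- function(items, fun, ...) {
  results <- lapply(items, function(item) {
    tryCatch(
      list(ok = TRUE, value = fun(item, ...)),
      error = function(e) list(ok = FALSE, value = conditionMessage(e))
    )
  })
  names(results) <- items
  results
}

# Toy stand-in for the real call (hypothetical): fails only for "Lineage7".
toy_fit <- function(lineage) {
  if (lineage == "Lineage7") stop("infinite or missing values in 'x'")
  paste("fitted", lineage)
}

res <- run_each(c("Lineage3", "Lineage7"), toy_fit)
# Named logical vector: which lineages finished without an error
vapply(res, function(r) r$ok, logical(1))

# With the real data, the call from this thread would look roughly like:
# res <- run_each(c("Lineage3", "Lineage7"), function(l) {
#   RunDynamicFeatures(srt = cortex, lineages = l, n_candidates = 100)
# })
```

With the real function, a failed BiocParallel run may leave the session in a bad state, so it is probably safer to restart R between attempts rather than rely on the loop alone.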

Most importantly, if BiocParallel reports "unevaluated" errors, you must restart the R session, either with .rs.restartR() (in RStudio) or manually; otherwise, the error will persist regardless of whether there is a problem with the code.

bio-visualisation commented 1 year ago

I did not find any problem when I ran n_candidates=50. How do I get rid of those missing values?

zhanghao-njmu commented 1 year ago

You can use the following code to count and find infinite or missing values in the counts/data slot:

Matrix::rowSums(is.infinite(srt@assays$RNA@counts))
Matrix::rowSums(is.na(srt@assays$RNA@counts))

If infinite/missing values are found, you can temporarily remove these features using the subset() function and create a new Seurat object for calculation purposes.
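A minimal sketch of that check-and-remove step, run on a toy sparse matrix that stands in for srt@assays$RNA@counts (the matrix values and the names `counts`, `bad`, and `clean` are made up for illustration):

```r
library(Matrix)

# Toy counts matrix (hypothetical data) with one Inf and one NA entry
counts <- Matrix(
  c(1, 0,   3,
    0, Inf, 2,
    4, NA,  0,
    0, 5,   6),
  nrow = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Per-feature number of infinite or missing entries
n_bad <- Matrix::rowSums(is.infinite(counts)) + Matrix::rowSums(is.na(counts))

# Names of the affected features
bad <- rownames(counts)[n_bad > 0]
bad

# Drop the affected features from the toy matrix
clean <- counts[!rownames(counts) %in% bad, , drop = FALSE]
dim(clean)

# On a Seurat object, the equivalent removal would look roughly like:
# srt_clean <- subset(srt, features = setdiff(rownames(srt), bad))
```

The commented last line assumes `bad` was computed from the same assay that `rownames(srt)` refers to.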

bio-visualisation commented 1 year ago

Thanks.