Open gherrgo opened 2 months ago
I should add, I attempted the example workflow for merging data and I think I may have isolated the issue. When running RunSVD(combined) I receive the error: Warning in irlba(A = t(x = object), nv = n, work = irlba.work, tol = tol) : did not converge--results might be invalid!; try increasing work or maxit Scaling cell embeddings
And then receive all NA's for cell embeddings across all of the example cells:
combined@reductions$lsi@cell.embeddings LSI_1 LSI_2 LSI_3 LSI_4 LSI_5 LSI_6 LSI_7 LSI_8 LSI_9 LSI_10 LSI_11 LSI_12 LSI_13 500_AAACTGCAGACTCGGA-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 500_AAAGATGCACCTATTT-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 500_AAAGATGCAGATACAA-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Could this be an error forced by irlba ? How should I circumvent? Thanks.
Hi guys, looking for some help on an issue I cannot seem to solve. I have been following the integration pipeline (https://stuartlab.org/signac/articles/merging) for a set of 19 samples, and am having some issues when it comes to processing data following a merge. My individual sample pipeline looks something like this:
Reading in peaks
H9A <- Read10X_h5("09A_counts/outs/filtered_peak_bc_matrix.h5", use.names = T) peaks_H9A <- read.table( file = "09A_counts/outs/peaks.bed", col.names = c("chr", "start", "end")) gr.H9A <- makeGRangesFromDataFrame(peaks_H9A)
(defining combined_peaks using all objects)
Loading metadata
md.H9A <- read.table( file = "09A_counts/outs/singlecell.csv", stringsAsFactors = FALSE, sep = ",", header = TRUE, row.names = 1 )[-1, ] # remove the first row
Initial filtering of low-count cells
plot(md.H9A$passed_filters) md.H9A <- md.H9A[md.H9A$passed_filters > 100, ]
create fragment objects
frags.H9A <- CreateFragmentObject( path = "09A_counts/outs/fragments.tsv.gz", cells = rownames(md.H9A) )
Quantifying peaks
H9A.counts <- FeatureMatrix( fragments = frags.H9A, features = combined.peaks, cells = rownames(md.H9A) )
Creating objects
H9A_assay <- CreateChromatinAssay(H9A.counts, fragments = frags.H9A) s.H9A <- CreateSeuratObject(H9A_assay, assay = "ATAC", meta.data=md.H9A, project="H9A") ncol(s.H9A) #2119 Annotation(s.H9A) <- annotation
Quality control - per sample
grange <- StringToGRanges(rownames(s.H9A), sep = c(":", "-")) grange.use <- seqnames(grange) %in% standardChromosomes(grange) s.H9A <- s.H9A[as.vector(grange.use), ] s.H9A <- NucleosomeSignal(object=s.H9A) #2119 cell barcodes s.H9A <- TSSEnrichment(object=s.H9A, fast=FALSE)
s.H9A$pct_reads_in_peaks <- s.H9A@meta.data$peak_region_fragments / s.H9A@meta.data$passed_filters * 100 s.H9A$blacklist_ratio <- s.H9A@meta.data$blacklist_region_fragments / s.H9A@meta.data$peak_region_fragments s.H9A$high.tss <- ifelse(s.H9A$TSS.enrichment > 1, 'High', 'Low') s.H9A$nucleosome_group <- ifelse(s.H9A$nucleosome_signal > 2, 'NS > 2', 'NS < 2') s.H9A <- subset( x = s.H9A, subset = peak_region_fragments < 250 & pct_reads_in_peaks > 0.1 & pct_reads_in_peaks < 2 & nucleosome_signal < 1.5 & nucleosome_signal > 0.1 & TSS.enrichment < 5 ) ncol(s.H9A) #1638 cells
Then merging all samples into an object called total_ATAC and running: total_ATAC <- RunTFIDF(total_ATAC) total_ATAC <- FindTopFeatures(total_ATAC, min.cutoff = 20) total_ATAC[['ATAC']] total_ATAC <- RunSVD(total_ATAC, n = 50) total_ATAC <- RunUMAP(total_ATAC, dims = 2:50, reduction = 'lsi')
However, when I run 'RunUMAP' I receive the error: 14:33:10 UMAP embedding parameters a = 0.9922 b = 1.112 Error in checkna(X) : Missing values found in 'X'
Which is due to the cell embeddings from LSI being all NA valued:
This issue has only persisted across merged data, as I can run the SVD and UMAP on individual samples with no problem. I cannot seem to figure out how to troubleshoot this, tried many different things, all to no solution. If you have any ideas, it would be much appreciated. Thanks a bunch.
Here's my sessioninfo(): R version 4.4.1 (2024-06-14) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 22.04.5 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
Random number generation: RNG: L'Ecuyer-CMRG Normal: Inversion Sample: Rejection
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York tzcode source: system (glibc)
attached base packages: [1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages: [1] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.28.1 AnnotationFilter_1.28.0
[4] GenomicFeatures_1.56.0 SingleCellExperiment_1.26.0 rhdf5_2.48.0
[7] Rcpp_1.0.13 data.table_1.16.0 plyr_1.8.9
[10] magrittr_2.0.3 gtable_0.3.5 gtools_3.9.5
[13] gridExtra_2.3 ArchR_1.0.2 Signac_1.14.0
[16] SCP_0.5.6 stringr_1.5.1 reprex_2.1.1
[19] cowplot_1.1.3 SCINA_1.2.0 gplots_3.1.3.1
[22] MASS_7.3-61 infercnv_1.20.0 patchwork_1.3.0
[25] Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4
[28] Matrix_1.6-5 gprofiler2_0.2.3 scales_1.3.0
[31] NMF_0.28 cluster_2.1.6 rngtools_1.5.2
[34] registry_0.5-1 DOSE_3.28.2 org.Hs.eg.db_3.18.0
[37] AnnotationDbi_1.66.0 clusterProfiler_4.10.1 readxl_1.4.3
[40] rtracklayer_1.64.0 ggpubr_0.6.0 reshape2_1.4.4
[43] tidyr_1.3.1 dplyr_1.1.4 DESeq2_1.44.0
[46] SummarizedExperiment_1.34.0 Biobase_2.64.0 MatrixGenerics_1.16.0
[49] matrixStats_1.4.1 GenomicRanges_1.56.1 GenomeInfoDb_1.40.1
[52] IRanges_2.38.1 S4Vectors_0.42.1 BiocGenerics_0.50.0
[55] tibble_3.2.1 DescTools_0.99.56 caret_6.0-94
[58] lattice_0.22-6 ggrepel_0.9.6 ggplot2_3.5.1
[61] pROC_1.18.5 doMC_1.3.8 iterators_1.0.14
[64] foreach_1.5.2
loaded via a namespace (and not attached): [1] igraph_2.0.3 ica_1.0-3 plotly_4.10.4 zlibbioc_1.50.0
[5] tidyselect_1.2.1 bit_4.0.5 doParallel_1.0.17 clue_0.3-65
[9] rjson_0.2.23 blob_1.2.4 S4Arrays_1.4.1 png_0.1-8
[13] cli_3.6.3 ggplotify_0.1.2 ProtGenerics_1.36.0 goftest_1.2-3
[17] BiocIO_1.14.0 SCpubr_2.0.2 purrr_1.0.2 uwot_0.2.2
[21] shadowtext_0.1.4 curl_5.2.2 mime_0.12 tidytree_0.4.6
[25] leiden_0.4.3.1 coin_1.4-3 ComplexHeatmap_2.18.0 stringi_1.8.4
[29] backports_1.5.0 rjags_4-16 slingshot_2.10.0 parallelDist_0.2.6
[33] XML_3.99-0.17 Exact_3.3 lubridate_1.9.3 httpuv_1.6.15
[37] rappdirs_0.3.3 splines_4.4.1 RcppRoll_0.3.1 prodlim_2024.06.25
[41] ggraph_2.2.1 sctransform_0.4.1 rootSolve_1.8.2.4 DBI_1.2.3
[45] HDF5Array_1.32.1 withr_3.0.1 class_7.3-22 enrichplot_1.22.0
[49] lmtest_0.9-40 ggnewscale_0.5.0 tidygraph_1.3.1 formatR_1.14
[53] BiocManager_1.30.25 htmlwidgets_1.6.4 fs_1.6.4 biomaRt_2.58.2
[57] princurve_2.1.6 labeling_0.4.3 SparseArray_1.4.8 cellranger_1.1.0
[61] lmom_3.0 reticulate_1.39.0 zoo_1.8-12 XVector_0.44.0
[65] UCSC.utils_1.0.0 timechange_0.3.0 fansi_1.0.6 caTools_1.18.3
[69] timeDate_4032.109 ggtree_3.10.1 R.oo_1.26.0 RSpectra_0.16-2
[73] irlba_2.3.5.1 fastDummies_1.7.4 gridGraphics_0.5-1 lazyeval_0.2.2
[77] yaml_2.3.10 phyclust_0.1-34 survival_3.7-0 scattermore_1.2
[81] crayon_1.5.3 RcppAnnoy_0.0.22 RColorBrewer_1.1-3 progressr_0.14.0
[85] tweenr_2.0.3 later_1.3.2 ggridges_0.5.6 codetools_0.2-20
[89] GlobalOptions_0.1.2 KEGGREST_1.44.1 Rtsne_0.17 shape_1.4.6.1
[93] limma_3.60.4 Rsamtools_2.20.0 filelock_1.0.3 pkgconfig_2.0.3
[97] xml2_1.3.6 spatstat.univar_3.0-1 GenomicAlignments_1.40.0 aplot_0.2.3
[101] spatstat.sparse_3.1-0 ape_5.8 viridisLite_0.4.2 gridBase_0.4-7
[105] xtable_1.8-4 car_3.1-2 fastcluster_1.2.6 httr_1.4.7
[109] tools_4.4.1 globals_0.16.3 hardhat_1.4.0 broom_1.0.6
[113] nlme_3.1-166 futile.logger_1.4.3 lambda.r_1.2.4 HDO.db_0.99.1
[117] dbplyr_2.5.0 assertthat_0.2.1 digest_0.6.37 farver_2.1.2
[121] TrajectoryUtils_1.10.1 ModelMetrics_1.2.2.2 yulab.utils_0.1.7 viridis_0.6.5
[125] rpart_4.1.23 glue_1.7.0 cachem_1.1.0 BiocFileCache_2.10.2
[129] polyclip_1.10-7 proxyC_0.4.1 generics_0.1.3 Biostrings_2.72.1
[133] mvtnorm_1.3-1 parallelly_1.38.0 statmod_1.5.0 R.cache_0.16.0
[137] RcppHNSW_0.6.0 carData_3.0-5 pbapply_1.7-2 spam_2.10-0
[141] gson_0.1.0 utf8_1.2.4 gower_1.0.1 SeuratWrappers_0.3.0
[145] graphlayouts_1.1.1 ggsignif_0.6.4 shiny_1.9.1 lava_1.8.0
[149] GenomeInfoDbData_1.2.12 R.utils_2.12.3 rhdf5filters_1.16.0 RCurl_1.98-1.16
[153] memoise_2.0.1 R.methodsS3_1.8.2 gld_2.6.6 future_1.34.0
[157] RANN_2.6.2 Cairo_1.6-2 spatstat.data_3.1-2 rstudioapi_0.16.0
[161] spatstat.utils_3.1-0 hms_1.1.3 fitdistrplus_1.2-1 munsell_0.5.1
[165] colorspace_2.1-1 rlang_1.1.4 ipred_0.9-15 dotCall64_1.1-1
[169] ggforce_0.4.2 circlize_0.4.16 coda_0.19-4.1 e1071_1.7-16
[173] TH.data_1.1-2 remotes_2.5.0 recipes_1.1.0 modeltools_0.2-23
[177] abind_1.4-8 GOSemSim_2.28.1 libcoin_1.0-10 treeio_1.26.0
[181] Rhdf5lib_1.26.0 futile.options_1.0.1 bitops_1.0-8 promises_1.3.0
[185] scatterpie_0.2.4 RSQLite_2.3.7 qvalue_2.34.0 sandwich_3.1-1
[189] fgsea_1.28.0 DelayedArray_0.30.1 proxy_0.4-27 GO.db_3.18.0
[193] compiler_4.4.1 forcats_1.0.0 prettyunits_1.2.0 boot_1.3-31
[197] listenv_0.9.1 edgeR_4.2.1 tensor_1.5 progress_1.2.3
[201] BiocParallel_1.38.0 spatstat.random_3.3-2 R6_2.5.1 fastmap_1.2.0
[205] multcomp_1.4-26 fastmatch_1.1-4 rstatix_0.7.2 ROCR_1.0-11
[209] rsvd_1.0.5 nnet_7.3-19 KernSmooth_2.23-24 miniUI_0.1.1.1
[213] deldir_2.0-4 htmltools_0.5.8.1 RcppParallel_5.1.9 bit64_4.0.5
[217] spatstat.explore_3.3-2 lifecycle_1.0.4 restfulr_0.0.15 vctrs_0.6.5
[221] spatstat.geom_3.3-3 ggfun_0.1.6 future.apply_1.11.2 pillar_1.9.0
[225] locfit_1.5-9.10 jsonlite_1.8.8 expm_1.0-0 argparse_2.2.3
[229] GetoptLong_1.0.5