stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
325 stars 87 forks source link

The TSSenrichment error #786

Closed honghh2018 closed 3 years ago

honghh2018 commented 3 years ago

Hi @timoast , I was running the ScRNA and the ATAC 10X multiomic data in below step showing me below error. The preprocess step was: annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79) genome(annotation) <- "mm9" seqlevelsStyle(annotation) <- "UCSC"

obj <- TSSEnrichment(obj) Extracting TSS positions Extracting fragments at TSSs | | 0 % ~calculating Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions The version:

packageVersion('Signac') [1] '1.3.0 sessionInfo() R version 4.0.2 (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS: /share/nas1/Data/software/R/R-4.0.2/lib64/R/lib/libRblas.so LAPACK: /share/nas1/Data/software/R/R-4.0.2/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
[3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
[5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
[7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] data.table_1.13.6 plyr_1.8.6
[3] dplyr_1.0.4 EnsDb.Mmusculus.v79_2.99.0
[5] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome.Hsapiens.UCSC.hg19_1.4.3 [7] BSgenome.Hsapiens.UCSC.hg38_1.4.3 BSgenome_1.58.0
[9] rtracklayer_1.50.0 Biostrings_2.58.0
[11] XVector_0.30.0 cowplot_1.1.1
[13] ggplot2_3.3.3 EnsDb.Hsapiens.v86_2.99.0
[15] ensembldb_2.14.0 AnnotationFilter_1.14.0
[17] GenomicFeatures_1.42.1 AnnotationDbi_1.54.0
[19] Biobase_2.50.0 GenomicRanges_1.42.0
[21] GenomeInfoDb_1.26.2 IRanges_2.24.1
[23] S4Vectors_0.28.1 BiocGenerics_0.36.0
[25] Signac_1.3.0 SeuratObject_4.0.0
[27] Seurat_4.0.2

loaded via a namespace (and not attached): [1] reticulate_1.18 tidyselect_1.1.0
[3] RSQLite_2.2.3 htmlwidgets_1.5.3
[5] grid_4.0.2 docopt_0.7.1
[7] BiocParallel_1.24.1 Rtsne_0.15
[9] munsell_0.5.0 codetools_0.2-16
[11] ica_1.0-2 future_1.21.0
[13] miniUI_0.1.1.1 withr_2.4.1
[15] colorspace_2.0-0 knitr_1.31
[17] rstudioapi_0.13 ROCR_1.0-11
[19] tensor_1.5 listenv_0.8.0
[21] MatrixGenerics_1.2.1 slam_0.1-48
[23] GenomeInfoDbData_1.2.4 polyclip_1.10-0
[25] bit64_4.0.5 farver_2.0.3
[27] parallelly_1.23.0 vctrs_0.3.6
[29] generics_0.1.0 xfun_0.21
[31] biovizBase_1.38.0 BiocFileCache_1.14.0
[33] lsa_0.73.2 ggseqlogo_0.1
[35] R6_2.5.0 bitops_1.0-6
[37] spatstat.utils_2.1-0 cachem_1.0.3
[39] DelayedArray_0.16.1 assertthat_0.2.1
[41] promises_1.2.0.1 scales_1.1.1
[43] nnet_7.3-14 gtable_0.3.0
[45] globals_0.14.0 goftest_1.2-2
[47] rlang_0.4.10 RcppRoll_0.3.0
[49] splines_4.0.2 lazyeval_0.2.2
[51] dichromat_2.0-0 checkmate_2.0.0
[53] spatstat.geom_2.1-0 reshape2_1.4.4
[55] abind_1.4-5 backports_1.2.1
[57] httpuv_1.5.5 Hmisc_4.4-2
[59] tools_4.0.2 ellipsis_0.3.1
[61] spatstat.core_2.1-2 RColorBrewer_1.1-2
[63] ggridges_0.5.3 Rcpp_1.0.6
[65] base64enc_0.1-3 progress_1.2.2
[67] zlibbioc_1.36.0 purrr_0.3.4
[69] RCurl_1.98-1.2 prettyunits_1.1.1
[71] rpart_4.1-15 openssl_1.4.3
[73] deldir_0.2-9 pbapply_1.4-3
[75] zoo_1.8-8 SummarizedExperiment_1.20.0 [77] ggrepel_0.9.1 cluster_2.1.0
[79] magrittr_2.0.1 scattermore_0.7
[81] lmtest_0.9-38 RANN_2.6.1
[83] SnowballC_0.7.0 ProtGenerics_1.22.0
[85] fitdistrplus_1.1-3 matrixStats_0.58.0
[87] hms_1.0.0 patchwork_1.1.1
[89] mime_0.9 xtable_1.8-4
[91] XML_3.99-0.5 jpeg_0.1-8.1
[93] sparsesvd_0.2 gridExtra_2.3
[95] compiler_4.0.2 biomaRt_2.46.3
[97] tibble_3.0.6 KernSmooth_2.23-17
[99] crayon_1.4.1 htmltools_0.5.1.1
[101] mgcv_1.8-31 later_1.1.0.1
[103] Formula_1.2-4 tidyr_1.1.2
[105] DBI_1.1.1 tweenr_1.0.1
[107] dbplyr_2.1.0 MASS_7.3-51.6
[109] rappdirs_0.3.3 Matrix_1.2-18
[111] igraph_1.2.6 pkgconfig_2.0.3
[113] GenomicAlignments_1.26.0 foreign_0.8-80
[115] plotly_4.9.3 spatstat.sparse_2.0-0
[117] xml2_1.3.2 VariantAnnotation_1.36.0
[119] stringr_1.4.0 digest_0.6.27
[121] sctransform_0.3.2 RcppAnnoy_0.0.18
[123] spatstat.data_2.1-0 leiden_0.3.7
[125] fastmatch_1.1-0 htmlTable_2.1.0
[127] uwot_0.1.10 curl_4.3
[129] shiny_1.6.0 Rsamtools_2.6.0
[131] lifecycle_0.2.0 nlme_3.1-148
[133] jsonlite_1.7.2 viridisLite_0.3.0
[135] askpass_1.1 pillar_1.4.7
[137] lattice_0.20-41 KEGGREST_1.30.1
[139] fastmap_1.1.0 httr_1.4.2
[141] survival_3.1-12 glue_1.4.2
[143] qlcMatrix_0.9.7 png_0.1-7
[145] bit_4.0.4 ggforce_0.3.2
[147] stringi_1.5.3 blob_1.2.1
[149] latticeExtra_0.6-29 memoise_2.0.0
[151] irlba_2.3.3 future.apply_1.7.0

So how can i fix this issue? Any advice would be appreciated. Best, Hanhuihong

timoast commented 3 years ago

HI @honghh2018, can you show the full code?

honghh2018 commented 3 years ago

Sorry for the less information, the full code display below: library(Seurat) library(Signac) library(ggplot2) library(cowplot) library(BSgenome.Hsapiens.UCSC.hg38) library(BSgenome.Hsapiens.UCSC.hg19) library(BSgenome.Mmusculus.UCSC.mm10) library(EnsDb.Hsapiens.v86) library(EnsDb.Mmusculus.v79)

options(future.globals.maxSize = 10000 * 1024^2) options(bitmapType='cairo') species = 'Mouse'

wd = '/Standard_sc/20210816_f1/Analysis/f1/outs/'

load both modalities

count.data<-Read10X(paste0(wd, '/filtered_feature_bc_matrix')) fragpath <- paste0(wd,"/atac_fragments.tsv.gz") meta_data <- fread(paste0(wd,"/per_barcode_metrics.csv"),header = T) annotation = NULL

if(species == "Mouse"){ annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79) genome(annotation) <- "mm9" #'mm10' that is not work same as 'mm9' seqlevelsStyle(annotation) <- "UCSC" }

obj<-CreateSeuratObject( counts = count.data$Gene Expression, project = 'f1', assay = "RNA" )

obj[["ATAC"]] <- CreateChromatinAssay( counts = count.data$Peaks, sep = c(":", "-"), fragments = fragpath, annotation = annotation )

DefaultAssay(obj) <- "ATAC" obj <- NucleosomeSignal(obj) Found 10119 cell barcodes Done Processing 1 million lines Done Processing 50 million lines> obj <- TSSEnrichment(obj) Extracting TSS positions Extracting fragments at TSSs | | 0 % ~calculating Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions

Hope hearing from you sooner. Best

timoast commented 3 years ago

What genome is your data mapped to?

honghh2018 commented 3 years ago

The 10X multiomic of ScRNA + ATAC's genome named 'refdata-cellranger-arc-mm10-2020-A', the download site showing below: curl -O https://cf.10xgenomics.com/supp/cell-arc/refdata-cellranger-arc-mm10-2020-A-2.0.0.tar.gz Performing the mapping by cellranger-arc-2.0.0 software toolkit.

a little snapshot of gene.gtf file displaying below: image

Best,

timoast commented 3 years ago

Can you try setting genome(annotation) <- "mm10" rather than mm9, since the data is mapped to mm10?

honghh2018 commented 3 years ago

The mm10 still trigger this error, the full code lying below: The loading package same as above. species = "Mouse"

count.data<-Read10X(paste0(wd, '/filtered_feature_bc_matrix')) 10X data contains more than one type and is being returned as a list containing matrices of each type. fragpath <- paste0(wd,"/atac_fragments.tsv.gz") meta_data <- fread(paste0(wd,"/per_barcode_metrics.csv"),header = T)

if(species == "Mouse"){

  • annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
  • genome(annotation) <- "mm10"
  • seqlevelsStyle(annotation) <- "UCSC"
  • } obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-|^mt-|[-\s+]MT-|[-\s+]mt-|Mt-") obj[["ATAC"]] <- CreateChromatinAssay(
  • counts = count.data$Peaks,
  • sep = c(":", "-"),
  • fragments = fragpath,
  • annotation = annotation
  • ) Computing hash Checking for 10119 cell barcodes DefaultAssay(obj) <- "ATAC" obj An object of class Seurat 154959 features across 10119 samples within 2 assays Active assay: ATAC (122674 features, 0 variable features) 1 other assay present: RNA pbmc <- NucleosomeSignal(obj) Found 10119 cell barcodes Done Processing 1 million lines Done Processing 50 million lines> pbmc <- TSSEnrichment(pbmc) Extracting TSS positions Extracting fragments at TSSs | | 0 % ~calculating Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions

Best

honghh2018 commented 3 years ago

Had any idea for this issue? @timoast

timoast commented 3 years ago

Can you show the output of head(Fragments(obj)[[1]])?

honghh2018 commented 3 years ago

The output like this: image

timoast commented 3 years ago

Can you show the output of traceback() run after seeing the error?

honghh2018 commented 3 years ago

Hi @timoast , This is traceback() outcome after showing up the error

obj<- NucleosomeSignal(obj) Found 10119 cell barcodes Done Processing 1 million lines Done Processing 50 million lines obj<- TSSEnrichment(obj) Extracting TSS positions Extracting fragments at TSSs | | 0 % ~calculating Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions

traceback() 10: stop("'x' must be an array of at least two dimensions") 9: base::rowSums(x, na.rm = na.rm, dims = dims, ...) 8: rowSums(x = cuts.center) 7: rowSums(x = cuts.center) 6: extract_tss_counts(cellnames = colnames(x = object), region.centers = centers[[x]], upstream = upstream.flank[[x]], downstream = downstream.flank[[x]], tabix.file = tbx, cell.name.map = cellmap) 5: FUN(X[[i]], ...) 4: lapply(X[Split[[i]]], FUN, ...) 3: mylapply(X = seq_along(along.with = centers), FUN = function(x) { extract_tss_counts(cellnames = colnames(x = object), region.centers = centers[[x]], upstream = upstream.flank[[x]], downstream = downstream.flank[[x]], tabix.file = tbx, cell.name.map = cellmap) }) 2: TSSFast(object = object, assay = assay, tss.positions = tss.positions, process_n = process_n, verbose = verbose) 1: TSSEnrichment(obj)

Best

timoast commented 3 years ago

I made an update which I think should fix this, can you install the latest version from the develop branch (1.3.0.9007, see instruction here: https://satijalab.org/signac/articles/install.html#development-version-1) and see if you still have the same issue?

honghh2018 commented 3 years ago

Hi @timoast , The previous error had solved. However, there were another new issue occurred addtionally, like below posted:

DefaultAssay(obj) <- "ATAC" set.seed(1234) obj <- NucleosomeSignal(obj) Found 10119 cell barcodes Done Processing 1 million lines Done Processing 50 million lines obj <- TSSEnrichment(obj) Extracting TSS positions Extracting fragments at TSSs |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s

Computing TSS enrichment score Error in ecdf(x = object$TSS.enrichment) : 'x' must have 1 or more non-missing values

traceback() 4: stop("'x' must have 1 or more non-missing values") 3: ecdf(x = object$TSS.enrichment) 2: TSSFast(object = object, assay = assay, tss.positions = tss.positions, process_n = process_n, verbose = verbose) 1: TSSEnrichment(obj)

The R sessionInfo()

sessionInfo() R version 4.0.2 (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS: /share/nas1/Data/software/R/R-4.0.2/lib64/R/lib/libRblas.so LAPACK: /share/nas1/Data/software/R/R-4.0.2/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
[3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
[5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
[7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] data.table_1.13.6 EnsDb.Mmusculus.v79_2.99.0
[3] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome.Hsapiens.UCSC.hg19_1.4.3 [5] BSgenome.Hsapiens.UCSC.hg38_1.4.3 BSgenome_1.58.0
[7] rtracklayer_1.50.0 Biostrings_2.58.0
[9] XVector_0.30.0 cowplot_1.1.1
[11] ggplot2_3.3.3 EnsDb.Hsapiens.v86_2.99.0
[13] ensembldb_2.14.0 AnnotationFilter_1.14.0
[15] GenomicFeatures_1.42.1 AnnotationDbi_1.54.0
[17] Biobase_2.50.0 GenomicRanges_1.42.0
[19] GenomeInfoDb_1.26.2 IRanges_2.24.1
[21] S4Vectors_0.28.1 BiocGenerics_0.36.0
[23] SeuratObject_4.0.2 Seurat_4.0.4
[25] Signac_1.3.0.9009

loaded via a namespace (and not attached): [1] reticulate_1.18 tidyselect_1.1.0
[3] RSQLite_2.2.3 htmlwidgets_1.5.3
[5] grid_4.0.2 docopt_0.7.1
[7] BiocParallel_1.24.1 Rtsne_0.15
[9] munsell_0.5.0 codetools_0.2-16
[11] ica_1.0-2 future_1.21.0
[13] miniUI_0.1.1.1 withr_2.4.1
[15] colorspace_2.0-0 knitr_1.31
[17] rstudioapi_0.13 ROCR_1.0-11
[19] tensor_1.5 listenv_0.8.0
[21] MatrixGenerics_1.2.1 slam_0.1-48
[23] GenomeInfoDbData_1.2.4 polyclip_1.10-0
[25] bit64_4.0.5 farver_2.0.3
[27] parallelly_1.23.0 vctrs_0.3.6
[29] generics_0.1.0 xfun_0.21
[31] biovizBase_1.38.0 BiocFileCache_1.14.0
[33] lsa_0.73.2 ggseqlogo_0.1
[35] R6_2.5.0 bitops_1.0-6
[37] spatstat.utils_2.1-0 cachem_1.0.3
[39] DelayedArray_0.16.1 assertthat_0.2.1
[41] promises_1.2.0.1 scales_1.1.1
[43] nnet_7.3-14 gtable_0.3.0
[45] globals_0.14.0 goftest_1.2-2
[47] rlang_0.4.10 RcppRoll_0.3.0
[49] splines_4.0.2 lazyeval_0.2.2
[51] dichromat_2.0-0 checkmate_2.0.0
[53] spatstat.geom_2.1-0 reshape2_1.4.4
[55] abind_1.4-5 backports_1.2.1
[57] httpuv_1.5.5 Hmisc_4.4-2
[59] tools_4.0.2 ellipsis_0.3.1
[61] spatstat.core_2.1-2 RColorBrewer_1.1-2
[63] ggridges_0.5.3 Rcpp_1.0.7
[65] plyr_1.8.6 base64enc_0.1-3
[67] progress_1.2.2 zlibbioc_1.36.0
[69] purrr_0.3.4 RCurl_1.98-1.2
[71] prettyunits_1.1.1 rpart_4.1-15
[73] openssl_1.4.3 deldir_0.2-9
[75] pbapply_1.4-3 zoo_1.8-8
[77] SummarizedExperiment_1.20.0 ggrepel_0.9.1
[79] cluster_2.1.0 magrittr_2.0.1
[81] scattermore_0.7 lmtest_0.9-38
[83] RANN_2.6.1 SnowballC_0.7.0
[85] ProtGenerics_1.22.0 fitdistrplus_1.1-3
[87] matrixStats_0.58.0 hms_1.0.0
[89] patchwork_1.1.1 mime_0.9
[91] xtable_1.8-4 XML_3.99-0.5
[93] jpeg_0.1-8.1 sparsesvd_0.2
[95] gridExtra_2.3 compiler_4.0.2
[97] biomaRt_2.46.3 tibble_3.0.6
[99] KernSmooth_2.23-17 crayon_1.4.1
[101] htmltools_0.5.1.1 mgcv_1.8-31
[103] later_1.1.0.1 Formula_1.2-4
[105] tidyr_1.1.2 DBI_1.1.1
[107] tweenr_1.0.1 dbplyr_2.1.0
[109] MASS_7.3-51.6 rappdirs_0.3.3
[111] Matrix_1.3-4 igraph_1.2.6
[113] pkgconfig_2.0.3 GenomicAlignments_1.26.0
[115] foreign_0.8-80 plotly_4.9.3
[117] spatstat.sparse_2.0-0 xml2_1.3.2
[119] VariantAnnotation_1.36.0 stringr_1.4.0
[121] digest_0.6.27 sctransform_0.3.2
[123] RcppAnnoy_0.0.18 spatstat.data_2.1-0
[125] leiden_0.3.7 fastmatch_1.1-0
[127] htmlTable_2.1.0 uwot_0.1.10
[129] curl_4.3 shiny_1.6.0
[131] Rsamtools_2.6.0 lifecycle_0.2.0
[133] nlme_3.1-148 jsonlite_1.7.2
[135] viridisLite_0.3.0 askpass_1.1
[137] pillar_1.4.7 lattice_0.20-41
[139] KEGGREST_1.30.1 fastmap_1.1.0
[141] httr_1.4.2 survival_3.1-12
[143] glue_1.4.2 qlcMatrix_0.9.7
[145] png_0.1-7 bit_4.0.4
[147] ggforce_0.3.2 stringi_1.5.3
[149] blob_1.2.1 latticeExtra_0.6-29
[151] memoise_2.0.0 dplyr_1.0.4
[153] irlba_2.3.3 future.apply_1.7.0

Best, Hope hearing from you sooner.

timoast commented 3 years ago

I think the issue here is that there are no fragments detected at any TSS for any cells. You must have either a mismatch between cell names in the object and cell names in the fragment file (no cells being found), or chromosome names in the gene annotation and chromosome names in the fragment file (no genes being found).

To check the cell naming, can you run:

head(Fragments(obj)[[1]]@cells)

and also:

totals <- CountFragments(Fragments(obj)[[1]])
sum(Fragments(obj)[[1]]@cells %in% totals$CB)
honghh2018 commented 3 years ago

OK, The fragment file's chromosome attach with chr prefix, but the annotation package's chromosome with chr prefix, like below:

Fragment file: image

annotation variable: image

Besides, there are weird problem as i run the code you posted.

It show me error like below:

image

totals <- CountFragments(Fragments(obj)[[1]]) There is no way to force this S4 class to be a vector

traceback() 5: as.character.default(x) 4: as.character(x) 3: grepl(pattern = "^http|^ftp", x = x) 2: isRemote(x = fragments) 1: CountFragments(Fragments(obj)[[1]])

what happen in this issue? and how can i fix it? Best,

timoast commented 3 years ago

The issue is that the chromosome names in the gene annotation do not match the chromosome names in the fragment file. It seems you did not actually run the line seqlevelsStyle(annotation) <- "UCSC"?

You need to ensure the gene annotation matches the chromosome naming in the fragment file

honghh2018 commented 3 years ago

Hi @timoast , I eventually know what wrong in my data, it was the fragment chromosome seqname no match with R package EnsDb.Mmusculus.v79. i want to change the chromosome name in annotation from annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79) how can i do this fix? Any advices would be appreciated Best

timoast commented 3 years ago

seqlevelsStyle(annotation) <- "UCSC"