TsviyaOlender commented 6 months ago

I am running your tutorial on ATAC-seq 10x PBMC data., but I am getting errors. I got an error when I try to subset the data. I solve it using the command pbmc@assays[["peaks"]]<-as(pbmc@assays[["peaks"]], "Assay5") which I found in one of the discussion group. Now the normalization doesn't work for me:

pbmc <- RunTFIDF(pbmc) Error in as(object = object, Class = "CsparseMatrix") : no method or default for coercing "Assay5" to "CsparseMatrix"

I am using signac version 1.11.9 and Seurat 5.0.0 My code is: library(Signac)# 1.11.9, currently avaliable 1.13.0 library(Seurat) if (!requireNamespace("EnsDb.Hsapiens.v75", quietly = TRUE)) BiocManager::install("EnsDb.Hsapiens.v75") library(EnsDb.Hsapiens.v75)

create seurat object

####################################################################### counts <- Read10X_h5(filename = "data/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5")

metadata <- read.csv(

file = "../vignette_data/atac_v1_pbmc_10k_singlecell.csv",

header = TRUE,

row.names = 1

)

chrom_assay <- CreateChromatinAssay( counts = counts, sep = c(":", "-"), fragments = 'data/atac_v1_pbmc_10k_fragments.tsv.gz', min.cells = 10, min.features = 200 )

pbmc <- CreateSeuratObject( counts = chrom_assay,

meta.data = metadata,

assay = "peaks" ) pbmc@assays[["peaks"]]<-as(pbmc@assays[["peaks"]], "Assay5")

extract gene annotations from EnsDb

annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75)

change to UCSC style since the data was mapped to hg19

seqlevels(annotations) <- paste0('chr', seqlevels(annotations)) genome(annotations) <- "hg19"

add the gene information to the object

Annotation(pbmc) <- annotations

add the gene information to the object

Annotation(pbmc) <- annotations

Computing QC Metrics

As with scRNA-seq, the expected range of values for these parameters will vary depending on your biological system,

cell viability, and other factors.

Nucleosome banding pattern: The histogram of DNA fragment sizes (determined from the paired-end sequencing reads)

should exhibit a strong nucleosome banding pattern corresponding to the length of DNA wrapped around a single nucleosome.

We calculate this per single cell, and quantify the approximate ratio of mononucleosomal to nucleosome-free fragments

(stored as nucleosome_signal)

Transcriptional start site (TSS) enrichment score.

The ENCODE project has defined an ATAC-seq targeting score based on the ratio of fragments centered at the TSS to

fragments in TSS-flanking regions (see https://www.encodeproject.org/data-standards/terms/).

Poor ATAC-seq experiments typically will have a low TSS enrichment score.

We can compute this metric for each cell with the TSSEnrichment() function, and the results are stored in

metadata under the column name TSS.enrichment.

Total number of fragments in peaks: A measure of cellular sequencing depth / complexity.

Cells with very few reads may need to be excluded due to low sequencing depth. Cells with extremely high levels

may represent doublets, nuclei clumps, or other artefacts.

Fraction of fragments in peaks: Represents the fraction of all fragments that fall within ATAC-seq peaks.

Cells with low values (i.e. <15-20%) often represent low-quality cells or technical artifacts that should be removed.

Note that this value can be sensitive to the set of peaks used.

Ratio reads in genomic blacklist regions The ENCODE project has provided a list of blacklist regions,

representing reads which are often associated with artefactual signal. Cells with a high proportion of reads mapping to

these areas (compared to reads mapping to peaks) often represent technical artifacts and should be removed. ENCODE blacklist regions for human (hg19 and GRCh38), mouse (mm10), Drosophila (dm3), and C. elegans (ce10) are included in the Signac package.

compute nucleosome signal score per cell

pbmc <- NucleosomeSignal(object = pbmc)

compute TSS enrichment score per cell

pbmc <- TSSEnrichment(object = pbmc, fast = FALSE)

add blacklist ratio and fraction of reads in peaks

pbmc$pct_reads_in_peaks <- pbmc$peak_region_fragments / pbmc$passed_filters * 100 pbmc$blacklist_ratio <- pbmc$blacklist_region_fragments / pbmc$peak_region_fragments

add blacklist ratio and fraction of reads in peaks

I think that meatdata is not avaliable for multiome

pbmc$pct_reads_in_peaks <- pbmc$peak_region_fragments / pbmc$passed_filters * 100

pbmc$blacklist_ratio <- pbmc$blacklist_region_fragments / pbmc$peak_region_fragments

DensityScatter(pbmc, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE) pbmc$high.tss <- ifelse(pbmc$TSS.enrichment > 3, 'High', 'Low') TSSPlot(pbmc, group.by = 'high.tss') + NoLegend() # pbmc$nucleosome_group <- ifelse(pbmc$nucleosome_signal > 4, 'NS > 4', 'NS < 4') FragmentHistogram(object = pbmc, group.by = 'nucleosome_group') #

We can plot the distribution of each QC metric separately using a violin plot:

VlnPlot(object = pbmc, features = c('nCount_peaks', 'TSS.enrichment', 'nucleosome_signal'), pt.size = 0.1, ncol = 3 ) # pbmc <- subset( x = pbmc, subset = nCount_peaks > 3000 & nCount_peaks < 30000 & nucleosome_signal < 4 & TSS.enrichment > 3 ) pbmc

Normalization and linear dimensional reduction

pbmc <- RunTFIDF(pbmc) pbmc <- FindTopFeatures(pbmc, min.cutoff = 'q0') pbmc <- RunSVD(pbmc)

session_info:

sessionInfo() R version 4.3.1 (2023-06-16) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: FlexiBLAS OPENBLAS; LAPACK version 3.10.1

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Asia/Jerusalem tzcode source: system (glibc)

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] EnsDb.Hsapiens.v75_2.99.0 fdrtool_1.2.17
[3] DESeq2_1.40.2 SummarizedExperiment_1.30.2 [5] MatrixGenerics_1.12.3 matrixStats_1.0.0
[7] scCustomize_2.0.1 SeuratDisk_0.0.0.9020
[9] cowplot_1.1.1 ggplot2_3.4.2
[11] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.24.0
[13] AnnotationFilter_1.24.0 GenomicFeatures_1.52.1
[15] AnnotationDbi_1.62.2 Biobase_2.60.0
[17] GenomicRanges_1.52.0 GenomeInfoDb_1.36.1
[19] IRanges_2.34.1 S4Vectors_0.38.2
[21] BiocGenerics_0.46.0 Signac_1.11.9000
[23] Seurat_5.0.0 SeuratObject_5.0.0
[25] sp_2.0-0

loaded via a namespace (and not attached): [1] ProtGenerics_1.32.0 spatstat.sparse_3.0-2 bitops_1.0-7
[4] lubridate_1.9.2 httr_1.4.6 RColorBrewer_1.1-3
[7] backports_1.4.1 tools_4.3.1 sctransform_0.4.1
[10] utf8_1.2.3 R6_2.5.1 lazyeval_0.2.2
[13] uwot_0.1.16 withr_2.5.0 prettyunits_1.1.1
[16] gridExtra_2.3 progressr_0.14.0 cli_3.6.1
[19] spatstat.explore_3.2-1 fastDummies_1.7.3 labeling_0.4.2
[22] spatstat.data_3.0-1 ggridges_0.5.4 pbapply_1.7-2
[25] Rsamtools_2.16.0 foreign_0.8-84 dichromat_2.0-0.1
[28] parallelly_1.36.0 BSgenome_1.68.0 rstudioapi_0.15.0
[31] RSQLite_2.3.1 generics_0.1.3 shape_1.4.6
[34] BiocIO_1.10.0 ica_1.0-3 spatstat.random_3.1-5
[37] dplyr_1.1.2 Matrix_1.6-1 ggbeeswarm_0.7.2
[40] fansi_1.0.4 abind_1.4-5 lifecycle_1.0.3
[43] yaml_2.3.7 snakecase_0.11.1 BiocFileCache_2.8.0
[46] Rtsne_0.16 paletteer_1.5.0 grid_4.3.1
[49] blob_1.2.4 promises_1.2.1 crayon_1.5.2
[52] miniUI_0.1.1.1 lattice_0.21-8 KEGGREST_1.40.0
[55] knitr_1.43 pillar_1.9.0 rjson_0.2.21
[58] future.apply_1.11.0 codetools_0.2-19 fastmatch_1.1-3
[61] leiden_0.4.3 glue_1.6.2 data.table_1.14.8
[64] vctrs_0.6.3 png_0.1-8 spam_2.9-1
[67] gtable_0.3.3 rematch2_2.1.2 cachem_1.0.8
[70] xfun_0.40 S4Arrays_1.2.1 mime_0.12
[73] survival_3.5-7 RcppRoll_0.3.0 ellipsis_0.3.2
[76] fitdistrplus_1.1-11 ROCR_1.0-11 nlme_3.1-163
[79] bit64_4.0.5 progress_1.2.2 filelock_1.0.2
[82] RcppAnnoy_0.0.21 irlba_2.3.5.1 rpart_4.1.19
[85] vipor_0.4.5 KernSmooth_2.23-22 colorspace_2.1-0
[88] DBI_1.1.3 Hmisc_5.1-0 nnet_7.3-19
[91] ggrastr_1.0.2 tidyselect_1.2.0 bit_4.0.5
[94] compiler_4.3.1 curl_5.0.1 htmlTable_2.4.1
[97] hdf5r_1.3.8 xml2_1.3.5 DelayedArray_0.26.7
[100] plotly_4.10.2 rtracklayer_1.60.0 checkmate_2.2.0
[103] scales_1.2.1 lmtest_0.9-40 rappdirs_0.3.3
[106] stringr_1.5.0 digest_0.6.33 goftest_1.2-3
[109] spatstat.utils_3.0-3 rmarkdown_2.24 XVector_0.40.0
[112] htmltools_0.5.6 pkgconfig_2.0.3 base64enc_0.1-3
[115] dbplyr_2.3.3 fastmap_1.1.1 rlang_1.1.1
[118] GlobalOptions_0.1.2 htmlwidgets_1.6.2 shiny_1.7.5
[121] farver_2.1.1 zoo_1.8-12 jsonlite_1.8.7
[124] BiocParallel_1.34.2 VariantAnnotation_1.46.0 RCurl_1.98-1.12
[127] magrittr_2.0.3 Formula_1.2-5 GenomeInfoDbData_1.2.10 [130] dotCall64_1.0-2 patchwork_1.1.2 munsell_0.5.0
[133] Rcpp_1.0.11 reticulate_1.31 stringi_1.7.12
[136] zlibbioc_1.46.0 MASS_7.3-60 plyr_1.8.8
[139] parallel_4.3.1 listenv_0.9.0 ggrepel_0.9.3
[142] forcats_1.0.0 deldir_1.0-9 Biostrings_2.68.1
[145] splines_4.3.1 tensor_1.5 hms_1.1.3
[148] circlize_0.4.15 locfit_1.5-9.8 igraph_1.5.1
[151] spatstat.geom_3.2-4 RcppHNSW_0.4.1 reshape2_1.4.4
[154] biomaRt_2.56.1 XML_3.99-0.14 evaluate_0.21
[157] biovizBase_1.48.0 BiocManager_1.30.22 ggprism_1.0.4
[160] httpuv_1.6.11 RANN_2.6.1 tidyr_1.3.0
[163] purrr_1.0.2 polyclip_1.10-4 future_1.33.0
[166] scattermore_1.2 janitor_2.2.0 xtable_1.8-4
[169] restfulr_0.0.15 RSpectra_0.16-1 later_1.3.1
[172] viridisLite_0.4.2 tibble_3.2.1 memoise_2.0.1
[175] beeswarm_0.4.0 GenomicAlignments_1.36.0 cluster_2.1.4
[178] timechange_0.2.0 globals_0.16.2

# insert reproducible example here

timoast commented 6 months ago

You should not manually change the class of assays as this can result in errors. Do not run this line pbmc@assays[["peaks"]]<-as(pbmc@assays[["peaks"]], "Assay5")

Please update to the latest Signac release and try again

TsviyaOlender commented 6 months ago

Thanks you very much for your answer. I updated the Signac and it is working now.

From: Tim Stuart @.> Sent: Monday, May 6, 2024 4:14 AM To: stuart-lab/signac @.> Cc: Tsviya Olender @.>; Author @.> Subject: Re: [stuart-lab/signac] I am getting an error when I use RunTFIDF (Issue #1693)

Caution: External Sender. Do not click on links or open attachments unless you recognize the sender.

You should not manually change the class of the matrix stored in assays as this can result in errors. Do not run this line @.**@.[["peaks"]], "Assay5")

Please update to the latest Signac release and try again

— Reply to this email directly, view it on GitHubhttps://github.com/stuart-lab/signac/issues/1693#issuecomment-2095043202, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADIHHARQDOVDA27IDHQEIV3ZA3KONAVCNFSM6AAAAABHDGJ4JOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJVGA2DGMRQGI. You are receiving this because you authored the thread.Message ID: @.**@.>>

stuart-lab / signac

I am getting an error when I use RunTFIDF #1693

create seurat object

metadata <- read.csv(

file = "../vignette_data/atac_v1_pbmc_10k_singlecell.csv",

header = TRUE,

row.names = 1

)

meta.data = metadata,

extract gene annotations from EnsDb

change to UCSC style since the data was mapped to hg19

add the gene information to the object

add the gene information to the object

Computing QC Metrics

As with scRNA-seq, the expected range of values for these parameters will vary depending on your biological system,

cell viability, and other factors.

Nucleosome banding pattern: The histogram of DNA fragment sizes (determined from the paired-end sequencing reads)

should exhibit a strong nucleosome banding pattern corresponding to the length of DNA wrapped around a single nucleosome.

We calculate this per single cell, and quantify the approximate ratio of mononucleosomal to nucleosome-free fragments

(stored as nucleosome_signal)

Transcriptional start site (TSS) enrichment score.

The ENCODE project has defined an ATAC-seq targeting score based on the ratio of fragments centered at the TSS to

fragments in TSS-flanking regions (see https://www.encodeproject.org/data-standards/terms/).

Poor ATAC-seq experiments typically will have a low TSS enrichment score.

We can compute this metric for each cell with the TSSEnrichment() function, and the results are stored in

metadata under the column name TSS.enrichment.

Total number of fragments in peaks: A measure of cellular sequencing depth / complexity.

Cells with very few reads may need to be excluded due to low sequencing depth. Cells with extremely high levels

may represent doublets, nuclei clumps, or other artefacts.

Fraction of fragments in peaks: Represents the fraction of all fragments that fall within ATAC-seq peaks.

Cells with low values (i.e. <15-20%) often represent low-quality cells or technical artifacts that should be removed.

Note that this value can be sensitive to the set of peaks used.

Ratio reads in genomic blacklist regions The ENCODE project has provided a list of blacklist regions,

representing reads which are often associated with artefactual signal. Cells with a high proportion of reads mapping to

these areas (compared to reads mapping to peaks) often represent technical artifacts and should be removed. ENCODE blacklist regions for human (hg19 and GRCh38), mouse (mm10), Drosophila (dm3), and C. elegans (ce10) are included in the Signac package.

compute nucleosome signal score per cell

compute TSS enrichment score per cell

add blacklist ratio and fraction of reads in peaks

add blacklist ratio and fraction of reads in peaks

I think that meatdata is not avaliable for multiome

pbmc$pct_reads_in_peaks <- pbmc$peak_region_fragments / pbmc$passed_filters * 100

pbmc$blacklist_ratio <- pbmc$blacklist_region_fragments / pbmc$peak_region_fragments

We can plot the distribution of each QC metric separately using a violin plot:

Normalization and linear dimensional reduction