stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
328 stars 88 forks source link

Different results from .h5 file and fragments file #1128

Closed RongDing-NMU closed 2 years ago

RongDing-NMU commented 2 years ago

Hello, I am learning scATAC-seq analysis recently. Thanks to signac package for the great help and good vignettes. But there is a confusing probelm when build ATAC Object. The matrix built from the filtered_peak_bc_matrix.h5 file seems to be unequal that built from fragments.tsv.gz file. 5k-cell PBMC dataset is provided by 10x Genomics.

Here is the code i am using:

fpath <- "~/Data/pbmc_5k/fragments.tsv.gz" counts.5k <- Read10X_h5("/data/dingrong/Project/pbmc_xk_scATAC/Data/pbmc_5k/filtered_peak_bc_matrix.h5") pbmc5k_assay.counts <- CreateChromatinAssay(counts = counts.5k, sep = c(":", "-"), min.features = 500, fragments = fpath) pbmc5k.counts <- CreateSeuratObject(pbmc5k_assay.counts, assay = "peaks") pbmc5k.counts@assays$peaks@counts[1:5,colnames(pbmc5k.counts)[1:2]] AAACGAAAGACGTCAG-1 AAACGAAAGATTGACA-1 chr1-569218-569595 . . chr1-713583-714675 2 4 chr1-752492-752958 . . chr1-762416-763300 . . chr1-804992-805708 . 2

fpath <- "~/Data/pbmc_5k/fragments.tsv.gz" fragcounts <- CountFragments(fragments = fpath) atac.cells <- fragcounts[fragcounts$frequency_count > 500, "CB"] atac.frags <- CreateFragmentObject(path = fpath, cells = atac.cells) frags.5k <- FeatureMatrix(fragments =atac.frags, features = granges(pbmc5k.counts), cells = atac.cells) pbmc5k_assay.frags <- CreateChromatinAssay(counts = frags.5k, sep = c(":", "-"), min.features = 500) pbmc5k.frags <- CreateSeuratObject(counts = pbmc5k_assay.frags, assay = "peaks") pbmc5k.frags@assays$peaks@counts[1:5,colnames(pbmc5k.counts)[1:2]] AAACGAAAGACGTCAG-1 AAACGAAAGATTGACA-1 chr1-569218-569595 . . chr1-713583-714675 1 2 chr1-752492-752958 . . chr1-762416-763300 . . chr1-804992-805708 . 1

colSums(pbmc5k.counts@assays$peaks@counts)[1:10]/colSums(pbmc5k.frags@assays$peaks@counts)[1:10] AAACGAAAGACGTCAG-1 AAACGAAAGATTGACA-1 AAACGAAAGGGTCCCT-1 AAACGAACAATTGTGC-1 2.0899864 1.4348795 2.9912162 1.2647245 AAACGAACACTCGTGG-1 AAACGAACAGGTAGGT-1 AAACGAACAGGTCCTG-1 AAACGAACATCATAGC-1 1.2063567 0.5656250 0.7837655 2.4370162 AAACGAACATTTCACT-1 AAACGAAGTAACATAG-1 2.7232737 1.0501415

raw_peak <- FeatureMatrix(fragments = Fragments(pbmc5k.counts), features = granges(pbmc5k.counts), cells = > colnames(pbmc5k.counts)) raw_peak.frags <- CreateChromatinAssay(counts = raw_peak, sep = c(":", "-"), min.features = 500) pbmc5k.feature <- CreateSeuratObject(counts = raw_peak.frags, assay = "peaks") pbmc5k.feature@assays$peaks@counts[1:5,colnames(pbmc5k.counts)[1:2]] AAACGAAAGACGTCAG-1 AAACGAAAGATTGACA-1 chr1-569218-569595 . . chr1-713583-714675 1 2 chr1-752492-752958 . . chr1-762416-763300 . . chr1-804992-805708 . 1

colSums(pbmc5k.counts@assays$peaks@counts)[1:10]/colSums(pbmc5k.feature@assays$peaks@counts)[1:10] AAACGAAAGACGTCAG-1 AAACGAAAGATTGACA-1 AAACGAAAGGGTCCCT-1 AAACGAACAATTGTGC-1 1.944468 1.952058 1.893823 1.941899 AAACGAACACTCGTGG-1 AAACGAACAGGTAGGT-1 AAACGAACAGGTCCTG-1 AAACGAACATCATAGC-1 1.945695 1.941019 1.946612 1.932478 AAACGAACATTTCACT-1 AAACGAAGTAACATAG-1 1.890963 1.886840

Thanks

timoast commented 2 years ago

The difference is that cellranger counts the cut sites, Signac counts the fragments, so the counts will be roughly half. This paper suggests that using fragment counts rather than cut sites may be more appropriate, although the practical difference is likely small when using LSI for dimension reduction.