stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
330 stars 88 forks source link

Question about nucleosome signal NA values #1688

Closed danielcgingerich closed 7 months ago

danielcgingerich commented 7 months ago

I noticed this when plotting a violin of the nucleosome signal of my data.

seu <- NucleosomeSignal(object = seu)
    Found 4974 cell barcodes
    Done Processing 24 million lines
atac_nucleosome <- VlnPlot(seu, features = 'nucleosome_signal') + geom_hline(yintercept = 4)
    Warning message:
    Removing 1 cells missing data for vars requested

So I figured this was due to NA values in the metadata:

which(is.na(seu$nucleosome_signal))
    19997_AGCTTCCTCATCGTTT-1
     706

To investigate further I ran the command NucleosomeSignal from the source code

object <- seu
verbose <- TRUE
assay <- 'atac'
library(fastmatch) 
ExtractFragments <- Signac:::ExtractFragments
    if (!inherits(x = object[[assay]], what = "ChromatinAssay")) {
        stop("The requested assay is not a ChromatinAssay")
    }
    frags <- Fragments(object = object[[assay]])
    if (length(x = frags) == 0) {
        stop("No fragment files present in assay")
    }
    verbose <- as.logical(x = verbose)
    af <- list()
    for (i in seq_along(along.with = frags)) {
        counts <- ExtractFragments(fragments = frags[[i]], n = n,
            verbose = verbose)
        cells.keep <- fmatch(x = counts$CB, table = colnames(x = object),
            nomatch = 0L)
        rownames(x = counts) <- counts$CB
        counts <- counts[cells.keep > 0, c("mononucleosomal",
            "nucleosome_free")]
        af[[i]] <- counts
    }
    af <- do.call(what = rbind, args = af)

Identified the culprit as af

af[706, ]
              mononucleosomal nucleosome_free
    19997_AGCTTCCTCATCGTTT-1               0                               0

But this did not make sense, because the cell does have fragments...

seu@meta.data[706, c('atac_fragments', 'atac_peak_region_fragments')]
                         atac_fragments atac_peak_region_fragments
19997_AGCTTCCTCATCGTTT-1             27                          2

And this is confirmed by looking at the fragments file:

zgrep AGCTTCCTCATCGTTT atac_fragments.tsv.gz
    chr11   15157880        15158070 AGCTTCCTCATCGTTT-1     1
    chr11   43050631        43050710 AGCTTCCTCATCGTTT-1     1
    chr12   94798526        94798735 AGCTTCCTCATCGTTT-1     1
    chr14   43095389        43095570 AGCTTCCTCATCGTTT-1     1
    chr15   25574659        25574786 AGCTTCCTCATCGTTT-1     1
    chr15   25609309        25609633 AGCTTCCTCATCGTTT-1     1
    chr17   10059751        10059936 AGCTTCCTCATCGTTT-1     2
    chr18   72423744        72423922 AGCTTCCTCATCGTTT-1     1
    chr2    4741107 4741190 AGCTTCCTCATCGTTT-1   1
    chr2    170815035       170815252     AGCTTCCTCATCGTTT-1        2
    chr21   13907595        13907770 AGCTTCCTCATCGTTT-1     4
    chr21   27401138        27401253 AGCTTCCTCATCGTTT-1     1
    chr3    96554894        96554938 AGCTTCCTCATCGTTT-1     2
    chr3    188831834       188831962     AGCTTCCTCATCGTTT-1        4
    chr4    89262381        89262587 AGCTTCCTCATCGTTT-1     1
    chr4    105171062       105171113     AGCTTCCTCATCGTTT-1        1
    chr4    180251412       180251728     AGCTTCCTCATCGTTT-1        1
    chr4    180477650       180477766     AGCTTCCTCATCGTTT-1        1
    chr7    14756914        14757268 AGCTTCCTCATCGTTT-1     2
    chr7    17474725        17474930 AGCTTCCTCATCGTTT-1     3
    chr7    18580887        18581091 AGCTTCCTCATCGTTT-1     2
    chr7    128135408       128135805     AGCTTCCTCATCGTTT-1        1
    chr8    54102001        54102131 AGCTTCCTCATCGTTT-1     2
    chr8    109466587       109466726     AGCTTCCTCATCGTTT-1        2
    chr9    13811876        13812058 AGCTTCCTCATCGTTT-1     2
    chrX    124537322       124537373     AGCTTCCTCATCGTTT-1        1
    chrX    141375534       141375655     AGCTTCCTCATCGTTT-1        1
danielcgingerich commented 7 months ago

Wait... is n =argument from NucleosomeSignalresponsible for this? Is it reading a random sample of n fragments of the fragment file?

timoast commented 7 months ago

Yes, a subset of the fragment file is used (otherwise it takes too long). If you have cells with very low number of fragments then it's possible that none of them are seen in the sample

danielcgingerich commented 7 months ago

Thanks! I've updated my code with n = NULL