jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html
23 stars 12 forks source link

NA values in sigs makes heatmap not completed #49

Closed hyjforesight closed 2 years ago

hyjforesight commented 2 years ago

Hello ATACseqQC, I found that the heatmap of my mouse ATAC-seq was not completed. Only half was shown whatever I tried.

txs <- txs[seqnames(txs) %in% c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chrX", "chrY")]
genome <- Mmusculus
objs <- splitGAlignmentsByCut(obj=gal1, txs=txs, genome=genome, outPath="D:/HYJ/splited_Cracd_KO1/")
bamfiles <- file.path("D:/HYJ/splited_Cracd_KO1/", c("NucleosomeFree.bam", "mononucleosome.bam", "dinucleosome.bam", "trinucleosome.bam"))
TSS <- promoters(txs, upstream=0, downstream=1)
TSS <- unique(TSS)
librarySize <- estLibSize(bamfiles)
librarySize
D:/HYJ/splited_Cracd_KO1//NucleosomeFree.bam D:/HYJ/splited_Cracd_KO1//mononucleosome.bam 
                                     7552685                                      1484090 
  D:/HYJ/splited_Cracd_KO1//dinucleosome.bam  D:/HYJ/splited_Cracd_KO1//trinucleosome.bam 
                                     1497497                                            0 

NTILE <- 101
dws <- ups <- 1010
sigs <- enrichedFragments(gal=objs[c("NucleosomeFree", "mononucleosome", "dinucleosome", "trinucleosome")], TSS=TSS, librarySize=librarySize, upstream = ups, downstream = dws, TSS.filter=0.5, seqlev = paste0("chr", c(1:19, "X", "Y")), n.tile = NTILE)
sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1))
featureAlignedHeatmap(cvglists=sigs.log2, feature.gr=reCenterPeaks(peaks=TSS, width=ups+dws), upstream = ups, downstream = dws, zeroAt=0.5, n.tile=NTILE)

image

I think the reason may be the NA values in sigs. And the NA values may be caused by the not well-aligned bam files because there is no conservation=mm10_gscore in objs <- splitGAlignmentsByCut(obj=gal1, txs=txs, genome=genome, outPath="D:/HYJ/splited_Cracd_KO1/"). However, if I add conservation=mm10_gscore, the splitGAlignmentsByCut() will generate Error: subscript contains invalid names.

sigs
$NucleosomeFree
               [,1]       [,2]     [,3]     [,4]      [,5]     [,6]     [,7]     [,8]     [,9]     [,10]    [,11]
    [1,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [2,]  1.5850972  1.5850972 0.000000 0.000000  0.000000 0.000000 0.000000 0.000000 0.000000  0.000000 0.000000
    [3,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 0.000000 0.000000 2.451784 2.451784  2.451784 2.451784
    [4,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 1.585097 1.585097 1.585097 1.585097  1.585097 0.000000
    [5,]  0.0000000  1.2258919 1.225892 1.225892  1.225892 1.225892 0.000000 0.000000 0.000000  0.000000 0.000000
    [6,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [7,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [8,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 0.000000 0.000000 0.000000 0.000000  0.000000 0.000000
    [9,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA

Could you please help me with this issue? Thanks! Best, YJ

jianhong commented 2 years ago

Thank you for reporting. I will try to figure out the NA values.

jianhong commented 2 years ago

Could you first try to set the librarySize of trinucleosome.bam to 1 and let me know the output.

hyjforesight commented 2 years ago

hello @jianhong Could you please tell me how to set the librarySize of trinucleosome.bam as 1? To avoid the trinucleosome.bam, I set bamfiles to include only "NucleosomeFree.bam", "mononucleosome.bam", "dinucleosome.bam", but I still got errors length(gal) == 4 is not TRUE and NA values. Please see below.

bamfiles <- file.path("D:/HYJ/splited_Cracd_KO1/", c("NucleosomeFree.bam", "mononucleosome.bam", "dinucleosome.bam"))
librarySize <- estLibSize(bamfiles)
librarySize
D:/HYJ/splited_Cracd_KO1//NucleosomeFree.bam D:/HYJ/splited_Cracd_KO1//mononucleosome.bam 
                                     7552685                                      1484090 
  D:/HYJ/splited_Cracd_KO1//dinucleosome.bam 
                                     1497497
NTILE <- 101
dws <- ups <- 1010
sigs <- enrichedFragments(gal=objs[c("NucleosomeFree", "mononucleosome", "dinucleosome")], TSS=TSS, librarySize=librarySize, upstream = ups, downstream = dws, TSS.filter=0.5, seqlev = paste0("chr", c(1:19, "X", "Y")), n.tile = NTILE)

Error in enrichedFragments(gal = objs[c("NucleosomeFree", "mononucleosome",  : 
  length(gal) == 4 is not TRUE

sigs
$NucleosomeFree
               [,1]       [,2]     [,3]     [,4]      [,5]     [,6]     [,7]     [,8]     [,9]     [,10]    [,11]
    [1,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [2,]  1.5850972  1.5850972 0.000000 0.000000  0.000000 0.000000 0.000000 0.000000 0.000000  0.000000 0.000000
    [3,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 0.000000 0.000000 2.451784 2.451784  2.451784 2.451784
    [4,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 1.585097 1.585097 1.585097 1.585097  1.585097 0.000000
    [5,]  0.0000000  1.2258919 1.225892 1.225892  1.225892 1.225892 0.000000 0.000000 0.000000  0.000000 0.000000
    [6,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [7,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [8,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 0.000000 0.000000 0.000000 0.000000  0.000000 0.000000
sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1))

featureAlignedHeatmap(cvglists=sigs.log2, feature.gr=reCenterPeaks(peaks=TSS, width=ups+dws), upstream = ups, downstream = dws, zeroAt=0.5, n.tile=NTILE)

The heatmap is still half shown. image

hyjforesight commented 2 years ago

hello @jianhong Thanks for pushing the patch fixing the NA problem. I can generate the complete heatmap now. However, the red signals are still partially shown on the yellow background. Any thoughts about this? if you want, I can share my bam file with you. Thanks!

NTILE <- 101
dws <- ups <- 1010
sigs <- enrichedFragments(gal=objs[c("NucleosomeFree", "mononucleosome", "dinucleosome", "trinucleosome")], TSS=TSS, librarySize=librarySize, upstream = ups, downstream = dws, TSS.filter=0.5, seqlev = paste0("chr", c(1:19, "X", "Y")), n.tile = NTILE)
sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1))

sigs.log2
$NucleosomeFree
               [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]     [,10]     [,11]
    [1,]  0.0000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
    [2,]  1.3702186  1.370219  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
    [3,]  0.0000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  1.787342  1.787342  1.787342  1.787342
    [4,]  0.0000000  0.000000  0.000000  0.000000  0.000000  1.370219  1.370219  1.370219  1.370219  1.370219  0.000000
    [5,]  0.0000000  1.154384  1.154384  1.154384  1.154384  1.154384  0.000000  0.000000  0.000000  0.000000  0.000000
    [6,]  0.0000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
    [7,]  0.0000000  0.000000  0.000000  0.000000  1.214155  1.214155  1.214155  1.863977  1.863977  1.214155  1.214155
    [8,]  0.0000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
    [9,]  1.2818377  1.281838  1.281838  1.281838  0.000000  0.000000  0.000000  1.281838  1.281838  1.281838  1.281838

ATACheatmap <- featureAlignedHeatmap(cvglists=sigs.log2, feature.gr=reCenterPeaks(peaks=TSS, width=ups+dws), upstream = ups, downstream = dws, zeroAt=0.5, n.tile=NTILE)

image