Closed KittyMurphy closed 3 months ago
I've tried out various peak callers and parameters.
macs2 callpeak -f BAMPE -g hs -B --keep-dup all -t <sample>.bam
macs2 callpeak -f BEDPE -nomodel --shift -37 --extsize 73 -g hs -B --keep-dup all -t <sample>.bed
Options 1 and 2 output the same peaks (n = ~250,000), so I decided to stick with calling peaks with BAMPE as this is the direct output I get from genome alignment.
macs3 callpeak -f BAMPE -g hs -B --keep-dup all --cutoff-analysis -t <sample>.bam
--cutoff-analysis helps you decide a p or q value threshold, according to my output I should be using a p cut off of 0.01. Perhaps I was being too stringent beforehand as I filtered my peaks to those with p < 0.0000001.
macs3 callpeak -f BAMPE -g hs -B --keep-dup all -p 0.01 -t <sample>.bam
Overall, MACS3 output a higher number of peaks (n = ~300,000), and is suggested to be more suited to ATAC-seq data than MACS2.
macs3 hmmratac -b <sample>.bam
MACS3 now implements hhmratac, which is a peak caller specifically designed for ATAC-seq data. This method output ~130,000 peaks. Of note, all the MACS algorithms output peaks with similar length distribution (50-300bp), whereas hhmratac had peak lengths even as large as 23kb!
I ran differential chromatin accessibility using DeSeq2, and had much better overlap with the differentially expressed genes (also identified using DeSeq2).
This study compared different tools for differential chromatin accessibility analysis and with our number of samples, DeSeq2 had good sensitivity and specificity.
Re: motif enrichment analysis
For example, using the top 100 upregulated peaks in E2, the mean distance to TSS of these peaks is ~40,000. In comparison, the mean distance to TSS of the background peaks is ~1,800. Could explain why the enrichments aren't super strong (as is usually seen with homer). Also, when I've seen examples of homer output the number of target and background sequences is not so different. I have 100 target sequences and 170,000 background sequences, might be worth subsetting.