macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
695 stars 270 forks source link

ATAC-Seq parameters #195

Open firatuyulur opened 7 years ago

firatuyulur commented 7 years ago

Hi, I have gone through the other two ATAC-seq issues. Accordingly, I tried one run with BAMPE and one without. here are the codes.

macs2 callpeak -t raw/dH1f_1.bam -n dh1f_1.broad -g hs -q 0.05 --nomodel --extsize 200 -f BAMPE -B --broad

callpeak -t raw/dh1f_2.bam -n dh1f_2.broad -g hs -q 0.05 --nomodel --shift -100 --extsize 200 -B --broad

Then I wanted to see how close their results are and the length of broadPeak files are as 1232 dh1f_1.broad_peaks.broadPeak 92323 dh1f.broad_peaks.broadPeak

There is a huge difference and I cannot make sure what parameters to use.

ZheFrench commented 6 years ago

I Think it's better to continue this thread because my question is quite very close and the issue is still open What is the consensus to use MACS2 with ATAC-SEQ paired-end ?

I read that -f BAMPE was a good pratice . But then should it be used with --nomodel option ? and also should we call broad or narrow peaks ?

For what it worth, this is what I'm using until now on my bam without duplicates:

macs2 callpeak --verbose 3 --treatment my.bam --name myname -g hs --bdg -q 0.05 --keep-dup all -f BAMPE

LucoLab commented 6 years ago

answer my own question :I think with -f BAMPE , no need --nomodel option, it doesn't use it by default when BAMPE is set.

dbrg77 commented 6 years ago

From an experimental point of view, the 5' end of the fragments is the main interest, so it is better to treat them as single end and perform peak calling using '-f BAM --nomodel --shift -100 --extsize 200'

See this thread: https://groups.google.com/d/msg/macs-announcement/4OCE59gkpKY/v9Tnh9jWriUJ

hmyh1202 commented 6 years ago

why --shift -100 --extsize 200 ?

kwcurrin commented 6 years ago

The --shift -100 --extsize 200 option centers a 200 bp window on the Tn5 binding site, which is more accurate for ATAC-seq data (the same option is also applicable for DNase-seq data). The 5' ends of reads represent the Tn5 (and DNase I) cut sites; so the 5' ends of reads represent the most accessible regions. If you have paired-end ATAC-seq reads, many of your read pairs will span at least one nucleosome. If you look at plots of distribution of fragment length, you should see many fragments that don't span nucleosomes (less 150-200 bp), but you will also see many fragments that do span nucleosomes (>200-300 bp). Therefore, you don't want to run MACS2 with the default model building parameters meant for ChIP-seq. These model building parameters assume that the binding site is in the middle of the fragment, which is accurate for ChIP-seq, but not for ATAC-seq. If you use the ChIP-seq options for ATAC-seq, many of your peaks will be centered on nucleosomes instead of centered on the accessible regions. The reason why a 200 bp window is often used (--shift 100 --extsize 200), is that it is reasonable to assume that ATAC-seq peaks are at least 200 bp long because this is about the size of a nucleosome-free region with a single nucleosome removed. Some people may use --shift -75 --extsize 150, with the assumption that the length of an accessible region with a single nucleosome removed is about 150 bp, which is also reasonable.

yue131 commented 4 years ago

@kwcurrin Thank you for your answer. I am using MACS2 to call peaks from ATAC-seq data. And my data is paired end reads from BWA. I do not know if I want to investigate the genome wide chromatin accessibility, which parameter will be better? macs2 callpeak -t in.bam -n in -g hs -q 0.05 --nomodel --shift 100 --extsize 200 -f BAM -B --broad

macs2 callpeak -t in.bam -n in -g hs -q 0.05 --nomodel -f BAMPE -B --broad