macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
692 stars 270 forks source link

ATAC-seq settings #145

Open igordot opened 7 years ago

igordot commented 7 years ago

I am trying to call peaks in ATAC-seq data. According to the documentation:

To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both direction to smooth the pileup signals. If the wanted smoothing window is 200bps, then use '--nomodel --shift -100 --extsize 200'. For certain nucleosome-seq data, we need to pileup the centers of nucleosomes using a half-nucleosome size for wavelet analysis (e.g. NPS algorithm). Since the DNA wrapped on nucleosome is about 147bps, this option can be used: '--nomodel --shift 37 --extsize 73'.

Based on a brief literature review, people use both --shift -100 --extsize 200 and --shift 37 --extsize 73 for ATAC-seq. Is one option more appropriate? Is there another setting I should be using?

taoliu commented 7 years ago

If you followed original protocol for ATAC-Seq, you should get Paired-End reads. If so, I would suggest you just use "--format BAMPE" to let MACS2 pileup the whole fragments in general. But if you want to focus on looking for where the 'cutting sites' are, then '--nomodel --shift -100 --extsize 200' should work.

ATpoint commented 7 years ago

I have a question towards the BAM and BAMPE mode:

The TLEN of a pair is of course determined by the insert size after the alignment. I understand that in BAMPE, MACS takes only the 5' of the first mate of the pair and uses TLEN for the fragment length. But how is it in BAM mode? Does it treat the two mates idependently, so that literally the coverage of each nt within the pair would be double as high as in BAMPE mode?

igordot commented 7 years ago

@ATpoint, according to the manual:

Pair-end mapping results can be saved in a single BAM file, if so, MACS will automatically keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup.

It sounds like only one of the mates is used.

ATpoint commented 7 years ago

Yes, but I was referring to what happens with paired-end BAMs when not BAMPE but BAM is specified as -f, as this currently seems to be the method of choice for ATAC-seq in the literature.

igordot commented 7 years ago

Based on that description, it sounds like only one mate (left mate) is used when the reads are paired and BAM (not BAMPE) is specified. Essentially, half the reads are ignored.

Hopefully @taoliu can confirm.

ATpoint commented 7 years ago

Yes you are right. The more I think about it, it probably autodetects the bitwise flag for paired-end reads and discards the reverse mate for further analysis.

crazyhottommy commented 7 years ago

so -f BAMPE is preferred for paired-end ATAC-seq?

taoliu commented 7 years ago

@crazyhottommy @igordot Yes. With -f BAMPE on, MACS2 read the left mate and the insertion length information from BAM file, and discard right mate. With -f BAM, MACS2 only keeps the left mate, so it's definitely not a feature you want for your paired-end data.

lucygarner commented 5 years ago

Why do so many papers use --shift -100 --extsize 200 for MACS2 rather than -f BAMPE if this is not recommended for paired-end data?

taoliu commented 5 years ago

Why do so many papers use --shift -100 --extsize 200 for MACS2 rather than -f BAMPE if this is not recommended for paired-end data?

--shift -100 --extsize 200 will amplify the 'cutting sites' enrichment from ATAC-seq data. So in the end, the 'peak' is where Tn5 transposase likes to attack. The fact is that, although many information such as the insertion length and the other mate alignment is ignored, such result is still usable. Especially when the short fragment population is extremely dominant, the final output won't be off much.

lucygarner commented 5 years ago

Ok thank you. But you would still recommend the -f BAMPE option? After peak calling with -f BAMPE, I then take the 5' end of the forward and reverse reads as the cut sites. Does this make sense?

wangjiawen2013 commented 5 years ago

@taoliu > @crazyhottommy @igordot Yes. With -f BAMPE on, MACS2 read the left mate and the insertion length information from BAM file, and discard right mate. With -f BAM, MACS2 only keeps the left mate, so it's definitely not a feature you want for your paired-end data.

How does --keep-dup 1 works when setting -f BAMPE ? will --keep-dup 1 still deduplicate the fragments ?

Maarten-vd-Sande commented 4 years ago

Sorry for reopening this discussion, but I have another question regarding ATAC-seq.

I am interested in the cutting sites, so I want to use the --shift and --extsize options. However, as discussed before, we then unfortunately ignore all the mates. I have heard of people just merging the fastqs and treating the paired-end data as single-end to avoid this, however we then have suboptimal alignment.

@taoliu, I am wondering if the assumptions of MACS2 aren't broken if we align PE, and set all RNEXT to *, PNEXT to 0, and shift the mates position by - (minus) TLEN, and use this BAM file for peak calling (with --shift and --extsize).

ATpoint commented 4 years ago

For recent alternatives have a look at https://github.com/LiuLabUB/HMMRATAC from the Liu lab for ATAC-seq or https://github.com/jsh58/Genrich

If you want to use macs and keep both mates you could write the BAM file as BED with one interval for each of the mates, reduced to the 5' end to get the cutting site and then use -f BED to feed that into macs.

Maarten-vd-Sande commented 4 years ago

@ATpoint, thanks! Converting from BAM to BED is probably a much simpler approach!

xpan1 commented 4 years ago

@taoliu > @crazyhottommy @igordot Yes. With -f BAMPE on, MACS2 read the left mate and the insertion length information from BAM file, and discard right mate. With -f BAM, MACS2 only keeps the left mate, so it's definitely not a feature you want for your paired-end data.

How does --keep-dup 1 works when setting -f BAMPE ? will --keep-dup 1 still deduplicate the fragments ?

Hi, did you solve your problems about parameter --keep-dup? When I set it as all or 1, the peak numbers are significantly different? Which one is better for ATAC-Seq?

wangjiawen2013 commented 4 years ago

You can use HMMRATAC instead now, which is a substitution of MACS2 spcific designed to ATACseq. It is developed by Tao Liu too.

xpan1 commented 4 years ago

You can use HMMRATAC instead now, which is a substitution of MACS2 spcific designed to ATACseq. It is developed by Tao Liu too.

Thanks, Sounds good! Btw, what bam files do you use? Can you use the bam files after removing the duplicates using Picard?

EvanTarbell commented 4 years ago

You can use HMMRATAC instead now, which is a substitution of MACS2 spcific designed to ATACseq. It is developed by Tao Liu too.

Thanks, Sounds good! Btw, what bam files do you use? Can you use the bam files after removing the duplicates using Picard?

Yes, HMMRATAC can run with BAM files after duplicate removal. However, by default, HMMRATAC will remove the duplicates for you using the same Picard process. It is also possible to turn off this behavior with the --rmDup option.

yue131 commented 4 years ago

@igordot Hi, I am using MACS2 to call peaks, I am interested in the parameter you used in your article. my data are paired end reads.

yichao-cai commented 3 years ago

Hi, I have quite a naive question regarding the ATAC-seq analysis.

Why do we want to set the --keep-dup all in ATAC-seq. I have seen this setting being used by a lot of people.

My guess: This setting is turned one because the duplicated alignments have been filtered out before the MACS2 step (i.e. by picard). But then is this case wouldn't --keep-dup all and --keep-dup 1 produce identical results? (This is been somewhat touched in this biostar question)

taoliu commented 3 years ago

@yichao-cai --keep-dup all is recommended if the duplicated alignments have been removed in preprocessing steps with other tools. The idea is that we shall not perform the same task with two different tools in our pipeline. For example, if you rely on picard to remove duplicates, then you just leave the task to picard and let macs not to remove duplicates at all. In this way, you will have better control of your pipeline and avoid unexpected errors.

igordot commented 3 years ago

For anyone still following this discussion (more than 4 years later), there is a related thread on Twitter that brings up some interesting points and suggests using "-f BED --shift -100 --extsize 200": https://twitter.com/XiChenUoM/status/1336658454866325506

lucygarner commented 3 years ago

Thanks, that's very helpful

taoliu commented 3 years ago

@igordot I just enabled the 'Discussions' function for MACS on Github: https://github.com/macs3-project/MACS/discussions Hope that the informative discussions in the 'Issues' won't go away because of age :) I can summarize our discussion in this issue and start a new discussion. But if you want to start it, let me know and feel free to do so :) Just don't want to take away your credit.

taoliu commented 3 years ago

Another related issue, as I mentioned in Xi Chen's Twitter thread, is #421. As for MACS, the goal is to enable possible ways to analyze the data, so I will add an option (to keep both ends of BAMPE) to make life easier for people who want to focus on the cutting/insertion sites enrichment in ATAC-seq.

igordot commented 3 years ago

@taoliu there is now a discussion: https://github.com/macs3-project/MACS/discussions/435

I haven't yet used that GitHub feature, but hopefully it should be more appropriate than an "issue".

taoliu commented 3 years ago

Thank you! @igordot

felixschlesinger commented 2 years ago

As for MACS, the goal is to enable possible ways to analyze the data, so I will add an option (to keep both ends of BAMPE) to make life easier for people who want to focus on the cutting/insertion sites enrichment in ATAC-seq.

Any update on this @taoliu ? I was just looking through the code, but didn't find the right place where the filtering of 2nd mates occurs. Also, should this be a change to BAMPE or to plain BAM? If focusing on the cut-sites, the full length of the pairs / fragments does not actually matter, right?