Open husaynahmed opened 4 years ago
Suggest also reading Thomas Caroll ATACseq workflow: https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html Goes into using Subread to partitioning alignment BAM file into nucleosome free region, di-nucleosome, and mono-nucleosome alignment files. Based on the ATACseq Greenleaf paper..
Suggest also reading Thomas Caroll ATACseq workflow: https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html Goes into using Subread to partitioning alignment BAM file into nucleosome free region, di-nucleosome, and mono-nucleosome alignment files. Based on the ATACseq Greenleaf paper..
Thanks, @houghtos. This pipeline also performs partitioning. Perhaps I will give it a try. However, it would be great if we could integrate this feature in the nf-core/atacseq pipeline.
Thanks for the detailed info and references @husaynahmed @houghtos. Having thought about this a little I think this is something that would be ideal to implement as a sub-workflow using DSL2. This means that we can just pass a BAM file (filtered in whatever way) to the same sub-workflow to run the downstream processes. I think I will probably start making a serious effort to port this pipeline to DSL2 after the next release so your input and contributions will be more than welcome 😃
Hi @husaynahmed
Thank you for these references. I have an additional question related to reads shifting and peak calling parameters ... which is finally also related to #108.
If I'm correct, when you are using Macs2 --shift 100 --extend 200
or genrich -j -y
(see https://informatics.fas.harvard.edu/atac-seq-guidelines.html), you do not need to do reads shifting, as the shift is part of the peak calling analysis ?
However, applying the shift with alignmentSieve
can indeed be useful if you are using for instance Macs2 in BAMPE mode using the full fragment as a proxy.
Is that correct ? Thanks Nicolas
Hi,
I was wondering if adding the feature of shifting and splitting the reads after the alignment would be helpful.
A recent review on the approaches to analyze ATAC-seq data (Yan et al., 2020) suggests reads should be shifted + 4 bp and − 5 bp for positive and negative strand respectively, to account for the 9-bp duplication created by DNA repair of the nick by Tn5 transposase and achieve base-pair resolution of TF footprint and motif related analyses.
Also, in many studies researchers have looked into the nucleosomal-free regions and nucleosome associated regions separately. For example Yoshida et al., 2019 .
Here's what I am proposing.
Let's have three modes of post-alignment analysis for the ATAC-seq pipeline.
The first approach (which is the current atacseq pipeline) is useful in almost all cases where identifying the open chromatin regions is the objective. The latter would be helpful if the exact cut sites of transposase is important like motif analyses, footprinting etc., or when the analysis requires looking into the NFR/NBR regions separately.
The downstream analysis and QC could be done depending on which of these options the user chooses. For example, if the user chooses option 2, we could generate mergedLibrary and mergedReplicate BAM, bigWigs etc separately for NFR and NBR and perform peak calling and all the QC for them separately.
I have used deeptools in the past to perform shifting and splitting. Here's an example code for the same.
It would be great to know comments and suggestions from the nf-core members. I would be happy to discuss. I am a beginner in the nf-core community but would be happy to contribute to this pipeline enhancement.