jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License

[Question]: Genrich ATAC-seq read-shifting peak calling #105

Closed tamuanand closed 4 months ago

tamuanand commented 11 months ago

Hi @jsh58

I have a question about Genrich's ATAC-seq read shifting with respect to peak calling.

I know that Genrich does the positional increase/decrease as appropriate. Some questions:

  1. Most programs use +4/-5 for shifting, but Genrich uses +5/-5. Is there a specific reason? I know this is probably insignificant in the grand scheme of peak calling, but I was curious.
  2. How critical is read shifting for ATAC-seq peak calling? The HMMRATAC author commented on this here: https://github.com/LiuLabUB/HMMRATAC/issues/21#issuecomment-525303685
  3. Is it advisable to let Genrich do its own read shifting? In other words, can I do the read shifting separately and then use the -D option in Genrich? My main worry with that approach is that the BAM headers/alignments might be inadvertently affected, which is the core of this question (see below).

By default, Genrich centers the intervals at the ends of the reads/fragments, adjusted forward by 5bp to account for the Tn5 transposase occupancy. That is, for the 5' ends of fragments (or for reads aligning in a normal orientation), the position is increased by +5, and for the 3' ends of fragments (or for reads aligning in a reverse-complement orientation), the position is adjusted by -5.
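
To make the quoted rule concrete, here is a minimal Python sketch of the +5/-5 adjustment. It is not Genrich's code: the function names, the inclusive coordinate convention, and the `width` parameter of `centered_interval` are assumptions made only for illustration.

```python
SHIFT = 5  # bp adjustment quoted above; other tools commonly use +4/-5


def shifted_cut_site(pos: int, is_reverse: bool, shift: int = SHIFT) -> int:
    """Adjust an end coordinate 'forward' into the fragment.

    pos is the reference coordinate of the read's 5' end (assumed convention).
    Forward-oriented ends move right (+shift); reverse-complement ends
    move left (-shift).
    """
    return pos - shift if is_reverse else pos + shift


def fragment_cut_sites(start: int, end: int, shift: int = SHIFT) -> tuple[int, int]:
    """Adjusted cut sites for both ends of a fragment.

    start and end are the first and last reference coordinates of the
    fragment (inclusive, an assumption of this sketch).
    """
    return start + shift, end - shift


def centered_interval(site: int, width: int) -> tuple[int, int]:
    """Half-open interval of the given width centered on a cut site.

    width is a hypothetical parameter of this sketch.
    """
    half = width // 2
    return site - half, site - half + width


if __name__ == "__main__":
    left, right = fragment_cut_sites(100, 200)
    print(left, right)
    print(centered_interval(left, 100))
```

Note that only coordinates are computed here; nothing in the alignment record itself is modified.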

The first author of the paper below has a corresponding GitHub repo: https://github.com/alexyfyf/atac_nf

Does Genrich also carry out the following steps, as done by Yan F et al.?

  1. adjust the inferred insert size +9/-9
  2. remove 5/4 bp from the reads based on reverse/positive strand mapping
  3. remove 5/4 bp from quality score based on reverse/positive strand mapping
  4. trim the CIGAR string
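
For reference, the four edits listed above could look roughly like the following Python sketch on a single alignment record. This is neither the atac_nf code nor anything Genrich does. The `Aln` class and `shift_record` function are hypothetical, the +4/-5 convention is assumed, the CIGAR is assumed to be one match run (e.g. `50M`), and the insert-size change assumes the mate is shifted as well.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Aln:
    """Hypothetical stand-in for the SAM fields touched by read shifting."""
    pos: int          # 1-based leftmost reference position (SAM POS)
    cigar: str        # assumed to be a single match run such as "50M"
    seq: str
    qual: str
    tlen: int         # inferred insert size (SAM TLEN), signed
    is_reverse: bool


def shift_record(aln: Aln) -> Aln:
    """Return a copy of aln with the four edits applied."""
    m = re.fullmatch(r"(\d+)M", aln.cigar)
    if m is None:
        raise ValueError("this sketch only handles a single-M CIGAR")
    length = int(m.group(1))
    trim = 5 if aln.is_reverse else 4
    if length <= trim or len(aln.seq) != length or len(aln.qual) != length:
        raise ValueError("read too short or fields inconsistent")

    # 1. shrink the magnitude of the insert size by 9 (4 bp + 5 bp)
    if aln.tlen > 0:
        tlen = aln.tlen - 9
    elif aln.tlen < 0:
        tlen = aln.tlen + 9
    else:
        tlen = 0

    if aln.is_reverse:
        # 2./3. drop 5 bases and qualities from the right end of the stored read
        seq, qual, pos = aln.seq[:-trim], aln.qual[:-trim], aln.pos
    else:
        # 2./3. drop 4 bases and qualities from the left end; POS moves right
        seq, qual, pos = aln.seq[trim:], aln.qual[trim:], aln.pos + trim

    # 4. trim the CIGAR to the new read length
    cigar = f"{length - trim}M"
    return replace(aln, pos=pos, cigar=cigar, seq=seq, qual=qual, tlen=tlen)


if __name__ == "__main__":
    fwd = Aln(100, "10M", "ACGTACGTAC", "IIIIIIIIII", 150, False)
    print(shift_record(fwd))
```

Every field has to stay mutually consistent (POS, CIGAR, SEQ, QUAL, TLEN), which is where the risk of inadvertently corrupting alignments comes from.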

Thanks in advance.

jsh58 commented 10 months ago

Thanks for the question. Genrich interprets BAM files, and in ATAC-seq mode it uses the method described here. But Genrich does not produce edited BAM files, so there is no need for it to go through those ridiculous manipulations of SAM fields.