I have a question on Genrich ATAC-seq read shifting wrt peak calling.
I know that Genrich does the positional increase/decrease as appropriate. Some questions:
Most programs use +4/-5 for shifting, but Genrich uses +5/-5? Any specific reasons. I know this might be insignificant in the grand scheme of things pertaining to peak calling, but I was just curious.
Is it advisable to let Genrich do its own read-shifting? In other words, can I do read-shifting separately and use -D option in Genrich - my main worry with this approach is that the bam headers/alignments might get inadvertently affected ... which is what the core of this Question/Issue is all about below
By default, Genrich centers the intervals at the ends of the
reads/fragments, adjusted forward by 5bp to account for
the Tn5 transposase occupancy. That is, for the 5' ends of
fragments (or for reads aligning in a normal orientation),
the position is increased by +5, and for the 3' ends of
fragments (or for reads aligning in a reverse-complement
orientation), the position is adjusted by -5
I was a bit surprised that the authors did not consider Genrich in their paper (probably because Genrich is not published)
Yan, F., Powell, D.R., Curtis, D.J. et al. From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis.
Genome Biol 21, 22 (2020).
https://doi.org/10.1186/s13059-020-1929-3
Their analyses pertains to MACS2 and this below caught my attention pertaining to shifting of reads before ATAC-seq peak calling
if($line[1] ~~ $flags{'negative'}){ # if the read is mapped to the reverse strand
$line[7]=$line[7]+4; # move the mate start 4 bp
$line[8]=$line[8]+9; # adjust the inferred insert size
$line[9] =substr($line[9],0,-5); # remove 5 bp from the reads
$line[10] =substr($line[10],0,-5);# remove the 5bp from quality as well
# this is ugly, but this is a way to change the CIGAR to reflect that I'm shortening the read
# for the negative stran I want to subtract the 5p from the last entry (which should be M, for match)
$line[5] = &CIGARtrim($line[5],'-',5); # from the CIGAR string, trim 5, and we're on the negative strand
}elsif($line[1] ~~ $flags{'positive'}){ # if the read is mapped to the positive strand
$line[3]=$line[3]+4; # move the start 4 bp
# these were removed, because they were incorrect, you don't actually move this end of the negative read (which this would be, because it's the mate of a positive strand read).
# for a negative read you just trim from the other end, the start position stays the same.
#$line[7]=$line[7]-5; # move the mate start 5 bp
#$line[7]=1 if $line[7]<=0; # just to make sure we didn't go off the end
$line[8]=$line[8]-9; # adjust the inferred insert size
$line[9] =substr($line[9],4);# remove 4bp from the reads
$line[10] =substr($line[10],4);# remove the 4bp from quality as well
# this is ugly, but this is a way to change the CIGAR to reflect that I'm shortening the read
# for the positive strand I want to subtract the 4 from the first entry (which should be M, for match)
$line[5] = &CIGARtrim($line[5],'+',4); # from the CIGAR string, trim 4, and we're on the positive strand
}
Does Genrich also carry out these as done by Yan F et al.:
adjust the inferred insert size +9/-9
remove 5/4 bp from the reads based on reverse/positive strand mapping
remove 5/4 bp from quality score based on reverse/positive strand mapping
Thanks for the question. Genrich interprets BAM files, and in ATAC-seq mode it uses the method described here. But Genrich does not produce edited BAM files, so there is no need for it to go through those ridiculous manipulations of SAM fields.
Hi @jsh58
I have a question on Genrich ATAC-seq read shifting wrt peak calling.
I know that Genrich does the positional increase/decrease as appropriate. Some questions:
-D
option in Genrich - my main worry with this approach is that the bam headers/alignments might get inadvertently affected ... which is what the core of this Question/Issue is all about belowThe first author of this paper below has a corresponding GH repo - https://github.com/alexyfyf/atac_nf
Their analyses pertains to MACS2 and this below caught my attention pertaining to shifting of reads before ATAC-seq peak calling
https://github.com/alexyfyf/atac_nf/blob/master/bin/ATAC_BAM_shifter_gappedAlign.pl
Does Genrich also carry out these as done by Yan F et al.:
Thanks in advance.