dariober closed this issue 3 years ago
Thanks for the question. I do not think it makes sense to alter the Genrich code for this. If you want to modify the bitwise FLAGs in your BAM, that can be accomplished pretty easily with samtools and awk (or bioawk).
Ok, thanks for replying. It's your call, of course. If anyone lands here with the same question, here's what I've done. Starting with a coordinate-sorted BAM file, sort by read name and clear the duplicate flag (1024) from each read:
samtools sort -n -@ 8 {input.bam} \
| samtools view -h \
| awk -v FS='\t' -v OFS='\t' '{if(and($2, 1024) == 1024 && $1 !~ "^@") {
$2 = $2 - 1024
}
print $0}' \
| samtools view -@ 4 -b > {output.bam}
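A note for anyone adapting the awk step above: and() is a gawk extension, so the command may fail with other awk implementations. Below is a minimal sketch of the same flag edit using plain arithmetic instead, assuming SAM text arrives on stdin. The function name clear_dup_flag and the sample records are made up for illustration; they are not part of samtools or Genrich.

```shell
# Clear the duplicate bit (0x400 = 1024) from the FLAG column of SAM text on stdin.
# clear_dup_flag is a hypothetical helper name, not part of samtools or Genrich.
clear_dup_flag() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Header lines start with "@"; pass them through unchanged.
         /^@/ { print; next }
         # int(FLAG / 1024) is odd exactly when bit 0x400 is set.
         int($2 / 1024) % 2 == 1 { $2 -= 1024 }
         { print }'
}

# Example with made-up records: r1 has FLAG 1123 (1024 + 99), r2 has FLAG 99.
printf '@HD\tVN:1.6\tSO:queryname\nr1\t1123\tchr1\t100\nr2\t99\tchr1\t200\n' | clear_dup_flag
```

In the pipeline above, this function would take the place of the awk command between the two samtools view calls. Reads without the duplicate bit, and all header lines, pass through untouched.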
I don't think it is a good design choice to remove duplicates without the user setting the -r option. It's very confusing behaviour if you are not expecting it, and it is easy to miss. I do think you should change it so that Genrich only removes PCR duplicates if -r is set, but at the very least this should be made clear in the description of -r. Something like: "Note that if reads have been previously marked as duplicates, Genrich will remove them even if -r is not set".
Hi- As mentioned in this issue
Would it be possible to let the user decide whether reads marked as duplicate should be discarded? I have libraries sequenced at high depth with duplicates marked which I would like to keep. Thanks!