Closed levlitichev closed 4 years ago
Thanks for the question. It can definitely be confusing. The term "aln sets" refers to all of the alignments of a read/fragment.
The discrepancy you noted is alluded to in the description of the -r option:
[...] all alignments to skipped chromosomes (-e) or genomic regions (-E) are still evaluated.
Therefore, the "To skipped refs" alignments are analyzed along with the "Paired alignments" and "Unpaired alignments" when determining PCR duplicates. This is done for the sake of reads/fragments, such as those in this BAM file, that have secondary alignments.
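To make the distinction between alignments and aln sets concrete, here is a minimal Python sketch. It is not Genrich's code: the `Alignment` record, the `summarize` function, and the toy data are all hypothetical, and it assumes that alignments belonging to the same read/fragment can be grouped by a shared read name. It only illustrates that alignments to skipped references are still collected into the set for their read/fragment, and that the number of sets can be smaller than the number of alignments when secondary alignments are present.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record (not a Genrich or BAM type)."""
    read_name: str  # name shared by all alignments of one read/fragment
    ref: str        # reference (chromosome) the alignment maps to
    paired: bool    # True if the alignment is part of a proper pair


def summarize(alignments, skipped_refs):
    """Count alignments per category and group them into per-read sets.

    Assumption: an "aln set" is every alignment sharing a read name,
    including alignments to skipped references.
    """
    counts = {"to_skipped_refs": 0, "paired_alignments": 0, "unpaired_alignments": 0}
    aln_sets = defaultdict(list)
    for aln in alignments:
        if aln.ref in skipped_refs:
            counts["to_skipped_refs"] += 1
        elif aln.paired:
            counts["paired_alignments"] += 1
        else:
            counts["unpaired_alignments"] += 1
        # Skipped-reference alignments are still added to the set, so they
        # remain available when the set as a whole is evaluated.
        aln_sets[aln.read_name].append(aln)
    counts["aln_sets"] = len(aln_sets)
    return counts, dict(aln_sets)


# Toy data: frag1 and frag2 each have a second (secondary) alignment.
toy = [
    Alignment("frag1", "chr1", True),
    Alignment("frag1", "chr2", True),
    Alignment("frag2", "chr1", True),
    Alignment("frag2", "chrM", True),   # lands on a skipped reference
    Alignment("frag3", "chr3", False),
]
counts, sets = summarize(toy, skipped_refs={"chrM"})
print(counts)
print({name: len(alns) for name, alns in sets.items()})
```

In this toy input, five alignments collapse into three sets, and the set for `frag2` still contains its chrM alignment even though chrM is a skipped reference.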
Okay, that helps. Thanks very much for your quick reply.
Hello,
Despite reading the README and searching online, I'm still having difficulty making sense of the verbose output of Genrich. Take the example included in the README:
It makes sense that "Unmapped" + "To skipped refs" + "Paired alignments" + "Unpaired alignments" = "BAM records analyzed". However, I expected "Paired aln sets" to be the same as "Paired alignments", but there is a substantial drop-off. Same with "Unpaired alignments" and "Singleton aln sets".
My best guess is that "set" here is meant in the mathematical sense, so "Paired aln sets" is also the number of unique paired alignments. But if that's the case, then I don't understand why "Full fragments" is less than "Paired aln sets", unless alignments are thrown out for some reason besides PCR duplication. Is there some additional filtering happening under the hood?
If you could help me understand the relationships between some of these numerical outputs, I would greatly appreciate it. Thanks very much in advance!
Best regards, Lev