Thanks for the suggestion.
It is a fair point that not all BAMs really need to be name-sorted for Genrich. The purpose behind that requirement was that all of the alignments for multimapping reads/fragments need to be grouped so they can be analyzed together. (Otherwise, Genrich would need to hold the entire BAM in memory.) Note that this is true for both single-end and paired-end reads.
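To illustrate the grouping point, here is a minimal sketch (not Genrich's actual code) of how a parser can stream a name-grouped file while holding only one read's alignments in memory at a time. It works on plain SAM text lines; the function name `group_by_name` and the toy records are made up for illustration.

```python
from itertools import groupby


def group_by_name(sam_lines):
    """Yield (read_name, records) for runs of consecutive SAM lines sharing a QNAME.

    Only adjacent lines are grouped, so this relies on the input being
    name-grouped; nothing beyond the current group is kept in memory.
    """
    records = (
        line.rstrip("\n").split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")  # skip blank and header lines
    )
    for name, group in groupby(records, key=lambda fields: fields[0]):
        yield name, list(group)


# Hypothetical toy input: read1 is a multimapper with two alignments.
name_grouped = [
    "@HD\tVN:1.6\tSO:queryname\n",
    "read1\t0\tchr1\t100\t1\t50M\t*\t0\t0\t*\t*\n",
    "read1\t256\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*\n",
    "read2\t0\tchr1\t300\t60\t50M\t*\t0\t0\t*\t*\n",
]
groups = [(name, len(recs)) for name, recs in group_by_name(name_grouped)]
print(groups)

# The same records in coordinate order: read1's alignments are no longer adjacent.
coordinate_sorted = [name_grouped[1], name_grouped[3], name_grouped[2]]
split_groups = [(name, len(recs)) for name, recs in group_by_name(coordinate_sorted)]
print(split_groups)
```

With the name-grouped input, both alignments of read1 arrive together. With the coordinate-ordered input, the same read shows up as two separate groups, which is the situation a streaming parser cannot handle without buffering the whole file.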
I will probably add such an option. And yes, the program will be published.
Now with -S (capital S), Genrich won't require a name-sorted BAM. To be used only by those who know what they are doing.
To analyze unpaired alignments, you will still need to specify one of these options: -y, -w <int>, or -x. Note that -x won't work with purely single-end reads.
@jsh58,
Is it fair to say that "those who know what they are doing" are those who understand to only use -S if:
-y or -w <int>
Also, is it possible that -S will be documented in the README?
No, the -S option is not limited to any of those specifics.
Prior to version 0.4, Genrich did not even require name-sorted BAMs, because most short read aligners print alignments grouped by read/fragment inherently. Issues arise when people automatically sort BAMs by coordinate, because these are not compatible with Genrich's alignment parsing. The only surefire way to avoid that is to require name sorting.
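For anyone who wants to check a file before running Genrich, the declared sort order lives in the SO tag of the @HD header line (per the SAM specification, one of unknown, unsorted, queryname, or coordinate). Below is a minimal sketch that reads it from header text, for example the output of `samtools view -H`. The helper name `sort_order` is hypothetical, and this is not necessarily how Genrich performs its own check.

```python
def sort_order(header_lines):
    """Return the SO value from the @HD header line, or "unknown" if absent."""
    for line in header_lines:
        if line.startswith("@HD"):
            # Fields after the record type are TAG:VALUE pairs separated by tabs.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"  # the SAM specification's default when SO is not given


# Hypothetical header lines for illustration.
coordinate_header = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:248956422\n"]
name_header = ["@HD\tVN:1.6\tSO:queryname\n"]
bare_header = ["@SQ\tSN:chr1\tLN:248956422\n"]

print(sort_order(coordinate_header))
print(sort_order(name_header))
print(sort_order(bare_header))
```

Keep in mind that the header only records what the file claims. A header that says queryname does not guarantee the records are actually grouped by name.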
The -S option is described in the README.
Hi,
just a minor thing, but for old-fashioned standard ChIP-seq with single-end rather than paired-end reads, you might consider adding an option to skip the initial check for a name-sorted BAM, as it does not make sense here, and, by default, only count single-end reads. One can of course trick Genrich by simply manipulating the BAM header, but this is additional effort that would probably be unnecessary with an extra single-end option. Out of interest, do you plan to publish the tool in the future?