jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License

Feature Request: Drop name-sort check #26

Closed ATpoint closed 5 years ago

ATpoint commented 5 years ago

Hi,

just a minor thing: for old-fashioned standard ChIP-seq with single-end rather than paired-end reads, you might consider adding an option that drops the initial check for a name-sorted BAM, since the check does not make sense here, and that counts only single-end reads by default. One can of course trick Genrich by simply manipulating the BAM header, but this is additional effort that would probably be unnecessary with an extra single-end option. Out of interest, do you plan to publish the tool in the future?

jsh58 commented 5 years ago

Thanks for the suggestion.

It is a fair point that not all BAMs really need to be name-sorted for Genrich. The purpose behind that requirement was that all of the alignments for multimapping reads/fragments need to be grouped so they can be analyzed together. (Otherwise, Genrich would need to hold the entire BAM in memory.) Note that this is true for both single-end and paired-end reads.
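To illustrate the grouping requirement described above, here is a minimal Python sketch. It is not Genrich's actual code: the `(read_name, chrom, pos)` tuples and the `iter_read_groups` helper are hypothetical stand-ins for real alignment records. It shows that a streaming parser sees all alignments of a read together only when they are adjacent in the input.

```python
from itertools import groupby

def iter_read_groups(alignments):
    """Yield (read_name, records) for each run of consecutive records
    that share a read name. Only one run is held in memory at a time."""
    for name, run in groupby(alignments, key=lambda aln: aln[0]):
        yield name, list(run)

# Hypothetical records: (read_name, chrom, pos). Read r1 is multimapping.
grouped = [("r1", "chr1", 100), ("r1", "chr5", 900), ("r2", "chr1", 150)]
by_coordinate = [("r1", "chr1", 100), ("r2", "chr1", 150), ("r1", "chr5", 900)]

for label, records in (("grouped", grouped), ("by coordinate", by_coordinate)):
    names = [name for name, _ in iter_read_groups(records)]
    print(label, names)
```

With the grouped input, both alignments of `r1` arrive in one run. With the coordinate-ordered input, `r1` is split into two runs, so a streaming parser would have to buffer everything to reunite them.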

I will probably add such an option. And yes, the program will be published.

jsh58 commented 5 years ago

Now with -S (capital S) Genrich won't require a name-sorted BAM. To be used only by those who know what they are doing.

To analyze unpaired alignments, you will still need to specify one of these options: -y, -w <int>, or -x. -x won't work with purely single-end reads.
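As a rough sketch of how these options combine for a single-end BAM, the hypothetical Python helper below assembles an argument list. The `-S`, `-y`, and `-w <int>` flags come from the comment above; `-t` (input file) and `-o` (output peak file) are assumed from Genrich's usage message, and the file names are placeholders, so check them against your installed version.

```python
def genrich_single_end_args(bam_path, peaks_path, extend_to=None):
    """Build an argument list for single-end data (hypothetical helper).

    -S skips the name-sort check; unpaired alignments are kept either
    as-is (-y) or with lengths changed to a fixed value (-w <int>).
    """
    # -t and -o are assumed from Genrich's usage message.
    args = ["Genrich", "-t", bam_path, "-o", peaks_path, "-S"]
    if extend_to is None:
        args.append("-y")
    else:
        args += ["-w", str(extend_to)]
    return args

print(" ".join(genrich_single_end_args("chip.bam", "chip.narrowPeak")))
print(" ".join(genrich_single_end_args("chip.bam", "chip.narrowPeak", extend_to=200)))
```

The list can be passed to `subprocess.run` once the flags have been verified. `-x` is left out on purpose, since it won't work with purely single-end reads.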

malcook commented 3 years ago

@jsh58,

Is it fair to say that "those who know what they are doing" are those who understand to only use -S if:

Also, is it possible that -S will be documented in the README?

jsh58 commented 3 years ago

No, the -S option is not limited to any of those specifics.

Prior to version 0.4, Genrich did not even require name-sorted BAMs, because most short read aligners print alignments grouped by read/fragment inherently. Issues arise when people automatically sort BAMs by coordinate, because these are not compatible with Genrich's alignment parsing. The only surefire way to avoid that is to require name sorting.
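One way to spot a coordinate-sorted file before running Genrich is to read the sort order declared in the SAM/BAM header. The sketch below is not Genrich's own check: `sam_sort_order` is a hypothetical helper, and it assumes the header text has already been extracted (for example with `samtools view -H`). It reads the `SO:` tag of the `@HD` line as defined in the SAM specification.

```python
def sam_sort_order(header_text):
    """Return the SO: value from the @HD header line, or "unknown"
    (the SAM specification's default) when it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            break  # at most one @HD line is expected
    return "unknown"

# Example header text; real headers usually have more lines.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
order = sam_sort_order(header)
if order == "coordinate":
    print("coordinate-sorted: re-sort by name before running Genrich")
else:
    print("declared sort order:", order)
```

The header only records what the file claims. A file declared `unsorted` or `unknown` may still have its alignments grouped by read, which is the case the `-S` option is meant for.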

The -S option is described in the README.