simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
https://htseq.readthedocs.io/en/release_0.11.1/
GNU General Public License v3.0

definition of 'too_low_aQual' and 'not_aligned' criteria #76

Closed sandbardev closed 5 years ago

sandbardev commented 5 years ago

I have a sample mapped with both hisat2 and tophat2. The two aligners yield very similar results, except for two specific counters:

tophat2:

__no_feature 491048
__ambiguous 459500
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 2423556

hisat2:

__no_feature 495306
__ambiguous 497703
__too_low_aQual 244930
__not_aligned 168542
__alignment_not_unique 2729919

What is used to evaluate these two counters, and why are both equal to 0 for tophat2?

simon-anders commented 5 years ago

"too low aQual" means that the alignment quality (5th column in a SAM file, a.k.a. "MAPQ") is below the user-chosen threshold

"not aligned" means that the read was reported but no alignment was found. I suppose that TopHat2 omits these reads from the SAM file.
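To make the two criteria concrete, here is a minimal sketch in plain Python of how a SAM record could be sorted into these two counters. It is an illustration under stated assumptions, not HTSeq's actual implementation: the function names `classify_sam_line` and `tally` are hypothetical, "not aligned" is assumed to correspond to the SAM FLAG bit 0x4 (segment unmapped), and other checks that htseq-count performs (for example for multiple alignments) are left out.

```python
# Sketch only: plain-Python SAM parsing, not HTSeq's own code.
from collections import Counter

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def classify_sam_line(line, min_aqual=10):
    """Return the special counter a SAM record falls into, or None.

    Hypothetical helper; min_aqual plays the role of the -a threshold.
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])  # 2nd column: bitwise FLAG
    mapq = int(fields[4])  # 5th column: MAPQ (alignment quality)
    if flag & UNMAPPED_FLAG:
        return "__not_aligned"
    if mapq < min_aqual:
        return "__too_low_aQual"
    return None  # record would go on to feature assignment


def tally(lines, min_aqual=10):
    """Count records per special counter, skipping SAM header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue
        category = classify_sam_line(line, min_aqual)
        if category is not None:
            counts[category] += 1
    return counts


# Made-up records for illustration only.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # mapped, MAPQ 60
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",   # mapped, MAPQ 1
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",         # unmapped (FLAG 4)
]
print(tally(example))
```

With the default threshold of 10, `r2` lands in `__too_low_aQual` and `r3` in `__not_aligned`. If an aligner leaves unmapped reads out of the SAM file altogether, no record ever reaches the first branch, which would be consistent with a `__not_aligned` count of 0.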

sandbardev commented 5 years ago

"too low aQual" means that the alignment quality (5th column in a SAM file, a.k.a. "MAPQ") is below the user-chosen threshold

This was exactly what I was looking for. Thank you. As a follow-up, please consider adding information on the source of that value to the htseq documentation. Right now it only mentions that the default threshold has been 10 since version 0.5.4.

simon-anders commented 5 years ago

If the documentation really doesn't say so, we should add it.

Hence reopening the issue to serve as a reminder to do so.

sandbardev commented 5 years ago

Admittedly, it does mention what it is, just not where it comes from:

-a <minaqual>, --a=<minaqual> skip all reads with alignment quality lower than the given minimum value (default: 10. Note: the default used to be 0 until version 0.5.4.)

iosonofabio commented 5 years ago

Let's just improve that sentence a bit ;-)


iosonofabio commented 5 years ago

Addressed this in 6f7d66e, closing