rust-bio / rust-bio-tools

A set of command line utilities based on Rust-Bio.
MIT License

fix: consensus reads: correct handling of read orientation #252

Closed: FelixMoelder closed this pull request 1 year ago.

FelixMoelder commented 1 year ago

This PR changes how read orientation is handled during consensus read calculation. In the previous implementation, records originating from the reverse strand were not written to the FASTQ files as their reverse complement. This has been fixed.
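A minimal sketch of what the fix amounts to, assuming SEQ and QUAL are taken directly from the BAM record. SAM stores reverse-strand alignments (flag 0x10) reverse-complemented relative to the sequenced read, so restoring the original orientation means reverse-complementing the sequence and reversing the qualities. The helper names `revcomp` and `to_fastq_orientation` are hypothetical, not the rust-bio-tools code. rust-bio itself provides `bio::alphabets::dna::revcomp`, but a std-only version is used here to keep the example self-contained.

```rust
/// Complement of a single DNA base; anything unrecognised (including IUPAC codes) maps to N in this sketch.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        b'a' => b't',
        b'c' => b'g',
        b'g' => b'c',
        b't' => b'a',
        _ => b'N',
    }
}

/// Reverse complement of a DNA sequence.
fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

/// Hypothetical helper: restore the sequencer's orientation before writing a FASTQ record.
/// `is_reverse` corresponds to SAM flag 0x10 on the (consensus) record.
fn to_fastq_orientation(seq: &[u8], qual: &[u8], is_reverse: bool) -> (Vec<u8>, Vec<u8>) {
    if is_reverse {
        // Qualities are only reversed, never complemented.
        let mut q = qual.to_vec();
        q.reverse();
        (revcomp(seq), q)
    } else {
        (seq.to_vec(), qual.to_vec())
    }
}

fn main() {
    let (seq, qual) = to_fastq_orientation(b"AACGT", b"ABCDE", true);
    println!("{}", String::from_utf8(seq).unwrap());
    println!("{}", String::from_utf8(qual).unwrap());
}
```

Forward-strand records pass through unchanged, so the orientation check is the only branch needed.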

In addition, duplicate reads are now grouped not only by identical CIGAR strings but also by read orientation. Grouping is performed on read pairs, which ensures that all duplicate r1 records originate from the same strand, as do all r2 records. Therefore, strand information only needs to be stored for overlapping consensus reads whose mates come from different strands. In all other cases, no strand information is saved.
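The grouping rule can be sketched as a hash map keyed on both CIGARs and both mate orientations. This is an illustrative simplification: `PairInfo`, `GroupKey`, `group_duplicates`, `needs_strand_info` and `pair` are hypothetical names, and the sketch assumes the pairs passed in already share alignment positions, so only CIGAR and orientation decide the group.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of an aligned read pair.
#[derive(Debug, Clone)]
struct PairInfo {
    name: String,
    r1_cigar: String,
    r2_cigar: String,
    r1_reverse: bool, // SAM flag 0x10 set on the first mate
    r2_reverse: bool, // SAM flag 0x10 set on the second mate
}

/// Duplicates share this key: identical CIGARs and identical orientation for r1 and for r2.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct GroupKey {
    r1_cigar: String,
    r2_cigar: String,
    r1_reverse: bool,
    r2_reverse: bool,
}

/// Collect read names per group; every r1 in a group is on the same strand, and likewise every r2.
fn group_duplicates(pairs: &[PairInfo]) -> HashMap<GroupKey, Vec<String>> {
    let mut groups: HashMap<GroupKey, Vec<String>> = HashMap::new();
    for p in pairs {
        let key = GroupKey {
            r1_cigar: p.r1_cigar.clone(),
            r2_cigar: p.r2_cigar.clone(),
            r1_reverse: p.r1_reverse,
            r2_reverse: p.r2_reverse,
        };
        groups.entry(key).or_default().push(p.name.clone());
    }
    groups
}

/// Strand information is only needed for an overlapping consensus whose mates come from different strands.
fn needs_strand_info(overlapping: bool, r1_reverse: bool, r2_reverse: bool) -> bool {
    overlapping && r1_reverse != r2_reverse
}

/// Convenience constructor for the example: both mates fully matched (100M).
fn pair(name: &str, r1_reverse: bool, r2_reverse: bool) -> PairInfo {
    PairInfo {
        name: name.to_string(),
        r1_cigar: "100M".to_string(),
        r2_cigar: "100M".to_string(),
        r1_reverse,
        r2_reverse,
    }
}

fn main() {
    // "c" has identical CIGARs but opposite orientations, so it lands in its own group.
    let pairs = vec![pair("a", false, true), pair("b", false, true), pair("c", true, false)];
    let groups = group_duplicates(&pairs);
    for (key, names) in &groups {
        println!("{:?} -> {:?}", key, names);
    }
    println!("strand info needed: {}", needs_strand_info(true, false, true));
}
```

Because orientation is part of the key, a group never mixes strands per mate, which is why the strand string can be dropped for non-overlapping consensus reads.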

Edit: An additional EF tag is now added to overlapping consensus reads to denote that the read covers the entire fragment.

FelixMoelder commented 1 year ago

Paired records are now written to the correct fq-writer. This means that if the first mate encountered is flagged as second in pair, it is written to the fq2-writer, while the other mate is written to the fq1-writer.
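Routing by the SAM flag rather than by the order in which mates are encountered could look like the sketch below. The flag bits 0x40 (first in pair) and 0x80 (second in pair) come from the SAM specification; the `route` and `write_mates` helpers, the `FastqTarget` enum, and the use of `Vec<String>` as stand-ins for the two FASTQ writers are assumptions for illustration only.

```rust
/// SAM FLAG bits (SAM specification).
const FLAG_FIRST_IN_PAIR: u16 = 0x40;
const FLAG_SECOND_IN_PAIR: u16 = 0x80;

#[derive(Debug, PartialEq, Eq)]
enum FastqTarget {
    Fq1,
    Fq2,
}

/// Pick the output file from the record's own flag, not from its position in the pair.
fn route(flags: u16) -> Option<FastqTarget> {
    if flags & FLAG_FIRST_IN_PAIR != 0 {
        Some(FastqTarget::Fq1)
    } else if flags & FLAG_SECOND_IN_PAIR != 0 {
        Some(FastqTarget::Fq2)
    } else {
        None // unpaired record: not handled in this sketch
    }
}

/// Hypothetical writer stand-ins: each Vec collects the names written to that file.
fn write_mates(mates: &[(&str, u16)], fq1: &mut Vec<String>, fq2: &mut Vec<String>) {
    for &(name, flags) in mates {
        match route(flags) {
            Some(FastqTarget::Fq1) => fq1.push(name.to_string()),
            Some(FastqTarget::Fq2) => fq2.push(name.to_string()),
            None => {}
        }
    }
}

fn main() {
    // The first mate encountered carries the second-in-pair flag (0x80).
    let mates: [(&str, u16); 2] = [("frag1/2", 0x1 | 0x80), ("frag1/1", 0x1 | 0x40)];
    let (mut fq1, mut fq2) = (Vec::new(), Vec::new());
    write_mates(&mates, &mut fq1, &mut fq2);
    println!("fq1: {:?}", fq1);
    println!("fq2: {:?}", fq2);
}
```

Checking the flag on each record keeps R1 and R2 files consistent regardless of the order in which the mates are processed.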