ssadedin / bazam

A read extraction and realignment tool for next generation sequencing data
GNU Lesser General Public License v2.1

Missing Reads #15

Open christopher-schroeder opened 5 years ago

christopher-schroeder commented 5 years ago

Dear bazam team:

I have noticed that bazam does not output all read pairs, which I did not expect. Does it have a built-in filter, for example for duplicates? And can it be configured to output all read pairs?

Best, Christo

christopher-schroeder commented 5 years ago

Oh sorry, never mind. The input BAM seems to have missing mates for some reads. But just to make sure: Bazam outputs all reads and doesn't filter anything (except single reads), correct?
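
For anyone hitting the same thing, here is a rough sketch of how one might check an input for reads whose mate is missing. It is not part of bazam. The function name `find_unpaired` and the `(read_name, is_read1)` record shape are made up for illustration, and it assumes you have already extracted those two fields from the primary alignments with whatever BAM reader you use.

```python
from collections import defaultdict


def find_unpaired(records):
    """Return sorted read names for which only one end was seen.

    records: iterable of (read_name, is_read1) tuples, one per primary
    alignment (hypothetical shape; extract these from your BAM reader).
    """
    ends_seen = defaultdict(set)
    for name, is_read1 in records:
        ends_seen[name].add(bool(is_read1))
    # A complete pair contributes both True (read 1) and False (read 2).
    return sorted(name for name, ends in ends_seen.items() if len(ends) < 2)


if __name__ == "__main__":
    example = [("r1", True), ("r1", False), ("r2", True), ("r3", False), ("r3", True)]
    print(find_unpaired(example))
```

Any name this reports would be a single read in the sense used above, so it would not be expected in the paired output.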

ssadedin commented 5 years ago

Hi, yes, you are correct: Bazam will output every read that has a mate and will not filter anything. However, it currently drops single-ended reads. Potentially, a flag could be added to output them as single ends when streaming into aligners (e.g. BWA) that do "smart" pairing of the reads (that is, if two consecutive reads have the same name they are treated as a pair; if the names differ they are treated as single ends).
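
To make the pairing rule concrete, here is a minimal sketch of it. This is only an illustration of the rule as described above, not code from BWA or Bazam. The function name `smart_pair` is hypothetical, and it assumes read names have already been normalised (for example, any `/1` or `/2` suffix stripped).

```python
def smart_pair(names):
    """Classify an interleaved stream of read names as pairs or singles.

    Two consecutive reads with the same name form a pair; a read whose
    successor has a different name (or no successor) is a single end.
    """
    names = list(names)
    result = []
    i = 0
    while i < len(names):
        if i + 1 < len(names) and names[i] == names[i + 1]:
            result.append(("paired", names[i]))
            i += 2  # consume both mates
        else:
            result.append(("single", names[i]))
            i += 1  # consume only this read
    return result


if __name__ == "__main__":
    for kind, name in smart_pair(["a", "a", "b", "c", "c"]):
        print(kind, name)
```

With this rule, a single read interleaved between pairs does not shift the pairing of the reads that follow it, which is why single ends could in principle be streamed alongside pairs.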