rrwick / Polypolish

a short-read polishing tool for long-read assemblies
GNU General Public License v3.0

More alignments after filtering with polypolish_insert_filter.py #10

Closed by williamsmicrobegenome 2 years ago

williamsmicrobegenome commented 2 years ago

I'm using Polypolish in both my Genomics class and my research. Following the wiki, I've used polypolish_insert_filter.py to process SAM alignments (generated by bwa mem) before analyzing them with Polypolish. For a couple of different bacterial genomes, I've noticed that the number of alignments after filtering is higher than the number before filtering. Based on the wiki, I don't think this is expected behavior, but I might be missing something about how this script runs.

Here's an example (with generic filenames):

bwa mem -a reference.fasta R1.fastq > alignmentR1.sam
bwa mem -a reference.fasta R2.fastq > alignmentR2.sam
polypolish_insert_filter.py --in1 alignmentR1.sam --in2 alignmentR2.sam --out1 filteredR1.sam --out2 filteredR2.sam

Alignments before filtering: 4,102,806
Alignments after filtering: 4,149,748

rrwick commented 2 years ago

Thanks for pointing this out! I think I've figured it out, and I just pushed a fix (de62102). Briefly, the problem was that the script was not counting unaligned SAM lines in the 'before' count but was counting them in the 'after' count. The fix makes the script no longer count them in the 'after' count.
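To illustrate the kind of mismatch described above, here is a minimal sketch in Python. It is not the code from polypolish_insert_filter.py; the function name `count_alignments` and the `include_unaligned` parameter are hypothetical and exist only for this example. It assumes standard SAM conventions: header lines start with `@`, and bit 0x4 of the FLAG field marks an unaligned read.

```python
def count_alignments(sam_lines, include_unaligned=False):
    """Count SAM records, skipping headers (hypothetical helper, not from Polypolish)."""
    count = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith('@'):
            continue
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        # In the SAM format, FLAG bit 0x4 means the read is unaligned.
        is_unaligned = bool(flag & 0x4)
        if is_unaligned and not include_unaligned:
            continue
        count += 1
    return count


# Made-up records: two aligned reads and one unaligned read (FLAG 4).
example = [
    "@SQ\tSN:contig_1\tLN:1000",
    "read1\t0\tcontig_1\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "read3\t16\tcontig_1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]

before = count_alignments(example)                                # unaligned lines excluded
after_buggy = count_alignments(example, include_unaligned=True)   # unaligned lines included
after_fixed = count_alignments(example)                           # same rule as 'before'
print(before, after_buggy, after_fixed)
```

If the 'before' and 'after' counts apply different rules to unaligned lines, the 'after' count can exceed the 'before' count even though no alignments were added. Applying the same rule to both makes the numbers comparable.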

I don't think this will change anything downstream, but hopefully the numbers will now make more sense!

Ryan