husonlab / megan-ce

MEGAN Community Edition
GNU General Public License v3.0

Masked sequences in standard BLAST output get dropped #15

Open ViralChris opened 2 years ago

ViralChris commented 2 years ago

Hello,

MEGAN CE 6.21.12 fails to import lowercase sequences from the default pairwise output ('-outfmt 0') of standalone BLAST (2.9.0+). These lowercase residues mark low-complexity regions that were masked in the BLAST search. As a result, MEGAN assumes a gap where there is actually a perfect alignment, which leads to taxa being incorrectly dropped when applying LCA parameter filters (e.g. % identity). The problem can be circumvented by using the XML BLAST output ('-outfmt 5'), which MEGAN imports with the masked sequences intact.
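
To illustrate why this matters for the % identity filter, here is a minimal Python sketch. It is not MEGAN's code: the function name `percent_identity` and the toy sequences are made up for illustration, and the only assumption is that a parser which does not recognise lowercase residues ends up counting them as non-matches.

```python
def percent_identity(query: str, subject: str, case_sensitive: bool) -> float:
    """Percent identity over two aligned strings of equal length.

    Hypothetical helper for illustration only. Gap characters ('-')
    never count as matches; the denominator is the alignment length.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    if not query:
        raise ValueError("alignment is empty")
    if not case_sensitive:
        # Treat soft-masked (lowercase) residues like any other residue.
        query, subject = query.upper(), subject.upper()
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * matches / len(query)


# Toy alignment: the middle 8 query bases are soft-masked (lowercase),
# but the underlying alignment is perfect.
QUERY = "ACGTacgtacgtACGT"
SBJCT = "ACGTACGTACGTACGT"

masked_as_mismatch = percent_identity(QUERY, SBJCT, case_sensitive=True)
masked_as_match = percent_identity(QUERY, SBJCT, case_sensitive=False)
```

With these toy sequences the first call gives 50.0 and the second 100.0, so a hit that is actually perfect would fall below a typical minimum % identity threshold.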

Best, Chris