marbl / MashMap

A fast approximate aligner for long DNA sequences

about mashmap.out result #36

Closed zhoudreames closed 2 years ago

zhoudreames commented 3 years ago

[image] The results look strange. Why can a sequence of length 1914830 be fully aligned to a sequence of length 2645?

cjain7 commented 3 years ago

Hi, can you attach the two sequences here?

cjain7 commented 3 years ago

Please also mention the command-line parameters I should use to reproduce this.

zhoudreames commented 3 years ago

fa.zip

mashmap -r 2.fa -q 1.fa

cjain7 commented 2 years ago

Thanks for sharing the sequences. It looks like the longer sequence (-q 1.fa, the query) is a tandem repeat. MashMap splits the longer sequence into non-overlapping segments, each of which aligns to the reference (the short sequence, 2.fa). At this point it should ideally have reported the individual alignments, but it incorrectly merges them all because they fall within a threshold range of one another on the reference sequence. The merging algorithm in MashMap can be improved; I think a co-linear chaining algorithm would be a better way to merge.
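
To illustrate the merging behaviour described above, here is a minimal toy sketch. It is not MashMap's actual code: the `Mapping` type, the `merge_by_reference_proximity` function, the segment length of 2000 and the gap threshold are all hypothetical, chosen only to show how merging on reference proximity alone can collapse many per-segment mappings into one record whose query span is far longer than its reference span.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """One segment-level mapping (hypothetical structure, half-open coordinates)."""
    q_start: int
    q_end: int
    r_start: int
    r_end: int


def merge_by_reference_proximity(mappings: List[Mapping], max_ref_gap: int) -> List[Mapping]:
    """Merge mappings whose reference intervals lie within max_ref_gap of each other.

    Query coordinates are deliberately ignored when deciding whether to merge,
    which is the simplification that produces the surprising result.
    """
    merged: List[Mapping] = []
    for m in sorted(mappings, key=lambda x: (x.r_start, x.q_start)):
        if merged and m.r_start - merged[-1].r_end <= max_ref_gap:
            last = merged[-1]
            merged[-1] = Mapping(
                min(last.q_start, m.q_start),
                max(last.q_end, m.q_end),
                min(last.r_start, m.r_start),
                max(last.r_end, m.r_end),
            )
        else:
            merged.append(m)
    return merged


# Five consecutive query segments of a tandem repeat, all mapping to the
# same short region of the reference (assumed values, for illustration only).
segment_len = 2000
fragments = [
    Mapping(i * segment_len, (i + 1) * segment_len, 0, segment_len)
    for i in range(5)
]

merged = merge_by_reference_proximity(fragments, max_ref_gap=segment_len)
for m in merged:
    print(m, "query span:", m.q_end - m.q_start, "ref span:", m.r_end - m.r_start)
```

In this toy case the five fragments become a single record covering 10000 query bases against 2000 reference bases. A co-linear chaining step would also require the query and reference coordinates to advance together, so these fragments would not be chained into one alignment.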

In any case, I wouldn't recommend using MashMap for tandem repeats like these; k-mer Jaccard similarity is not reliable here.
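
A small sketch of why k-mer Jaccard similarity can mislead on tandem repeats. It uses exact k-mer sets, which is a simplification and not MashMap's implementation; the repeat unit, the copy numbers and `k = 4` are made-up values for illustration. Because a set ignores how many times a k-mer occurs, two tandem arrays of the same unit share the same k-mer set whatever their lengths.

```python
def kmer_set(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


k = 4
unit = "ACGGTCATTG"        # hypothetical 10 bp repeat unit
reference = unit * 2       # short tandem array (20 bp)
query = unit * 50          # long tandem array (500 bp)

similarity = jaccard(kmer_set(query, k), kmer_set(reference, k))
print(len(query), len(reference), similarity)
```

Here the query is 25 times longer than the reference, yet the two k-mer sets are identical, so the similarity gives no indication of the difference in length or copy number.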

zhoudreames commented 2 years ago

Thanks for your reply!