jydu / maffilter

The MafFilter genome alignment processor
GNU General Public License v3.0

Additional filtering features #1

Open afeurtey opened 6 years ago

afeurtey commented 6 years ago

Here are a few suggestions for alignment block filtering that would be very welcome additions to the MafFilter software:

Thank you for considering these additions!

kloetzl commented 6 years ago

Adding to the wishlist, I would like to see some way to ‘rebase’ the alignment on a different reference: Pick one of the sequences as the new reference coordinate system and all other alignments are piled onto that.

To illustrate, I would like to 'join' the following two blocks, because they reference the same section on CP000034.

a score=171.0
s CP000034.chr0 4032916 172 + 4369232 gtgaaccgccccgggtttcctggagagtgttttatctgtgaactcaggctgccagatcatcgtttccgatggaagcataataagctttttctgcttctgccggtgggatatggccctgcctttccagcaatcgtcgattgttataccagtccacccacgtgagtgtggtcag
s BA000007.chr0 1431896 172 + 5498450 gtgaaccgccccggttttcctggagagtgttttatctgtgaactcaggctgccagatcatcgtttctgatggaagcataataagctttttctgcttctgccggagggatatgacccagccttcccagcaatcgtcgattgttataccagtccacccacgtgagtgtggccag

a score=171.0
s CP000034.chr0 4032916 172 + 4369232 gtgaaccgccccgggtttcctggagagtgttttatctgtgaactcaggctgccagatcatcgtttccgatggaagcataataagctttttctgcttctgccggtgggatatggccctgcctttccagcaatcgtcgattgttataccagtccacccacgtgagtgtggtcag
s BA000007.chr0  147871 172 - 5498450 gtgaaccgccccgggtttcctggagagtgttttatctgtgaactcaggctgccagatcatcgtttccgatggaagcataataagctttttctgcttctgccggaggagtatggcccagccttcccagcaatcgtcgattgttataccagtccacccacgttagtgtggccag

jydu commented 6 years ago

Ouch, this one is not straightforward... it basically means that (part of) one genome sequence has been aligned twice, which is not nice. Where does such an alignment come from? Normally genome aligners tend to avoid such situations...

J.

kloetzl commented 6 years ago

The above sequence is a transposon that appears once in CP000034, but multiple times in BA000007. MUMmer correctly finds all the homologues (converted using delta2maf).

jydu commented 6 years ago

Ok. But it's a bit like converting the output of BLAST to a MAF file, which is some way from the original idea of a genome alignment. I am not aware of any tool that could do such merging of blocks (it is not straightforward, because in the general case it would involve realigning, as the transposons might not be aligned the same way each time...). It is also rather far from the type of analyses MafFilter does, that is, filtering the synteny blocks. The TBA package contains several auxiliary programs that perform tasks on MAF files; maybe there is something there?

J.

kloetzl commented 6 years ago

Hm, guess I will have to come up with my own tool. @afeurtey already suggested using TBA's maf_project but that also doesn't ‘pile’ same positions …
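
For reference, a minimal Python sketch of what such a 'piling' step could look like. It is not part of MafFilter or TBA; the function names `parse_maf` and `pile_on_reference` are made up for illustration. It assumes the simplest case only: two blocks are joined when their reference rows are exactly identical (same start, size, strand and gapped text), so no realignment is needed.

```python
from collections import OrderedDict


def parse_maf(text):
    """Parse MAF text into a list of blocks; each block is a list of 's' rows.

    Each row is a tuple (src, start, size, strand, src_size, gapped_text).
    Lines other than 'a' and 's' are ignored in this sketch.
    """
    blocks, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            current = []
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            src, start, size, strand, src_size, seq = fields[1:7]
            current.append((src, int(start), int(size), strand, int(src_size), seq))
    return blocks


def pile_on_reference(blocks, ref):
    """Join blocks that share an identical row for the sequence named `ref`.

    Returns a list of piles. Each pile starts with the reference row,
    followed by every non-reference row from the blocks that carried it.
    """
    piles = OrderedDict()
    for block in blocks:
        ref_rows = [row for row in block if row[0] == ref]
        if len(ref_rows) != 1:
            continue  # assumption: skip blocks without a unique reference row
        key = ref_rows[0]
        pile = piles.setdefault(key, [key])
        pile.extend(row for row in block if row[0] != ref)
    return list(piles.values())
```

Called as `pile_on_reference(parse_maf(maf_text), "CP000034.chr0")`, this would put the two blocks above into one pile with a single CP000034 row and both BA000007 rows. Blocks whose reference rows only overlap, or cover the same region with gaps placed differently, stay in separate piles; joining those is the general case that would need realigning.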