rvaser / spoa

SIMD partial order alignment tool/library
MIT License

How to prevent gaps at the ends of the alignment? #61

Closed Robin-Rounthwaite closed 2 years ago

Robin-Rounthwaite commented 2 years ago

Hello! I'm trying to produce an alignment with sPOA that never has gaps at the ends of the alignment. I.e.:

GATTACA
GATTA--

is forbidden, but

GATTACA
GATT--A

is fine.

This is possible in my use-case, because the strings are guaranteed to end and begin with the same characters. (Explanation of my use case appended to the end of this issue.)

There are a couple of ways I could imagine doing this, but I couldn't find a way to implement them. Here are the two ways:

1) I could replace the start and end of the string with special characters that have an extremely high match score, e.g. the input strings GATTACA GATTA become XATTACX XATTX. But I couldn't find a way to make character-specific match scores.

2) I could directly penalize gap open/extends that lead to the end of the alignment string. I didn't see a way to do that either.

Is something like this possible to do in sPOA? If not, would you be willing to add the feature?


Explanation of my use case: I'm working on a tool for VG that simplifies poorly-constructed snarls that contain duplicated sequence information through multiple paths in the snarl. My tool extracts the haplotypes from the snarl, realigns them, and converts the alignment into a replacement snarl for the graph.

For this to work, I need to guarantee that each haplotype still stretches from the source to sink inside the snarl. I.e., the first character of each haplotype and the last character of each haplotype must be guaranteed to be aligned together.

rvaser commented 2 years ago

Hi Robin, unfortunately, it is not possible with the current API. Global alignment is your best bet. Although, wouldn't aligning the middle parts of the sequences (without the first/last character) do the trick? Example:

GATTACA    G   ATT-A-C   A
GATTA   -> G   ATT       A
GATTTAAA   G   ATTTAA    A
               ^^^^^^^

Best regards, Robert
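
The workaround suggested above (strip the shared first and last characters, align only the middles, then re-attach the ends) could be sketched roughly as follows. This is a minimal sketch and not part of spoa: `AlignWithAnchoredEnds`, `MsaFn`, and `PadRight` are hypothetical names introduced here for illustration. `PadRight` is only a placeholder that pads with trailing gaps so the example runs without spoa; in practice the callback would presumably wrap a real aligner, such as spoa in global (`kNW`) mode.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical callback type: takes unaligned sequences and returns one
// gapped row ('-' for gaps) per input sequence, all of equal length.
using MsaFn =
    std::function<std::vector<std::string>(const std::vector<std::string>&)>;

// Aligns only the middle of each sequence and re-attaches the shared first
// and last characters, so the ends always occupy the outermost columns.
std::vector<std::string> AlignWithAnchoredEnds(
    const std::vector<std::string>& seqs, const MsaFn& align_middles) {
  if (seqs.empty()) return {};
  const char first = seqs.front().front();
  const char last = seqs.front().back();

  std::vector<std::string> middles;
  middles.reserve(seqs.size());
  for (const auto& s : seqs) {
    // Assumption from the issue: every sequence starts and ends with the
    // same characters (source and sink of the snarl).
    if (s.size() < 2 || s.front() != first || s.back() != last) {
      throw std::invalid_argument(
          "sequences must share their first and last characters");
    }
    middles.push_back(s.substr(1, s.size() - 2));
  }

  std::vector<std::string> rows = align_middles(middles);
  assert(rows.size() == seqs.size());  // one row per input sequence

  for (auto& row : rows) {
    row = first + row + last;  // anchor columns are never gaps
  }
  return rows;
}

// Placeholder "aligner" (NOT a real alignment): pads each middle with
// trailing gaps to a common width. Stands in for a real MSA call.
std::vector<std::string> PadRight(const std::vector<std::string>& middles) {
  std::size_t width = 0;
  for (const auto& m : middles) {
    if (m.size() > width) width = m.size();
  }
  std::vector<std::string> rows;
  rows.reserve(middles.size());
  for (const auto& m : middles) {
    rows.push_back(m + std::string(width - m.size(), '-'));
  }
  return rows;
}
```

Calling `AlignWithAnchoredEnds({"GATTACA", "GATTA", "GATTTAAA"}, PadRight)` yields rows whose first column is always `G` and whose last column is always `A`, regardless of where the aligner places gaps inside the middle. Any gaps the aligner puts at the ends of the middle region end up adjacent to the anchors rather than at the ends of the full alignment.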

Robin-Rounthwaite commented 2 years ago

It would indeed. I had only realized this yesterday, and was planning on writing y'all an update today to this effect! Thank you for your kind insight. Wishing you all well,