Open qiyunzhu opened 6 months ago
A couple of thoughts on this
Progressive alignment: it is an interesting thought to define an alignment method that itself takes alignments as input. To circumvent the runtime issues that we discussed, I wonder if it is possible to decouple the scoring scheme from the DP procedure itself. With normal alignment procedures, you can pass in the BLOSUM matrix ahead of time. I wonder if you could pass in a distance matrix instead, so that you wouldn't have to compute the match scores within the DP procedure. See below (calling this new alignment structure BitMSA, since it seems to be a bit representation of TabularMSA):
def align(x: BitMSA, y: BitMSA):
    dm = dissimilarity_matrix(x, y)  # compute match scores ahead of time
    return smith_waterman(x, y, blosum=dm)
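To make the decoupling concrete, here is a minimal runnable sketch of the same idea. It is only an illustration under stated assumptions: plain lists of equal-length aligned strings stand in for BitMSA, the function names `column_profiles`, `score_matrix`, and `local_align_score` are hypothetical (none of them are scikit-bio API), and the match/mismatch/gap values are arbitrary placeholders. The column-vs-column scores are computed once with matrix products, and the DP loop then only reads from that precomputed matrix.

```python
import numpy as np

ALPHABET = "ACGT-"  # assumed alphabet; "-" is the gap character


def column_profiles(msa, alphabet=ALPHABET):
    """Return a (n_columns x n_symbols) matrix of per-column symbol frequencies."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    profile = np.zeros((len(msa[0]), len(alphabet)))
    for seq in msa:
        for j, ch in enumerate(seq):
            profile[j, index[ch]] += 1
    return profile / len(msa)


def score_matrix(x, y, match=1.0, mismatch=-1.0, alphabet=ALPHABET):
    """Precompute the average pairwise score of every column of x vs. every column of y."""
    k = len(alphabet)
    sub = np.full((k, k), mismatch)
    np.fill_diagonal(sub, match)
    # Simplifying assumption: anything paired with a gap character scores 0.
    sub[-1, :] = 0.0
    sub[:, -1] = 0.0
    return column_profiles(x, alphabet) @ sub @ column_profiles(y, alphabet).T


def local_align_score(S, gap=-2.0):
    """Smith-Waterman recurrence (linear gap penalty) over a precomputed score matrix."""
    n, m = S.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i, j] = max(
                0.0,
                H[i - 1, j - 1] + S[i - 1, j - 1],  # align column i with column j
                H[i - 1, j] + gap,                  # gap inserted in y
                H[i, j - 1] + gap,                  # gap inserted in x
            )
    return H.max()


x = ["ACGT", "ACGT"]  # first alignment (two sequences)
y = ["ACGT"]          # second alignment (one sequence)
S = score_matrix(x, y)
best = local_align_score(S)
```

The DP step here never looks at the sequences themselves, only at `S`, so a different scoring scheme (or a bit-packed representation) could be swapped in without touching the recurrence. Traceback is omitted to keep the sketch short.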
I do really like this idea, since it could potentially be used to perform progressive alignment on a very large scale. There are going to be quite a few applications that could be realized with this new general data structure, so I think it is definitely worth fleshing out.
Discussed in https://github.com/scikit-bio/scikit-bio/discussions/1973
@mortonjt @wasade Will appreciate your thoughts!