rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

Where did profile-profile alignment go in v5? #55

Closed: rtviii closed this issue 5 days ago

rtviii commented 1 year ago

There is still ample documentation on profile-profile alignment from v3.8, but searching the v5 source code I'm not finding any references to it. Has it been made obsolete or replaced by a new command?

Thanks a lot for your work.

rcedgar commented 1 year ago

Neither, really. It's an oversight that profile-profile is not a command, because it exists as a subroutine. Note that muscle v5 is not meant to be a replacement for muscle v3; it's a complete redesign which is complementary in some ways, e.g. there are datasets that muscle v3 can easily handle which are too big for muscle v5 (long sequences), and vice versa (many sequences). I'll leave this issue open as a feature request. Hopefully I'll get to this before long, but probably not :-(

rtviii commented 1 year ago

Thanks for the swift response, and yes, that was more or less my impression. Again, kudos on both designs (v3 and v5).

Given this state of things I don't see why I wouldn't just continue using 3.8. It works really well for me, but I'm wondering anyway whether profile-profile alignment can be replicated, more or less, with the "ensembles" in v5, or whether that is a different concept?

rcedgar commented 1 year ago

Different concept. Ensembles are alternative MSAs which are equally good; this allows you to test whether the inferences you are making from the alignments are robust.

For typical easy alignment problems, it doesn't matter which software you use -- clustal, mafft, muscle will all give very similar results. If the alignment is hard, then they will give different results, and the question is whether the differences matter in practice. This question is answered by making a muscle v5 ensemble, because the variants all have state-of-the-art accuracy.
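
To make the "do the differences matter" check concrete, here is a minimal sketch of one way to compare two replicate alignments from an ensemble. It is not part of muscle: the function names `aligned_pairs` and `pair_agreement` are hypothetical, the alignments are assumed to be already loaded as plain dicts of label to gapped row, and the score (Jaccard overlap of aligned residue pairs) is just one simple choice of measure.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Return the set of residue pairs that share a column in one MSA.

    msa maps a sequence label to its gapped row; all rows must have the
    same length. Residues are identified by (label, index in the ungapped
    sequence), so the result does not depend on column numbering.
    """
    labels = sorted(msa)
    length = len(msa[labels[0]])
    if any(len(msa[label]) != length for label in labels):
        raise ValueError("all rows of an alignment must have equal length")

    next_index = {label: 0 for label in labels}  # ungapped position per row
    pairs = set()
    for col in range(length):
        present = []
        for label in labels:
            if msa[label][col] not in "-.":  # skip gap characters
                present.append((label, next_index[label]))
                next_index[label] += 1
        # every two residues in the same column form one aligned pair
        pairs.update(combinations(present, 2))
    return pairs


def pair_agreement(msa_a, msa_b):
    """Jaccard overlap of aligned residue pairs between two replicates."""
    a, b = aligned_pairs(msa_a), aligned_pairs(msa_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy replicates (made-up data, not muscle output).
rep1 = {"s1": "ACG-T", "s2": "A-GCT"}
rep2 = {"s1": "ACG-T", "s2": "AG-CT"}
print(pair_agreement(rep1, rep2))
```

A score near 1 across all replicate pairs suggests the alignment is stable. Low scores point to regions where the replicates disagree, and that is where it is worth re-running the downstream analysis (tree, structure prediction) on each replicate to see whether the conclusion changes.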